A statistic library for Arduino.

Last Modified: December 19, 2013, at 02:37 PM
By: robtillaart
Platforms: All

Intro

One of the main applications for the Arduino board is reading and logging sensor data. For instance, one monitors the temperature and air pressure every minute of the day. As that implies a lot of records, we often want the average and standard deviation to get a picture of how the temperature varied over that day.

Statistic library

The Statistic library just calculates the average and standard deviation (stdev) of a set of data (floats). Furthermore, it holds the minimum and maximum values entered. The interface consists of a constructor and nine functions:

	Statistic();		// constructor
	void clear();		// reset all counters
	void add(float);	// add a new value
	long count();		// # values added
	float sum();		// total
	float minimum();	// minimum
	float maximum();	// maximum
	float average();	// average
	float pop_stdev();	// population std deviation
	float unbiased_stdev();	// unbiased std deviation 

Internally the library does not record the individual values, only the count, the sum, the sum of squared differences from the average, the minimum and the maximum. These five are enough to calculate the average and both standard deviations. The nice part is that memory use stays the same whether one adds 10, 100 or 1000 values.
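As a sketch of why these stored values suffice, the relations below follow the add(), pop_stdev() and unbiased_stdev() code in the Statistic.cpp listing further down. The symbols are my own shorthand, not names from the library: n is the count, S_n the sum, and Q_n the sum of squared differences (the _ssqdif member) after n values.

```latex
\begin{aligned}
\bar{x}_n &= \frac{S_n}{n}, \qquad S_n = S_{n-1} + x_n \\
Q_1 &= 0, \qquad Q_n = Q_{n-1} + \frac{n}{n-1}\left(\bar{x}_n - x_n\right)^2 \quad (n \ge 2) \\
\sigma_{\mathrm{pop}} &= \sqrt{\frac{Q_n}{n}}, \qquad
s_{\mathrm{unbiased}} = \sqrt{\frac{Q_n}{n-1}}
\end{aligned}
```

Each new value only touches n, S and Q, which is why the individual samples never need to be kept.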

Usage

A small sketch shows how it can be used. A random generator is used to mimic a sensor.

#include "Statistic.h"  // without trailing s

Statistic myStats; 

void setup(void) 
{
  Serial.begin(9600);
  Serial.print("Demo Statistic lib ");
  Serial.println(STATISTIC_LIB_VERSION);
  myStats.clear(); //explicitly start clean
}

void loop(void) 
{
  long rn = random(0, 100);
  myStats.add(rn/100.0);

  Serial.print("  Count: ");
  Serial.print(myStats.count()); 

  Serial.print("  Average: ");
  Serial.print(myStats.average(), 4);

  Serial.print("  Std deviation: ");

  Serial.print(myStats.pop_stdev(), 4);
  Serial.println();

  if (myStats.count() == 300)
  {
   myStats.clear();
   delay(1000);
  }
}

In setup() myStats is cleared so we can start adding new data.

In loop() a random number is first generated and converted to a float to be added to myStats. Then the count, the average and the standard deviation so far are printed to the serial port. One could also display them on an LCD, send them over Ethernet, etc. When 300 items have been added, myStats is cleared to start over again.
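The sketch prints only pop_stdev(). To see how that differs from unbiased_stdev() without a board attached, a small desktop C++ sketch may help. MiniStatistic and demoEightReadings are hypothetical names introduced here; the struct only mirrors the update rule from the Statistic.cpp listing further down and is not the library class itself. The eight readings are made-up values.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical desktop stand-in mirroring the update rule of Statistic::add().
// It is NOT the library class, only a sketch for experimenting off-board.
struct MiniStatistic {
    long  cnt = 0;
    float sum = 0.0f;
    float ssqdif = 0.0f;   // running sum of squared differences from the average

    void add(float f) {
        sum += f;
        cnt++;
        if (cnt > 1) {
            float d = sum / cnt - f;            // new average minus new value
            ssqdif += cnt * d * d / (cnt - 1);
        }
    }
    float average() const        { return cnt < 1 ? NAN : sum / cnt; }
    float pop_stdev() const      { return cnt < 1 ? NAN : std::sqrt(ssqdif / cnt); }
    float unbiased_stdev() const { return cnt < 2 ? NAN : std::sqrt(ssqdif / (cnt - 1)); }
};

// Feeds eight fixed (made-up) readings and prints both estimates.
MiniStatistic demoEightReadings() {
    const float readings[] = {2.0f, 4.0f, 4.0f, 4.0f, 5.0f, 5.0f, 7.0f, 9.0f};
    MiniStatistic s;
    for (float r : readings) s.add(r);
    assert(s.cnt == 8);   // sanity check: all eight readings were counted
    std::printf("count=%ld avg=%.4f pop=%.4f unbiased=%.4f\n",
                s.cnt, s.average(), s.pop_stdev(), s.unbiased_stdev());
    return s;
}
```

Calling demoEightReadings() from main() should report an average of 5 and a population stdev of 2, while the unbiased stdev is a little larger (about 2.138) because it divides by n-1 instead of n. The unbiased figure is usually the one to use when the readings are a sample of a longer-running process.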

Notes

In the first version I collected all the samples in an array, but that used quite some memory and the user had to know the number of samples beforehand to allocate enough room. I found this not quite acceptable, so I stripped the data array from the class to make it more elementary.

To use the library, make a folder named Statistic in your SKETCHBOOKPATH\libraries and put the .h and .cpp files there.

Todo

  • Looking at a more extended statistical lib.
  • Create a template class so it can work with other datatypes.
  • Create a zip for Google code or wherever.

Enjoy tinkering,
rob.tillaart@removethisgmail.com

Update

  • 2010-11-01 Added stddev, minimum and maximum
  • 2011-01-07 Gil Ross sent me an improved version of the library that is numerically more stable. This is version 0.3. Thanks, Gil.
  • 2012-05-19 Added NAN as the error value instead of -1, which was incorrect.

Statistic.h file

#ifndef Statistic_h
#define Statistic_h
// 
//    FILE: Statistic.h
//  AUTHOR: Rob dot Tillaart at gmail dot com  
//          modified at 0.3 by Gil Ross at physics dot org
// PURPOSE: Recursive Statistical library for Arduino
// HISTORY: See Statistic.cpp
//
// Released to the public domain
//

// the standard deviation code adds < 100 bytes to the library size
// it can be included or excluded by uncommenting or commenting the next line
#define STAT_USE_STDEV

#include <math.h>

#define STATISTIC_LIB_VERSION "0.3.1"

class Statistic 
{
	public:
	Statistic();
	void clear();
	void add(float);
	long count();
	float sum();
	float average();
	float minimum();
	float maximum();

#ifdef STAT_USE_STDEV
	float pop_stdev();	    // population stdev
	float unbiased_stdev();
#endif

protected:
	long _cnt;
	float _store;           // store to minimise computation
	float _sum;
	float _min;
	float _max;
#ifdef STAT_USE_STDEV
	float _ssqdif;		    // sum of squares difference
#endif
};

#endif
// END OF FILE

Statistic.cpp

//
//    FILE: Statistic.cpp
//  AUTHOR: Rob dot Tillaart at gmail dot com  
//          modified at 0.3 by Gil Ross at physics dot org
// VERSION: see STATISTIC_LIB_VERSION in .h
// PURPOSE: Recursive statistical library for Arduino
//
// NOTE: 2011-01-07 Gil Ross
// Rob Tillaart's Statistic library uses one-pass of the data (allowing
// each value to be discarded), but expands the Sum of Squares Differences to
// difference the Sum of Squares and the Average Squared. This is susceptible
// to bit length precision errors with the float type (only 5 or 6 digits 
// absolute precision) so for long runs and high ratios of
// the average value to standard deviation the estimate of the 
// standard error (deviation) becomes the difference of two large
// numbers and will tend to zero.
//
// For small numbers of iterations and small Average/SE the original code is
// likely to work fine.
// It should also be recognised that for very large samples, questions 
// of stability of the sample assume greater importance than the
// correctness of the asymptotic estimators.
//
// This recursive algorithm, which takes slightly more computation per
// iteration is numerically stable.
// It updates the number, mean, max, min and SumOfSquaresDiff each step to
// deliver max min average, population standard error (standard deviation) and 
// unbiased SE.
// -------------
//
// HISTORY:
// 0.1 - 2010-10-29 initial version
// 0.2 - 2010-10-29 stripped to minimal functionality
// 0.2.01 - 2010-10-30
//   added minimum, maximum, unbiased stdev,
//   changed counter to long -> int overflows @32K samples
// 0.3 - branched from 0.2.01 version of Rob Tillaart's code
// Released to the public domain
// 0.3.1 - minor edits
//

#include "Statistic.h"

Statistic::Statistic()
{
	clear();
}

// resets all counters
void Statistic::clear()
{ 
	_cnt = 0;     // count at N stored, becoming N+1 at a new iteration
	_sum = 0.0;
	_min = 0.0;
	_max = 0.0;
#ifdef STAT_USE_STDEV
	_ssqdif = 0.0;  // not _ssq but sum of square differences
	                // which is SUM(from i = 1 to N) of 
                        // (f(i)-_ave_N)**2
#endif
}

// adds a new value to the data-set
void Statistic::add(float f)
{
	if (_cnt < 1)
	{
		_min = f;
		_max = f;
	} else {
		if (f < _min) _min = f;
		if (f > _max) _max = f;
	} // end of if (_cnt < 1) else
	_sum += f;
	_cnt++;
#ifdef STAT_USE_STDEV
	if (_cnt > 1) {
		// _store = new average minus the new value
		_store = (_sum / _cnt - f);
		_ssqdif = _ssqdif + _cnt * _store * _store / (_cnt - 1);
	} // end if > 1
#endif
}

// returns the number of values added
long Statistic::count()
{
	return _cnt;
}

// returns the average of the data-set added so far (NAN if no values added)
float Statistic::average()
{
	if (_cnt < 1) return NAN; // original code returned 0
	return _sum / _cnt;
}

// returns the sum of the data-set (0 if no values added)
float Statistic::sum()
{
	return _sum;
}

// returns the minimum of the data-set (0 if no values added)
float Statistic::minimum()
{
	return _min;
}

// returns the maximum of the data-set (0 if no values added)
float Statistic::maximum()
{
	return _max;
}

// Population standard deviation = s = sqrt( SUM( (Xi - mu)^2 ) / N )
// http://www.suite101.com/content/how-is-standard-deviation-used-a99084
#ifdef STAT_USE_STDEV  
float Statistic::pop_stdev()
{
	if (_cnt < 1) return NAN; // otherwise DIV0 error
	return sqrt( _ssqdif / _cnt);
}

float Statistic::unbiased_stdev()
{
	if (_cnt < 2) return NAN; // otherwise DIV0 error
	return sqrt( _ssqdif / (_cnt - 1));
}
#endif
// END OF FILE
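The precision issue described in the NOTE at the top of Statistic.cpp can be tried out on a desktop compiler. The sketch below is only an illustration under stated assumptions: naivePopStdev, recursivePopStdev and compareOnOffsetSignal are hypothetical helper names, and the test signal (1000 readings alternating between 1000.1 and 999.9, so the true population stdev is about 0.1) is made up. The recursive helper uses the same update rule as Statistic::add() above; the naive helper uses the textbook "mean of squares minus square of mean" form.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Textbook one-pass form: sqrt(sum(x^2)/n - mean^2), all in float.
float naivePopStdev(const float* x, int n) {
    assert(n >= 1);
    float sum = 0.0f, sumSq = 0.0f;
    for (int i = 0; i < n; i++) {
        sum   += x[i];
        sumSq += x[i] * x[i];
    }
    float mean = sum / n;
    // Difference of two large, nearly equal numbers; rounding can even
    // make it negative, in which case sqrt returns NaN.
    return std::sqrt(sumSq / n - mean * mean);
}

// Recursive form: same update rule as Statistic::add() in the listing above.
float recursivePopStdev(const float* x, int n) {
    assert(n >= 1);
    float sum = 0.0f, ssqdif = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i];
        int cnt = i + 1;
        if (cnt > 1) {
            float d = sum / cnt - x[i];         // new average minus new value
            ssqdif += cnt * d * d / (cnt - 1);
        }
    }
    return std::sqrt(ssqdif / n);
}

// Runs both estimators on the made-up signal and returns the recursive result.
float compareOnOffsetSignal() {
    const int N = 1000;
    static float x[N];
    for (int i = 0; i < N; i++) x[i] = (i % 2 == 0) ? 1000.1f : 999.9f;
    float naive = naivePopStdev(x, N);
    float rec   = recursivePopStdev(x, N);
    std::printf("naive=%f recursive=%f (true value about 0.1)\n", naive, rec);
    return rec;
}
```

With this signal the recursive estimate should land close to 0.1. The naive estimate is not reliable here, because a variance of about 0.01 is smaller than the float rounding step of the two terms near 1000000 that are being subtracted; what it prints depends on the compiler and hardware.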
