A statistic library for Arduino.

Last Modified: March 07, 2015, at 03:00 PM
By: robtillaart
Platforms: All

remarks & comments

Latest version on - Github

Intro

One of the main applications for the Arduino board is reading and logging sensor data. For instance, one might monitor the temperature and air pressure every minute of the day. As that implies a lot of records, we often want the average and standard deviation to get a picture of how the temperature varied over that day.

Background reading - tutorial statistic formulas

Statistic library

The Statistic library calculates the average and standard deviation (stdev) of a set of data (doubles since version 0.3.3, floats before that). It also keeps track of the minimum and maximum values entered. The interface consists of a constructor and ten functions (version 0.3.3 on Github):

	Statistic();		// constructor
	void clear();		// reset all counters
	void add(double);	// add a new value
	unsigned long count();	// # values added
	double sum();		// total
	double minimum();	// minimum
	double maximum();	// maximum
	double average();	// average
	double variance();	// population variance
	double pop_stdev();	// population std deviation
	double unbiased_stdev();	// unbiased std deviation

Internally the library does not record the individual values, only the count, the sum, the sum of squared differences from the running average, the minimum and the maximum. These five are enough to calculate the average and stdev. The nice part is that memory use stays the same whether one adds 10, 100 or 1000 values.
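
As a rough illustration of why a handful of running totals is enough, the sketch below keeps only a count, a sum and a sum of squared differences, and updates them one value at a time. It is a simplified stand-in written for this page, not the library source: the names RunningStats, mean, popStdev and demoRunningStats are hypothetical, and the data set is made up. It uses plain C++ so it can also be compiled off-board.

```cpp
#include <math.h>

// Hypothetical stand-in for illustration; not the Statistic class itself.
struct RunningStats {
    unsigned long n;   // number of values seen
    double sum;        // running total
    double ssqdif;     // running sum of squared differences from the mean

    RunningStats() : n(0), sum(0.0), ssqdif(0.0) {}

    void add(double x) {
        sum += x;
        n++;
        if (n > 1) {
            // difference between the updated mean and the new value
            double d = sum / n - x;
            // recursive update; no earlier values are needed
            ssqdif += n * d * d / (n - 1);
        }
    }

    double mean() const { return n ? sum / n : NAN; }
    double popStdev() const { return n ? sqrt(ssqdif / n) : NAN; }
};

// Feeds a fixed, made-up data set through the accumulator.
RunningStats demoRunningStats() {
    RunningStats s;
    const double data[] = {2, 4, 4, 4, 5, 5, 7, 9};
    for (int i = 0; i < 8; i++) s.add(data[i]);
    return s;   // mean is 5, population stdev is 2 (up to rounding)
}
```

The struct occupies the same few bytes after eight values as it would after eight thousand, which is the point of dropping the sample array.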

Usage

A small sketch shows how it can be used. A random generator is used to mimic a sensor.

#include "Statistic.h"  // without trailing s

Statistic myStats; 

void setup(void) 
{
  Serial.begin(9600);
  Serial.print("Demo Statistic lib ");
  Serial.println(STATISTIC_LIB_VERSION);
  myStats.clear(); //explicitly start clean
}

void loop(void) 
{
  long rn = random(0, 100);
  myStats.add(rn/100.0);

  Serial.print("  Count: ");
  Serial.print(myStats.count()); 

  Serial.print("  Average: ");
  Serial.print(myStats.average(), 4);

  Serial.print("  Std deviation: ");
  Serial.print(myStats.pop_stdev(), 4);
  Serial.println();

  if (myStats.count() == 300)
  {
    myStats.clear();
    delay(1000);
  }
}

In setup(), myStats is cleared so we can start adding new data.

In loop(), a random number is first generated and scaled to a value between 0.00 and 0.99 before being added to myStats. Then the count, the average and the std deviation so far are printed to the serial port. One could also display them on an LCD, send them over Ethernet, etc. When 300 items have been added, myStats is cleared to start over again.

Notes

In the first version I collected all the samples in an array, but that used quite some memory and the user had to know the number of samples beforehand to allocate enough room. As I found this not quite acceptable, I stripped the data array from the class to make it more elementary.

To use the library, make a folder named Statistic in your SKETCHBOOKPATH\libraries and put the .h and .cpp files there.

Todo

  • Looking at a more extended statistical lib.

Enjoy tinkering,
rob.tillaart@removethisgmail.com

Update

  • 2010-11-01 Added stdev, minimum and maximum
  • 2011-01-07 Gil Ross sent me an improved version of the library that is numerically more stable. This is version 0.3. Thanx Gil.
  • 2012-05-19 Added NAN as error instead of -1 which was incorrect.
  • 2015-03-07 - version 0.3.3 - changed float to double to support ARM proc
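
Since the 2012-05-19 change, an empty data set yields NAN rather than a number, so calling code may want to check a result before printing or comparing it. A minimal sketch, assuming a hypothetical helper named valueOr (not part of the library) and the standard isnan() test from math.h:

```cpp
#include <math.h>

// Hypothetical helper: returns `fallback` when a statistic is NAN,
// for example when average() is called before any value was added.
double valueOr(double stat, double fallback) {
    // NAN compares unequal to everything, itself included, so use isnan()
    return isnan(stat) ? fallback : stat;
}
```

With the library this might be used as valueOr(myStats.average(), 0.0). A direct comparison such as myStats.average() == NAN is always false and so cannot serve as the check.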

Statistic.h file

#ifndef Statistic_h
#define Statistic_h
//
//    FILE: Statistic.h
//  AUTHOR: Rob dot Tillaart at gmail dot com
//          modified at 0.3 by Gil Ross at physics dot org
// VERSION: 0.3.3
// PURPOSE: Recursive Statistical library for Arduino
// HISTORY: See Statistic.cpp
//
// Released to the public domain
//

// the standard deviation code increases the size of the lib (<100 bytes)
// it can be in/excluded by un/commenting the next line
#define STAT_USE_STDEV

#include <math.h>

#define STATISTIC_LIB_VERSION "0.3.3"

class Statistic
{
public:
    Statistic();
    void clear();
    void add(double);

    // returns the number of values added
    unsigned long count()   { return _cnt; }; // zero if empty
    double sum()            { return _sum; }; // zero if empty
    double minimum()        { return _min; }; // zero if empty
    double maximum()        { return _max; }; // zero if empty
    double average();

#ifdef STAT_USE_STDEV
    double variance();
    double pop_stdev();	    // population stdev
    double unbiased_stdev();
#endif

protected:
    unsigned long _cnt;
    double _store;           // store to minimise computation
    double _sum;
    double _min;
    double _max;
#ifdef STAT_USE_STDEV
    double _ssqdif;		    // sum of squares difference
#endif
};

#endif
// END OF FILE

Statistic.cpp

//
//    FILE: Statistic.cpp
//  AUTHOR: Rob dot Tillaart at gmail dot com
//          modified at 0.3 by Gil Ross at physics dot org
// VERSION: 0.3.3
// PURPOSE: Recursive statistical library for Arduino
//
// NOTE: 2011-01-07 Gil Ross
// Rob Tillaart's Statistic library uses one-pass of the data (allowing
// each value to be discarded), but expands the Sum of Squares Differences to
// difference the Sum of Squares and the Average Squared. This is susceptible
// to bit length precision errors with the float type (only 5 or 6 digits
// absolute precision) so for long runs and high ratios of
// the average value to standard deviation the estimate of the
// standard error (deviation) becomes the difference of two large
// numbers and will tend to zero.
//
// For small numbers of iterations and small Average/SE the original code is
// likely to work fine.
// It should also be recognised that for very large samples, questions
// of stability of the sample assume greater importance than the
// correctness of the asymptotic estimators.
//
// This recursive algorithm, which takes slightly more computation per
// iteration is numerically stable.
// It updates the number, mean, max, min and SumOfSquaresDiff each step to
// deliver max min average, population standard error (standard deviation) and
// unbiased SE.
// -------------
//
// HISTORY:
// 0.1 - 2010-10-29 initial version
// 0.2 - 2010-10-29 stripped to minimal functionality
// 0.2.01 - 2010-10-30
//   added minimum, maximum, unbiased stdev,
//   changed counter to long -> int overflows @32K samples
// 0.3 - 2011-01-07
//   branched from 0.2.01 version of Rob Tillaart's code
// 0.3.1 - minor edits
// 0.3.2 - 2012-11-10
//   minor edits
//   changed count -> unsigned long allows for 2^32 samples
//   added variance()
// 0.3.3 - 2015-03-07
//   float -> double to support ARM (compiles)
//   moved count() sum() min() max() to .h; for optimizing compiler
//
// Released to the public domain
//

#include "Statistic.h"

Statistic::Statistic()
{
    clear();
}

// resets all counters
void Statistic::clear()
{
    _cnt = 0;
    _sum = 0.0;
    _min = 0.0;
    _max = 0.0;
#ifdef STAT_USE_STDEV
    _ssqdif = 0.0;  // not _ssq but sum of square differences
    // which is SUM(from i = 1 to N) of
    // (f(i)-_ave_N)**2
#endif
}

// adds a new value to the data-set
void Statistic::add(double value)
{
    if (_cnt == 0)
    {
        _min = value;
        _max = value;
    } else {
        if (value < _min) _min = value;
        else if (value > _max) _max = value;
    }
    _sum += value;
    _cnt++;

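#ifdef STAT_USE_STDEV
    // Recursive update of the sum of squared differences.
    // _store is the new average minus the new value; scaling its square
    // by _cnt / (_cnt - 1) gives the increment, so no old values are needed.
    if (_cnt > 1)
    {
        _store = (_sum / _cnt - value);
        _ssqdif = _ssqdif + _cnt * _store * _store / (_cnt-1);
    }
#endif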
}

// returns the average of the data-set added so far
double Statistic::average()
{
    if (_cnt == 0) return NAN; // original code returned 0
    return _sum / _cnt;
}

// Population standard deviation = s = sqrt( SUM( (Xi - mean)^2 ) / N )
// http://www.suite101.com/content/how-is-standard-deviation-used-a99084
#ifdef STAT_USE_STDEV

double Statistic::variance()
{
    if (_cnt == 0) return NAN; // otherwise DIV0 error
    return _ssqdif / _cnt;
}

double Statistic::pop_stdev()
{
    if (_cnt == 0) return NAN; // otherwise DIV0 error
    return sqrt( _ssqdif / _cnt);
}

double Statistic::unbiased_stdev()
{
    if (_cnt < 2) return NAN; // otherwise DIV0 error
    return sqrt( _ssqdif / (_cnt - 1));
}

#endif
// END OF FILE
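
The note by Gil Ross in the Statistic.cpp header describes how the difference of two large numbers can wipe out the standard deviation. The sketch below tries to make that concrete in single precision, which is also what double provides on 8-bit AVR boards. Both functions are illustrative and hypothetical (naiveVariance and recursiveVariance are not library functions); the first uses the textbook sum-of-squares formula, the second follows the update used in Statistic::add().

```cpp
#include <math.h>

// Textbook one-pass formula: variance = mean(x^2) - mean(x)^2, in float.
// Illustrative only; not code taken from any version of the library.
float naiveVariance(const float *x, int n) {
    float sum = 0.0f, sumsq = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i];
        sumsq += x[i] * x[i];   // squares of large values lose low digits
    }
    float mean = sum / n;
    return sumsq / n - mean * mean;   // difference of two large numbers
}

// Recursive update in the style of Statistic::add(), also in float.
float recursiveVariance(const float *x, int n) {
    float sum = 0.0f, ssqdif = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i];
        int cnt = i + 1;
        if (cnt > 1) {
            float d = sum / cnt - x[i];
            ssqdif += cnt * d * d / (cnt - 1);
        }
    }
    return ssqdif / n;   // population variance
}
```

For the three values 10000, 10001 and 10002 the population variance is 2/3. With IEEE 754 single precision the recursive version should reproduce that closely, while the naive version is expected to be off by more than the variance itself, because the squares (around 10^8) can no longer hold the last digit.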
