A histogram library for Arduino.

Last Modified: June 08, 2014, at 02:14 PM
By: robtillaart
Platforms: All

remarks & comments

Intro

One of the main applications for the Arduino board is reading and logging of sensor data. We often want to make a histogram of this data to get insight of the distribution of the measurements. This is where this library comes in.

If you need more quantitative analysis, you might need the statistic library

Histogram library

The Histogram distributes the values added into buckets and keeps count. The interface consists of:

Histogram(uint8_t len, double *bounds);  // constructor
~Histogram();  			// destructor
void clear();			// reset all counters
void add(double val);		// add value, increase count
void sub(double val);		// 'add' value, decrease count
uint8_t size();			// number of buckets
unsigned long count();		// number of values added
long bucket(uint8_t idx);	// count of bucket
double frequency(uint8_t idx);	// the relative frequency of a bucket
uint8_t find(double f);		// find the bucket for value f

// experimental
double PMF(double val);		// Probability Mass Function
double CDF(double val);		// Cumulative Distribution Function
double VAL(double prob);	// 

When the class is initialized an array of the boundaries to define the borders of the buckets is passed to the constructor. This array should be declared global as the histogram class does not copy the values to keep memory usage low. This allows to change the boundaries runtime, so after a clear(), a new histogram can be created.

Internally the library does not record the individual values, only the count per bucket. If a new value is added - add() - the class checks in which bucket it belongs and the buckets counter is increased.

The sub() function is used to decrease the count of a bucket and it can cause the count to become below zero. ALthough seldom used but still depending on the application it can be useful. E.g. when you want to compare two value generating streams, you let one stream add() and the other sub(). If the histogram is similar they should cancel each other out (more or less), and the count of all the buckets should be around 0. [not tried]. Note: 2 bucket array's one for add() one for sub() ???

Frequency() may be removed to reduce footprint as it can be calculated quite easily with the formula (1.0* bucket(i))/count().

There are three experimental functions: PMF, CDF and VAL. PMF is quite similar to frequency, but uses a value as parameter. CDF gives the sum of frequencies <= value. VAL is CDF inverted. As the Arduino typical uses a small number of buckets these functions are quite coarse/inaccurate (linear interpolation within bucket is to be investigated)

Usage

A small sketch shows how the histogram class can be used.

//
//    FILE: hist_test.pde
//  AUTHOR: Rob Tillaart
//    DATE: 2012-12-23
//
// PUPROSE: test histogram frequency
//

#include "histogram.h"

double b[] = { 0, 100, 200, 300, 325, 350, 375, 400, 500, 600, 700, 800, 900, 1000 };

Histogram hist(14, b);

unsigned long lastTime = 0;
const unsigned long threshold = 500;  // milliseconds, for updating display

void setup()
{
  Serial.begin(115200);
  Serial.print("\nHistogram version: ");
  Serial.println(HISTOGRAM_LIB_VERSION);
}

void loop()
{
  int x = analogRead(A0);
  hist.add(x);

  // update output 
  unsigned long now = millis();
  if (now - lastTime > threshold)
  {
    lastTime = now;
    Serial.print(hist.count());
    for (int i = 0; i < hist.size(); i++)
    {
      Serial.print("\t");
      // gives percentage per bucket
      // Serial.print(hist.bucket(i));   
      Serial.print(hist.frequency(i),2);  
    }
	// find quartiles
    Serial.print("\t");
    Serial.print(hist.VAL(0.25), 2);
    Serial.print("\t");
    Serial.print(hist.VAL(0.50), 2);
    Serial.print("\t");
    Serial.print(hist.VAL(0.75), 2);
    Serial.println();

    if (hist.count() > 250000UL) hist.clear();
  }
}

Notes

To use the library, make a folder in your SKETCHBOOKPATH\libaries with the name Histogram and put the .h and .cpp there.

Todo

  • Copy the boundaries array?
  • Always Refactor
  • Additional functions.
    • Average value per bucket?
    • separate bucket-array for sub()
    • strategy() for how to find the right bucket..
    • investigate linear interpolation for PMF, CDF and VAL functions to improve accuracy.

Enjoy tinkering,

rob.tillaart@removethisgmail.com

History

  • 0.1.0 - 2012-11-10 initial version
  • 0.1.1 - 2012-11-10 added PMF(), CDF(), VAL()
  • 0.1.2 - 2012-12-23 float -> double; small fixes

Histogram.h file

#ifndef Histogram_h
#define Histogram_h
// 
//    FILE: Histogram.h
//  AUTHOR: Rob dot Tillaart at gmail dot com  
// PURPOSE: Histogram library for Arduino
// HISTORY: See Histogram.cpp
//
// Released to the public domain
//

#include <stdlib.h>
#include <math.h>
#include <inttypes.h>

#define HISTOGRAM_LIB_VERSION "0.1.2"

class Histogram 
{
	public:
	Histogram(uint8_t len, double *bounds);
	~Histogram();
	void clear();
	void add(double val);
	void sub(double val);
	uint8_t size();
	unsigned long count();
	long bucket(uint8_t idx);
	double frequency(uint8_t idx);
	double PMF(double val);
	double CDF(double val);
	double VAL(double prob);
	uint8_t find(double f);

protected:
	double * _bounds;
	long * _data;
	uint8_t _len;
	unsigned long _cnt;
};

#endif
// END OF FILE

Histogram.cpp

//
//    FILE: histogram.cpp
//  AUTHOR: Rob dot Tillaart at gmail dot com  
// VERSION: see HISTOGRAM_LIB_VERSION in .h
// PURPOSE: Histogram library for Arduino
//
// HISTORY:
// 0.1.0 - 2012-11-10 initial version
// 0.1.1 - 2012-11-10 added PMF() and CDF()
// 0.1.2 - 2012-12-23 changed float to double; some comments
//
// Released to the public domain
//

#include "histogram.h"

Histogram::Histogram(uint8_t len, double *bounds)
{
	_bounds = bounds;
	_len = len;
	_data = (long*) malloc((len+1) * sizeof(long)); 
	clear();
}

Histogram::~Histogram()
{
	free(_data);  // free may still has a bug :(
}

// resets all counters
void Histogram::clear()
{ 
	for (uint8_t i = 0; i < _len+1; i++)
	{
		_data[i] = 0;
	}
	_cnt = 0;
}

// adds a new value to the histogram - increasing
void Histogram::add(double f)
{
	_data[find(f)]++;
	_cnt++;
}

// adds a new value to the histogram - decreasing
void Histogram::sub(double f)
{
	_data[find(f)]--;
	_cnt++;;
}

// returns the number of buckets
uint8_t Histogram::size()
{
	return _len+1;
}

// returns the number of values added
unsigned long Histogram::count()
{
	return _cnt;
}

// returns the count of a bucket
long Histogram::bucket(uint8_t idx)
{
	if (idx > _len+1) return 0;
	return _data[idx];
}

// returns the relative frequency of a bucket
double Histogram::frequency(uint8_t idx)
{
	if (_cnt == 0) return NAN;
	if (idx > _len+1) return 0;
	return (1.0 * _data[idx]) / _cnt;
}

// EXPERIMENTAL
// returns the probability of the bucket of a value
double Histogram::PMF(double val)
{
	if (_cnt == 0) return NAN;
	uint8_t idx = find(val);
	return (1.0 *_data[idx]) / _cnt;
}

// EXPERIMENTAL
// returns the cummulative probability of 
// values <= value
double Histogram::CDF(double val)
{
	if (_cnt == 0) return NAN;
	uint8_t idx = find(val);
	long sum = 0;
	for (uint8_t i=0; i<= idx; i++) sum += _data[i];
	return (1.0 * sum) / _cnt;
}

// EXPERIMENTAL
// returns the value of the original array for 
// which the CDF is at least prob.
double Histogram::VAL(double prob)
{
	if (_cnt == 0) return NAN;
	if (prob < 0.0) prob = 0.0;
	if (prob > 1.0) prob = 1.0;

	double value = prob * _cnt;
	long sum = 0;
	for (uint8_t i = 0; i <= _len; i++)
	{
		sum += _data[i];
		if (sum >= value) return _bounds[i];
	}
	return INFINITY;
}

// returns the bucket number for value f
uint8_t Histogram::find(double f)
{
	uint8_t i = 0;
	while(i < _len && f > _bounds[i]) i++;
	return i;
}
// END OF FILE

Share