Empirical probability with boost histogram

149 Views Asked by At

I have a boost::histogram with 100 bins over the range [-3.5,3.5]. I fill it with a data vector qp. Since I work with periodic BC all the values of q in qp are in [-3.5,3.5].

auto h = boost::histogram::make_histogram(boost::histogram::axis::regular<>(100, -3.5, 3.5));
for (auto&& value : qp)
    h(value.first);

For security I count all points in the bin with

int samplesize = 0;
for (auto&& it : indexed(h)) 
    samplesize += *it;

I prepare the data for the plot

for (auto&& it : indexed(h)) {
    const auto bin = it.bin(0_c);
    double xaxis = bin.lower();
    double density = *it / samplesize;
    const std::pair<double, double> paar = std::make_pair(xaxis,density);
    printToStream(file, paar);
}

The result confuses me. It should be a normalized probability distribution, but it is definitely not (the values on the y-axis are way to low)

enter image description here

Is there a boost method with which I automatically get a (normalized) relative frequency?

1

There are 1 best solutions below

0
On BEST ANSWER

You need to divide by the width of the bin. So you need to modify the density:

double density = *it / bin.width() / samplesize;

Here's the complete code in boost-histogram's Python bindings, since it was quicker to mock up:

import boost_histogram as bh
import numpy as np
import matplotlib.pyplot as plt

h = bh.Histogram(bh.axis.Regular(100, -3.5, 3.5))

data = np.concatenate([
    np.random.normal(-.75,.3, 1_000_000),
    np.random.normal(.75,.3, 750_000),
    np.random.normal(-1.5,.2, 200_000),
])

h.fill(data)

# Compute the areas of each bin (any number of axes)
areas = np.prod(h.axes.widths, axis=0)

# Compute the density (any number of axes)
density = h.view() / h.sum() / areas

plt.plot(h.axes.centers[0], density)

Python version of example

PS: You can use a circular axes if your data is circular