I am writing a few functionalities for distributions and used the normal-distribution to run the tests between my implementation and C++ Boost.
Given the Probability Density Function (pdf: http://www.mathworks.com/help/stats/normpdf.html)
Which I wrote like that:
double NormalDistribution1D::prob(double x) {
return (1 / (sigma * (std::sqrt(boost::math::constants::pi<double>()*2))))*std::exp((-1 / 2)*(((x - mu) / sigma)*((x - mu) / sigma)));
}
To compare my results the way how it is done with C++ Boost:
boost::math::normal_distribution <> d(mu, sigma);
return boost::math::pdf(d, x);
I'm not very positively surprised - my version took 44278 nano Seconds, boost only 326.
So I played a bit around and wrote the method probboost in my NormalDistribution1D-Class and compared all three:
void MATTest::runNormalDistribution1DTest1() {
double mu = 0;
double sigma = 1;
double x = 0;
std::chrono::high_resolution_clock::time_point tn_start = std::chrono::high_resolution_clock::now();
NormalDistribution1D *n = new NormalDistribution1D(mu, sigma);
double nres = n->prob(x);
std::chrono::high_resolution_clock::time_point tn_end = std::chrono::high_resolution_clock::now();
std::chrono::high_resolution_clock::time_point tdn_start = std::chrono::high_resolution_clock::now();
NormalDistribution1D *n1 = new NormalDistribution1D(mu, sigma);
double nres1 = n1->probboost(x);
std::chrono::high_resolution_clock::time_point tdn_end = std::chrono::high_resolution_clock::now();
std::chrono::high_resolution_clock::time_point td_start = std::chrono::high_resolution_clock::now();
boost::math::normal_distribution <> d(mu, sigma);
double dres = boost::math::pdf(d, x);
std::chrono::high_resolution_clock::time_point td_end = std::chrono::high_resolution_clock::now();
std::cout << "Mu : " << mu << "; Sigma: " << sigma << "; x" << x << std::endl;
if (nres == dres) {
std::cout << "Result" << nres << std::endl;
} else {
std::cout << "\033[1;31mRes incorrect: " << nres << "; Correct: " << dres << "\033[0m" << std::endl;
}
auto duration_n = std::chrono::duration_cast<std::chrono::nanoseconds>(tn_end - tn_start).count();
auto duration_d = std::chrono::duration_cast<std::chrono::nanoseconds>(td_end - td_start).count();
auto duration_dn = std::chrono::duration_cast<std::chrono::nanoseconds>(tdn_end - tdn_start).count();
std::cout << "own boost: " << duration_dn << std::endl;
if (duration_n < duration_d) {
std::cout << "Boost: " << (duration_d) << "; own implementation: " << duration_n << std::endl;
} else {
std::cout << "\033[1;31mBoost faster: " << (duration_d) << "; than own implementation: " << duration_n << "\033[0m" << std::endl;
}
}
The results are (Was compiling and running the checking-Method 3 times)
own boost: 1082 Boost faster: 326; than own implementation: 44278
own boost: 774 Boost faster: 216; than own implementation: 34291
own boost: 769 Boost faster: 230; than own implementation: 33456
Now this puzzles me: How is it possible the method from the class takes 3 times longer than the statements which were called directly?
My compiling options:
g++ -O2 -c -g -std=c++11 -MMD -MP -MF "build/Debug/GNU-Linux-x86/main.o.d" -o build/Debug/GNU-Linux-x86/main.o main.cpp
g++ -O2 -o ***Classes***
Firs of all, you are allocating your object dynamically, with
new
:If you did just like you have done with boost, that alone would be sufficient to have the same (or comparable) speed:
Now, I don't know if the way you spelled out your expression in
NormalDistribution1D::prob()
is significant, but I am skeptical that it would make any difference to write it in a more "optimized" way, because arithmetic expressions like that is the kind of thing compilers can optimize very well. Maybe it will get faster if you user the--ffast-math
switch, that will give more optimization freedom to the compiler.Also, if the definition of
double NormalDistribution1D::prob(double x)
is in another compilation unit (another .cpp file), the compiler will be unable to inline it, and that will also make a noticeable overhead (maybe twice slower or less). In boost, almost everithing is implemented inside headers, so inlines will always happen when the compiler seems fit. You can overcome this problem if you compile and link with gcc's-flto
switch.