I am using this link as a guide: https://www.geeksforgeeks.org/difference-float-double-c-cpp/#:~:text=double%20is%20a%2064%20bit,15%20decimal%20digits%20of%20precision. It says that double is a 64 bit IEEE 754 double precision Floating Point Number (1 bit for the sign, 11 bits for the exponent, and 52 bits for the value), i.e. double has 15 decimal digits of precision. However, the code below does not maintain 15 decimal digits of precision, only 14.
It is for a simple projectile motion calculator, where the range of a projectile launched at 30 degrees should match that of the same projectile launched at 60 degrees.
#include <iostream>
#include <iomanip>
#include <cmath>
int main()
{
const double g = 9.80665;
const double pi = 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679;
double a1 = 30.0;
double a2 = 60.0;
double v = 25.0;
double vx1 = v * cos(a1 * pi/180.0);
double vy1 = v * sin(a1 * pi/180.0);
double vx2 = v * cos(a2 * pi/180.0);
double vy2 = v * sin(a2 * pi/180.0);
double t_max1 = 2 * vy1 / g;
double t_max2 = 2 * vy2 / g;
double range1 = t_max1 * vx1;
double range2 = t_max2 * vx2;
std::cout << std::setprecision(16) << range1 << ", " << range2 << std::endl;
return 0;
}
Output: 55.19375906810931, 55.19375906810933
It is not possible for any fixed-size numerical format to “maintain” a specific precision, regardless of whether it is floating-point, integer, fixed-point, or something else.
Whenever the result of an operation performed with real-number mathematics is not representable in the numerical format, only an approximation of the real-number result can be returned. The real-number result must be rounded to some representable value. This introduces a rounding error.
When there are multiple operations, the rounding errors may accumulate and compound in various ways. The interactions and consequences may be very complicated, and there is an entire field of study, numerical analysis, for it.
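A minimal sketch of how rounding errors accumulate across repeated operations, assuming IEEE 754 double arithmetic. The helper name repeated_sum is hypothetical and is not part of the question's code.

```cpp
// Hypothetical helper: adds `term` to a running total `count` times.
// Every += rounds its result to the nearest representable double,
// so each pass through the loop can introduce a new rounding error.
double repeated_sum(double term, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += term;
    }
    return total;
}
```

With this sketch, repeated_sum(0.25, 4) compares equal to 1.0, because 0.25 and every intermediate sum are exactly representable in binary. repeated_sum(0.1, 10) does not compare equal to 1.0, because 0.1 is not exactly representable and the additions round along the way.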
As a simple example, consider integer arithmetic, in which the resolution is 1. Yet, if we compute 17/3•5 with 17/3*5, we get 25, whereas the real-number result would be 28⅓, and the integer result nearest the ideal result would be 28. So the computed result is off by three units from the best representable result (and 3⅓ from the real-number result) even though we only did two operations. Integer arithmetic cannot “maintain” 1 unit of precision.
In your sample, rounding errors occur in these operations:
- The literal for g (9.80665) is converted to the double format.
- The literal for pi is converted to the double format.
- a1 and a2 are each multiplied by pi.
- 2 * vy1 and 2 * vy2 are divided by g. (The multiplication by 2 does not introduce any rounding error as its result is exactly representable in binary-based floating-point.)
- Those quotients are multiplied by vx1 and vx2.
Additionally, sin and cos are difficult to implement, and common implementations prefer speed and provide it at the cost of allowing a little additional error. Their results could be off by a few ULP (units of least precision), possibly more in bad implementations.
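One way to see how small the discrepancy is would be to measure it in ULPs rather than in decimal digits. The following is a sketch only, assuming IEEE 754 double; the helper names ulp_above, ulps_apart, and projectile_range are hypothetical, and projectile_range simply repeats the question's operations in the same order.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: gap between x and the next representable double above it.
double ulp_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Hypothetical helper: |a - b| expressed in units of the ULP at the larger magnitude.
double ulps_apart(double a, double b) {
    double bigger = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) / ulp_above(bigger);
}

// Same sequence of operations as the question's code, for one launch angle.
double projectile_range(double v, double angle_deg, double g) {
    const double pi = 3.14159265358979323846;
    double angle_rad = angle_deg * pi / 180.0;  // rounds after * and after /
    double vx = v * std::cos(angle_rad);
    double vy = v * std::sin(angle_rad);
    double t_max = 2 * vy / g;                  // the division may round
    return t_max * vx;                          // the multiplication may round
}
```

For values near 55, ulp_above returns 2^-47, about 7.1e-15, so the two printed ranges in the question differ by roughly three ULPs. Calling ulps_apart(projectile_range(25.0, 30.0, 9.80665), projectile_range(25.0, 60.0, 9.80665)) gives the figure for a particular compiler and math library; the exact count depends on how that library implements sin and cos.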