I am looking for an efficient way to do mathematical operations with floating-point values properly. As I am working in embedded C, I don't want to use any extra library for the float data type.
As far as I understand, the correct way here would be to treat a floating-point value as raw binary (sign, exponent, mantissa) and do the operations on that. But I cannot find any examples of how exactly that works.
I am looking for an explanation of how to do the following without the float data type, given a variable int x that can take values from 0 to 10000:
y = x * 0.720 + 84.234;
y = y / 2.5;
Thank you for your time.
Floating-point libraries are not required for the example operations you have suggested. While avoiding floating-point code on an embedded system without an FPU is often advisable, doing so by implementing your own floating-point encoding will save you nothing; it is likely to be less efficient, less comprehensible and more error-prone than the compiler's built-in floating-point support.
Instead, you need to avoid floating-point code entirely and use fixed-point encoding. In many cases that can be done ad hoc for individual expressions, but if your application is math-intensive (involving, for example, trigonometry, logarithms, square roots or exponentiation), you might choose a fixed-point library or implement your own.
Floating-point dependency is trivially eradicated in the examples you have suggested; for example:
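A minimal sketch of that kind of conversion, assuming results are kept in decimal fixed point as integer thousandths. The scale of 1000 and the function name compute_y_milli are illustrative choices, not taken from the original; int32_t is used because a plain int may be only 16 bits wide on some embedded targets.

```c
#include <assert.h>
#include <stdint.h>

/* Decimal fixed point: every value is held as an integer number of
   thousandths, so 0.720 becomes 720 and 84.234 becomes 84234. */
static int32_t compute_y_milli(int32_t x)
{
    assert(x >= 0 && x <= 10000);   /* input range stated in the question */

    /* y = x * 0.720 + 84.234, in thousandths.
       Worst case (x = 10000): 7200000 + 84234 = 7284234, well within int32_t. */
    int32_t y = x * 720 + 84234;

    /* y = y / 2.5, rewritten as y * 2 / 5 so that only integers are involved.
       Adding 2 (half of the divisor 5) rounds to nearest instead of
       truncating; this is valid here because y is never negative. */
    y = (y * 2 + 2) / 5;

    return y;   /* result still in thousandths */
}
```

For x = 10000 this returns 2913694, i.e. 2913.694 (y / 1000 gives the integer part and y % 1000 the three decimal digits), against an exact value of 2913.6936.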
or, more efficiently, using binary fixed point with a 10-bit fractional part:
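A sketch of the binary variant, assuming 10 fractional bits (scale 1024, sometimes written Q10) so that rescaling after a multiply is a shift rather than a division by 1000. The names K_GAIN, K_OFFSET, K_RECIP, q10_mul and compute_y_q10 are hypothetical; the constants are the original decimal values multiplied by 1024 and rounded. The division by 2.5 becomes a multiplication by 0.4, and that product needs a 64-bit intermediate because the worst case, 7456256 * 410, would overflow int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Binary fixed point with 10 fractional bits: a real value v is stored
   as the integer round(v * 1024). */
#define FRAC_BITS 10

/* Constants pre-scaled by 2^10 and rounded to the nearest integer. */
#define K_GAIN   737    /* 0.720  * 1024 = 737.28             */
#define K_OFFSET 86256  /* 84.234 * 1024 = 86255.616          */
#define K_RECIP  410    /* 1 / 2.5 = 0.4, 0.4 * 1024 = 409.6  */

/* Multiply two Q10 values. The raw product has 20 fractional bits, so it is
   formed in 64 bits, then half an LSB is added and the result shifted right
   by 10 to round back to Q10 (operands are non-negative in this example). */
static int32_t q10_mul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    return (int32_t)((p + (1 << (FRAC_BITS - 1))) >> FRAC_BITS);
}

/* y = (x * 0.720 + 84.234) / 2.5, returned in Q10. */
static int32_t compute_y_q10(int32_t x)
{
    assert(x >= 0 && x <= 10000);

    /* x is a plain integer, so multiplying it by a Q10 constant gives Q10
       directly; no shift is needed for this step. */
    int32_t y = x * K_GAIN + K_OFFSET;

    /* Division by 2.5 becomes multiplication by the Q10 constant for 0.4. */
    return q10_mul(y, K_RECIP);
}
```

The cost of only 10 fractional bits shows in the rounded constants: for x = 10000 this returns 2985415, about 2915.44 after dividing by 1024, compared with the exact 2913.69. More fractional bits reduce that error.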
You might also consider int64_t for greater numeric range, in which case you could use more fractional bits for greater precision too.

If you are doing a lot of intensive fixed-point maths, you would do well to consider a library, or to implement one using CORDIC algorithms. An example of such a library can be found at https://www.justsoftwaresolutions.co.uk/news/optimizing-applications-with-fixed-point-arithmetic.html, although it is C++. The clear advantage is that by defining a fixed class with extensive operator overloading, existing floating-point code can largely be converted to fixed point by replacing the double or float keywords with fixed and compiling as C++, even if the code is otherwise non-OOP and entirely C-like.
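As an illustration of the CORDIC approach mentioned above, here is a minimal sketch (not taken from the linked library) that computes sine and cosine in Q16.16 fixed point using only shifts, additions and a 16-entry table. The Q16.16 format, the names and the restriction to angles in [-pi/2, pi/2] are assumptions for this example; a complete implementation would add range reduction for other angles.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16 fixed point: a real value v is stored as round(v * 65536). */
#define Q16_HALF_PI 102944      /* pi/2 * 65536 = 102943.7 */
#define CORDIC_ITERATIONS 16

/* atan(2^-i) in Q16 radians for i = 0..15, rounded to nearest. */
static const int32_t cordic_atan_q16[CORDIC_ITERATIONS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

/* Product of cos(atan(2^-i)), approximately 0.607253, in Q16. Starting x at
   this value cancels the gain that the CORDIC rotations introduce. */
#define CORDIC_GAIN_INV_Q16 39797

/* Rotation-mode CORDIC: rotates the vector (gain_inv, 0) by 'angle' (Q16
   radians, within [-pi/2, pi/2]), leaving cos(angle) in x and sin(angle)
   in y. Assumes >> on a negative int32_t is an arithmetic shift, which is
   implementation-defined in C but true of mainstream compilers. */
static void cordic_sincos_q16(int32_t angle, int32_t *sin_out, int32_t *cos_out)
{
    assert(angle >= -Q16_HALF_PI && angle <= Q16_HALF_PI);

    int32_t x = CORDIC_GAIN_INV_Q16;
    int32_t y = 0;
    int32_t z = angle;          /* angle still left to rotate through */

    for (int i = 0; i < CORDIC_ITERATIONS; i++) {
        int32_t dx = y >> i;
        int32_t dy = x >> i;
        if (z >= 0) {           /* rotate counter-clockwise by atan(2^-i) */
            x -= dx;
            y += dy;
            z -= cordic_atan_q16[i];
        } else {                /* rotate clockwise by atan(2^-i) */
            x += dx;
            y -= dy;
            z += cordic_atan_q16[i];
        }
    }
    *cos_out = x;
    *sin_out = y;
}
```

For example, an angle of pi/6 is 34315 in Q16 (0.5236 * 65536), and the outputs should come out close to 32768 (0.5) for the sine and 56756 (0.866) for the cosine, with small errors from the truncating shifts and the rounded table.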