I have implemented some approximations for trigonometric functions (sin, cos, arctan), computed in single precision (32-bit floating point) in C. They are accurate to about +/- 2 ulp.
My target device does not support any <cmath> or <math.h> functions. It does not provide an FMA, but a MAC ALU. ALU and LU compute in 32-bit format.
My arctan approximation is actually a modified version of N. Juffa's approximation, which covers the full range. The sine and cosine functions are accurate up to 2 ulp within the range [-pi, pi].
I am now aiming to support a larger input range (as large as possible, ideally [FLT_MIN, FLT_MAX]) for sine and cosine, which leads me to argument reduction.
I'm currently reading different papers, such as "Argument Reduction for Huge Arguments: Good to the Last Bit" by K. C. Ng, as well as the paper about this new argument reduction algorithm, but I wasn't able to derive an implementation from either.
I also want to mention two Stack Overflow questions that deal with related problems: there is an approach with MATLAB and C++ which is based on the first paper I linked. It actually uses MATLAB and cmath functions, and it limits the input to [0, 20000]. The other one was already mentioned in the comments; it is an approach to implementing sin and cos in C, using various C libraries which are not available to me. Since both posts are already several years old, there might be some new findings.
It seems the algorithm mostly used in this case is to store the bits of 2/pi to as many places as needed, in order to compute the modulo reduction accurately while avoiding cancellation. My device does not provide a large DMEM, which means large look-up tables with hundreds of bits are not feasible. This procedure is actually described on page 70 of this reference, which by the way provides a lot of useful information about floating-point math.
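To make the storage requirement concrete, such a table would, as far as I understand, hold the bits of 2/pi as consecutive machine words, along these lines (the first 256 bits; whether this many are actually needed for binary32 is exactly what I am unsure about):

    /* leading 256 bits of 2/pi = 0.63661977..., most significant word first */
    static const unsigned int two_over_pi_bits[8] = {
        0xa2f9836e, 0x4e441529, 0xfc2757d1, 0xf534ddc0,
        0xdb629599, 0x3c439041, 0xfe5163ab, 0xdebbc561
    };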
So my question is: Is there another efficient way to reduce the arguments for sine and cosine that achieves single precision while avoiding large LUTs? The papers mentioned above actually focus on double precision and use up to 1000 digits, which is not suitable for my use case.
I actually haven't found any implementation in C, nor an implementation aimed at single-precision computation. I would be grateful for any sort of hints/links/examples...
The following code is based on a previous answer, in which I demonstrated how to perform fairly accurate argument reduction for trigonometric functions by using the Cody-Waite method of split constants for arguments small in magnitude, and the Payne-Hanek method for arguments large in magnitude. For details of the Payne-Hanek algorithm see there; for details of the Cody-Waite technique see this previous answer of mine.
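To recap the idea behind the large-argument path: the bits of 2/pi are stored in a table, a window of those bits is selected based on the argument's exponent (leading bits that can only contribute exact multiples of full rotations are skipped), and the argument's significand is multiplied by that window in fixed point; the quadrant and the reduced argument then fall out of fixed bit positions of the product. The following is a rough sketch of that scheme, not the ported code discussed below: it uses uint64_t purely for readability, whereas the actual port must emulate these operations with pairs of 32-bit integers.

    #include <stdint.h>
    #include <string.h>

    /* 64 zero bits of padding, followed by the leading 256 bits of 2/pi */
    static const uint32_t two_over_pi_tab[10] = {
        0x00000000, 0x00000000,
        0xa2f9836e, 0x4e441529, 0xfc2757d1, 0xf534ddc0,
        0xdb629599, 0x3c439041, 0xfe5163ab, 0xdebbc561
    };

    /* Reduce positive, finite, normal a (intended for a >= 1): computes
       r in [-pi/4, +pi/4] and quadrant q in [0,3] with a = n*(pi/2) + r,
       n mod 4 = q. Sketch only; sign handling is left to the caller.
    */
    static float payne_hanek_red_f (float a, int *quadrant)
    {
        uint32_t ia, f, wi, bs, w0, w1, w2;
        uint64_t p0, p1, p2, lo, hi, frac;
        int ts;

        memcpy (&ia, &a, sizeof ia);
        /* a = 0.f * 2**e with 0.f in [0.5,1), held as a 0.32 fixed-point value */
        f  = (ia << 8) | 0x80000000u;
        /* leading stream bits that only contribute multiples of 2*pi: e + 30 */
        ts = (int)((ia >> 23) & 0xff) - 96;
        /* extract a 96-bit window of the padded 2/pi bit stream */
        wi = (uint32_t)ts >> 5;
        bs = (uint32_t)ts & 31;
        if (bs) {
            w0 = (two_over_pi_tab[wi+0] << bs) | (two_over_pi_tab[wi+1] >> (32-bs));
            w1 = (two_over_pi_tab[wi+1] << bs) | (two_over_pi_tab[wi+2] >> (32-bs));
            w2 = (two_over_pi_tab[wi+2] << bs) | (two_over_pi_tab[wi+3] >> (32-bs));
        } else {
            w0 = two_over_pi_tab[wi+0];
            w1 = two_over_pi_tab[wi+1];
            w2 = two_over_pi_tab[wi+2];
        }
        /* 128-bit product f * (w0:w1:w2); a*(2/pi) mod 4 has its units bit
           at product bit 94, so the quadrant sits in product bits 95:94 */
        p0 = (uint64_t)f * w2;
        p1 = (uint64_t)f * w1;
        p2 = (uint64_t)f * w0;
        lo = p0 + (p1 << 32);
        hi = p2 + (p1 >> 32) + (lo < p0);
        /* top 64 fraction bits (product bits 93:30); round to nearest n */
        frac = (hi << 34) | (lo >> 30);
        *quadrant = (int)((hi >> 30) + (frac >> 63)) & 3;
        /* r = (signed fraction) * pi/2; 0x1.921fb54442d18p-64 = (pi/2)/2**64 */
        return (float)((double)(int64_t)frac * 0x1.921fb54442d18p-64);
    }

For example, for a = 2.0f this yields quadrant 1 and r = 2 - pi/2 = 0.4292, as expected.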
Here I have made the adjustments necessary to accommodate the restrictions of the asker's platform: no 64-bit types are supported, fused multiply-add is not supported, and helper functions from math.h are not available. I am assuming that float maps to the IEEE-754 binary32 format, and that there is a way to re-interpret such a 32-bit float as a 32-bit unsigned integer and vice versa. I have implemented this re-interpretation via the standard portable idiom, that is, by using memcpy(), but other methods may be chosen as appropriate for the unspecified target platform, such as inline assembly, machine-specific intrinsics, or volatile unions. Since this code is basically a port of my previous code to a more restrictive environment, it perhaps lacks the elegance of a de novo design specifically targeted at that environment.
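For reference, the re-interpretation idiom looks like this; a reasonable compiler turns each memcpy() into a plain register move:

    #include <stdint.h>
    #include <string.h>

    /* re-interpret the bits of a binary32 float as a 32-bit unsigned integer */
    static uint32_t float_as_uint32 (float a)
    {
        uint32_t r;
        memcpy (&r, &a, sizeof r);
        return r;
    }

    /* re-interpret a 32-bit unsigned integer as a binary32 float */
    static float uint32_as_float (uint32_t a)
    {
        float r;
        memcpy (&r, &a, sizeof r);
        return r;
    }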
I basically replaced the frexp() helper function from math.h with some bit twiddling, emulated 64-bit integer computation with pairs of 32-bit integers, replaced the double-precision computation with 32-bit fixed-point computation (which worked much better than I had anticipated), and replaced all FMAs with the unfused equivalent. Re-working the Cody-Waite portion of the argument reduction took quite a bit of work. Clearly, without FMA available, we need to ensure a sufficient number of trailing zero bits in the constituent parts of the constant pi/2 (except the least significant one) to make sure the products are exact. I spent several hours experimentally puzzling out a particular split that delivers accurate results but also pushes the switchover point to the Payne-Hanek method as high as possible.
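To illustrate the trailing-zero requirement (with generic constants, not the particular split developed for the code in this answer), consider the classic Cephes-style three-way split of pi/2, which keeps only 8 and 11 significant bits in the first two terms, so the products n*C0 and n*C1 are exact in binary32 even without FMA for |n| < 2**13:

    #include <stdint.h>

    /* Cephes-style split pi/2 = C0 + C1 + C2; C0 has 8 significand bits,
       C1 has 11, so n*C0 and n*C1 are exact binary32 products for |n| < 2**13;
       note that C0 + C1 = 0x1.921fb4p+0, the leading bits of pi/2 */
    static const float C0 = 1.5703125000e+0f; /* 0x1.920p+0   */
    static const float C1 = 4.8375129700e-4f; /* 0x1.fb4p-12  */
    static const float C2 = 7.5497901000e-8f; /* 0x1.4442d2p-24, rounded tail */

    /* reduce a of moderate magnitude to r in [-pi/4, pi/4]; q = n mod 4 */
    static float cody_waite_red_f (float a, int *quadrant)
    {
        /* n = nearest integer to a * (2/pi) */
        int   n  = (int)(a * 0.636619772f + ((a < 0.0f) ? -0.5f : 0.5f));
        float fn = (float)n;
        *quadrant = n & 3; /* two's complement: also n mod 4 for negative n */
        /* peel off n*(pi/2) term by term; only the last step can round */
        return ((a - fn * C0) - fn * C1) - fn * C2;
    }

The price of such short constituent parts is that the rounding error contributed by the tail term grows with n, which is what limits how far the switchover point to Payne-Hanek can be pushed.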
When USE_FMA = 1 is specified and the code is compiled against a high-quality math library, the test app reports the achieved accuracy; with USE_FMA = 0 the accuracy changes slightly for the worse. The diffsum output is a rough indicator of overall accuracy, here showing that about 90% of all inputs result in a correctly rounded single-precision response.

Note that it is important to compile the code with the strictest floating-point settings and the highest degree of adherence to IEEE-754 the compiler offers. For the Intel compiler, which I used to develop and test this code, that can be achieved by compiling with /fp:strict. Also, the quality of the math library used as reference is crucial for an accurate assessment of the ulp error of this single-precision code. The Intel compiler comes with a math library that provides double-precision elementary math functions with just slightly over 0.5 ulp error in the HA (high accuracy) variant. Use of a multi-precision reference library may be preferable, but it would have slowed me down too much here.
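For readers who want to reproduce such measurements, a simple harness along the following lines works. The ulp computation here is a common approximation (one ulp of the reference at binary32 precision), and my_sinf is a placeholder for whatever implementation is under test:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    /* stand-in for the implementation under test; substitute your own */
    static float my_sinf (float x) { return sinf (x); }

    /* ulp error of a binary32 result vs. a (much more accurate) reference */
    static double ulp_err_f (float res, double ref)
    {
        int e;
        frexp (ref, &e);                  /* ref = m * 2**e, 0.5 <= |m| < 1 */
        return fabs ((double)res - ref) / ldexp (1.0, e - 24);
    }

    int main (void)
    {
        double err, maxerr = 0.0;
        uint32_t i;
        float x;
        /* sample the positive binary32 range; the step keeps runtime sane */
        for (i = 0x00000000u; i < 0x7f800000u; i += 0xffu) {
            memcpy (&x, &i, sizeof x);
            err = ulp_err_f (my_sinf (x), sin ((double)x));
            if (err > maxerr) {
                maxerr = err;
                printf ("x=%15.8e  ulp=%.5f\n", x, err);
            }
        }
        printf ("max. ulp error: %.5f\n", maxerr);
        return 0;
    }

Keep in mind that this relies on the double-precision sin() of the host's math library as the reference, so its own error (ideally just over 0.5 ulp of double) is folded into the measurement, as discussed above.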