I am trying to speed up the execution of my algorithm on an FPGA. I'm looking for fixed-point math libraries with a 32:32 (64-bit) format, written in C, that would be easy to translate to OpenCL. Does anyone know a good library? I am trying to avoid 128-bit data types since they are floating point in OpenCL, and I guess it won't speed up my algorithm if I have to use floating point again. Any suggestion is appreciated. If there is a guide to creating my own library, I'm OK with that as long as it explains things simply enough haha.
Thanks
I have found GPUs to be great only for floats. Here are some CUDA C++11 / C++14 tips:
-use a normalized float range [-1.0, +1.0] for the greatest accuracy and store the normalizing value separately (accumulated as a double),
-if the data has a wide range anyway (dividing by big numbers makes the normalization lossy), normalize by subtracting the median (stored separately as uint64_t), so that big numbers are stored with lower accuracy. You can use a trimmed mean, e.g. 5%, instead of the median,
-sort and normalize periodically,
-in 2017, use a new GTX 1080 Ti (for GFLOPS/USD and GFLOPS/W) or a used GTX 770,
-high-end FPGAs are great when used as preprocessing units after ADCs or within embedded systems with strict low-power demands (typically network switches, media processing such as video, real-time FFT devices, et cetera). Moreover, even the best models of these ultra-low-power computational devices rarely exceed a few hundred GFLOPS for $1500. That is comparable to a brand-new, off-the-shelf GT 730 4GB GDDR5 by Palit for $35, for which most problems have already been solved on the NVIDIA forum,
-get the few-dozen-dollar book "CUDA by Example" by J. Sanders et al., the free course "Udacity Intro to Parallel Programming" on YouTube, and the great book "Professional CUDA C Programming" by J. Cheng et al. to become an intermediate CUDA C++11 programmer in three full-time months,
-do your own research on fixed-point arithmetic intended for older sequential CPUs, and you will conclude that only limited libraries exist for cos, square root and other basic functions. More complicated functions are problematic, and there is no big community to help with fixing errors. In the end you will find that there is no speed-up compared with FPUs, or one smaller than an order of magnitude, for such a big effort (writing everything from scratch),
-buy a GPU with at least the Kepler microarchitecture (i.e. from the popular GTX 670 onwards) for $50 from some not very well-informed teenager,
-install Ubuntu, and get GNU Octave and the please-cite GNU Parallel for the majority of non-GPU problem solving,
-use an FPGA to develop a high-end ASIC for mass production.
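The first tip (a normalized range plus a separately stored normalizing value) could be sketched roughly as below. This is plain C host code rather than a CUDA kernel, the name `normalize_to_unit` is made up for illustration, and taking the largest absolute value as the normalizing value is an assumption, since the tip does not say how it is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scales n input values into [-1.0, +1.0] as floats
 * and returns the scale as a double, so that the originals can be
 * recovered (approximately) as out[i] * scale.
 * Assumption: the scale is the largest absolute input value. */
double normalize_to_unit(const double *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);

    /* First pass: find the largest magnitude in the input. */
    double scale = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double mag = in[i] < 0.0 ? -in[i] : in[i];
        if (mag > scale) {
            scale = mag;
        }
    }

    /* All-zero (or empty) input: avoid dividing by zero. */
    if (scale == 0.0) {
        scale = 1.0;
    }

    /* Second pass: store the normalized values as floats. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = (float)(in[i] / scale);
    }
    return scale;
}
```

Keeping the scale as a double means the float array only has to carry the relative values, which is where single precision is most accurate.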
Post Scriptum: the user #WhatsACreel on YouTube could write some fixed-point functions for you; write him an email with an honest offer. On his channel he explains the basics of fixed-point arithmetic.
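If you do end up writing it from scratch, the core of a 32:32 format is small. Below is a minimal sketch in C of what that might look like, not a tested library: all names (`q32_32`, `q_mul`, and so on) are made up for illustration. The multiply avoids any 128-bit type by splitting each operand into 32-bit halves. It truncates toward zero and wraps on overflow instead of saturating, both of which are simplifying assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q32.32 type: 32 integer bits, 32 fraction bits. */
typedef int64_t q32_32;

#define Q32_32_ONE ((q32_32)1 << 32)

q32_32 q_from_int(int32_t i) { return (q32_32)i * Q32_32_ONE; }

q32_32 q_from_double(double d)
{
    /* The value must fit in the 32 integer bits. */
    assert(d >= -2147483648.0 && d < 2147483648.0);
    return (q32_32)(d * 4294967296.0);   /* multiply by 2^32 */
}

double q_to_double(q32_32 q) { return (double)q / 4294967296.0; }

/* Addition is a plain integer add; overflow is not handled here. */
q32_32 q_add(q32_32 a, q32_32 b) { return a + b; }

/* Multiply without a 128-bit intermediate: work on magnitudes, split
 * each one into 32-bit halves, and keep only the bits that survive
 * the >> 32 rescale. */
q32_32 q_mul(q32_32 a, q32_32 b)
{
    int negative = (a < 0) != (b < 0);
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;

    uint64_t a_hi = ua >> 32, a_lo = ua & 0xFFFFFFFFu;
    uint64_t b_hi = ub >> 32, b_lo = ub & 0xFFFFFFFFu;

    /* (a_hi*2^32 + a_lo) * (b_hi*2^32 + b_lo) / 2^32, modulo 2^64 */
    uint64_t r = ((a_hi * b_hi) << 32)
               + a_hi * b_lo
               + a_lo * b_hi
               + ((a_lo * b_lo) >> 32);

    /* Converting back to signed assumes two's complement wrap-around. */
    return negative ? (q32_32)(0 - r) : (q32_32)r;
}
```

For example, `q_mul(q_from_double(1.5), q_from_double(2.25))` gives the same bit pattern as `q_from_double(3.375)`. Since OpenCL C has 64-bit `long` and `ulong`, a sketch like this should port with little more than a change of type names, but division, square root and trigonometric functions are where the real effort mentioned above comes in.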