Fixed Point Library OpenCL FPGA

Question

738 Views Asked by Lorenzo Cucurachi At 17 August 2025 at 04:02

I am trying to speed up my algorithm on an FPGA. I'm looking for a fixed-point math library in C with a 32.32 (64-bit) format that could easily be translated to OpenCL. Does anyone know a good library? I am trying to avoid 128-bit data types, since they are floating point in OpenCL, and I guess it won't speed up my algorithm if I have to use floating point again. Any suggestion is appreciated. If there is a guide to writing my own library, I'm OK with that, as long as it explains things simply enough, haha.

Thanks

There are 2 best solutions below

**Piotr Lenarczyk** · Answer 1

I have found GPUs great only for floats. Here are some CUDA C++11 / C++14 tips:

-use a normalized float range [-1.0,+1.0] for greatest accuracy and store the normalizing value separately (accumulated double),

-if the data has a wide range anyway (dividing by big numbers gives a lossy normalization), normalize by subtracting the median (stored separately as uint64_t); big numbers will then be stored with lower accuracy. One can use a trimmed mean, e.g. 5%, instead of the median,

-sort and normalize periodically,

-in 2017, use a new GTX 1080 Ti (GFLOPS/USD; GFLOPS/W) or a used GTX 770,

-high-end FPGAs are great when used as preprocessing units after ADCs or within embedded systems with strict low-power demands (typically network switches, media processing such as video, real-time FFT devices, et cetera). Moreover, even the best models of these ultra-low-power computational devices rarely exceed a few hundred GFLOPS for $1500. That is on par with a brand-new, off-the-shelf, majority-of-problems-solved-on-the-NVIDIA-forum GT 730 4GB GDDR5 by Palit for $35,

-get the book "CUDA by Example" by J. Sanders et al. (a few dozen dollars), the free YouTube course "Udacity Intro to Parallel Programming", and the great book "Professional CUDA C Programming" by J. Cheng et al. to become an intermediate CUDA C++11 programmer in three full-time months,

-do your own research on fixed-point arithmetic intended for older sequential CPUs, and you will conclude that there are only limited libraries for cos, square root and other basics. More complicated functions are problematic, and there is no big community support for fixing errors. In the end you will find that there are no speed-ups compared with using the FPU, or that they are smaller than an order of magnitude for such a big effort (writing everything from scratch),

-buy a GPU (Kepler microarchitecture at minimum, i.e. from the popular GTX 670 onward) for $50 from some not-so-well-educated teenager,

-install Ubuntu, get GNU Octave and please-cite-GNU Parallel for the majority of non-GPU problem solving,

-use an FPGA to develop a high-end ASIC for mass production.
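A minimal sketch of the first tip in plain C, for illustration only. It assumes the "normalizing value" is the largest absolute value in the buffer, which is one possible reading of the tip; the helper name `normalize_unit_range` is hypothetical and does not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scales data[] in place into [-1.0, +1.0] and returns
 * the scale factor, kept separately as a double.
 * Assumption: the scale is the largest absolute value in the buffer. */
static double normalize_unit_range(float *data, size_t n)
{
    assert(data != NULL);

    double scale = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double mag = data[i] < 0.0f ? -(double)data[i] : (double)data[i];
        if (mag > scale) {
            scale = mag;
        }
    }
    if (scale == 0.0) {
        return 1.0; /* all zeros (or empty): nothing to scale */
    }
    for (size_t i = 0; i < n; ++i) {
        data[i] = (float)((double)data[i] / scale);
    }
    return scale;
}
```

Multiplying a normalized element by the returned scale recovers the original value up to float rounding.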

Post Scriptum: the YouTube user #WhatsACreel could write some fixed-point functions for you; write him an email with an honest offer. On his channel he explains the basics of fixed-point arithmetic.
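For readers who want to see what the basics look like for the 32.32 format from the question, here is a rough sketch in plain C that avoids 128-bit types by splitting each operand into 32-bit halves. All names (`q32_32`, `q32_mul`, and so on) are hypothetical. The sketch assumes two's-complement integers and an arithmetic right shift for negative values, does no overflow checking, and truncates toward negative infinity.

```c
#include <assert.h>
#include <stdint.h>

/* Q32.32: signed 64-bit value with 32 integer bits (including the sign)
 * and 32 fractional bits. Hypothetical names, not from any library. */
typedef int64_t q32_32;

#define Q32_FRAC_BITS 32
#define Q32_ONE ((q32_32)1 << Q32_FRAC_BITS)

static q32_32 q32_from_int(int32_t x) { return (q32_32)x * Q32_ONE; }

static q32_32 q32_from_double(double x)
{
    /* Precondition: x must fit in the 32 integer bits. */
    assert(x >= -2147483648.0 && x < 2147483648.0);
    return (q32_32)(x * (double)Q32_ONE);
}

static double q32_to_double(q32_32 x) { return (double)x / (double)Q32_ONE; }

/* Addition is plain integer addition (overflow is not checked). */
static q32_32 q32_add(q32_32 a, q32_32 b) { return a + b; }

/* Multiplication: with a = ah*2^32 + al and b = bh*2^32 + bl,
 * (a*b) >> 32 = ah*bh*2^32 + ah*bl + al*bh + ((al*bl) >> 32).
 * The partial sums are accumulated in uint64_t so wraparound is defined. */
static q32_32 q32_mul(q32_32 a, q32_32 b)
{
    int64_t  ah = a >> 32;                   /* signed high half (assumes arithmetic shift) */
    int64_t  bh = b >> 32;
    uint64_t al = (uint64_t)a & 0xFFFFFFFFu; /* unsigned low half */
    uint64_t bl = (uint64_t)b & 0xFFFFFFFFu;

    uint64_t hi   = (uint64_t)(ah * bh) << 32;
    uint64_t mid1 = (uint64_t)(ah * (int64_t)bl);
    uint64_t mid2 = (uint64_t)((int64_t)al * bh);
    uint64_t lo   = (al * bl) >> 32;

    return (q32_32)(hi + mid1 + mid2 + lo);
}
```

For example, `q32_mul(q32_from_double(1.5), q32_from_double(-2.5))` yields the Q32.32 encoding of -3.75. In OpenCL C, `long` and `ulong` are 64-bit types, so the typedef would map to `long`. OpenCL C also has an integer built-in `mul_hi` that returns the high half of a product, which may make the manual split unnecessary; whether it synthesizes well on a given FPGA toolchain is something to measure rather than assume.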

**My Name** · Answer 2

In spite of common misconceptions about FPGAs vs. GPUs, FPGAs have shown very impressive results. More information on FP16 and INT8 can be found here: https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01269-accelerating-deep-learning-with-opencl-and-intel-stratix-10-fpgas.pdf

Although OpenCL is not a library-based approach for FPGAs, there are plenty of examples from Altera/Intel and Xilinx with different data types: https://www.altera.com/products/design-software/embedded-software-developers/opencl/developer-zone.html and https://github.com/Xilinx/SDAccel_Examples

More important than data width and types are the data movement and data-reuse aspects of the algorithm, IMHO. Consider how the V100 got its performance boost over the P100: by clever scheduling, doing zero copy with hardware assist, avoiding DRAM traffic and doing tensor transposes in GPU hardware. https://devblogs.nvidia.com/tensor-core-ai-performance-milestones/

FPGAs are no different. To get apples-to-apples performance benchmarks, one has to learn these tricks and implement them on the FPGA in the OpenCL or C (HLS) code.
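As a small illustration of the data-reuse point, here is a plain C sketch of a sliding-window sum that reads each input element from memory only once and keeps the recent values in a small local buffer (a shift register). This is an assumption about the kind of reuse meant above, not code from the linked examples; `TAPS` and `window_sum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 4 /* hypothetical window length */

/* out[i] = in[i] + in[i-1] + ... + in[i-TAPS+1], treating elements before
 * the start of the input as zero. Each in[i] is loaded exactly once; the
 * older values are reused from the local window instead of being re-read. */
static void window_sum(const int32_t *in, int64_t *out, size_t n)
{
    assert(in != NULL && out != NULL);

    int32_t window[TAPS] = {0};
    for (size_t i = 0; i < n; ++i) {
        /* Shift the window by one slot and load the single new element. */
        for (int t = 0; t < TAPS - 1; ++t) {
            window[t] = window[t + 1];
        }
        window[TAPS - 1] = in[i];

        int64_t acc = 0;
        for (int t = 0; t < TAPS; ++t) {
            acc += window[t];
        }
        out[i] = acc;
    }
}
```

A naive version would re-read `TAPS` inputs for every output; here the memory traffic is one load and one store per output regardless of the window length.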
