Can I determine at compile time whether --use_fast_math was set?


I'm writing some CUDA code, and I want it to behave differently depending on whether --use_fast_math was set. And I want to make that decision at compile time, not at run time.

It seems that NVCC does not add or change a preprocessor define when --use_fast_math is set. I checked this by comparing the output of:

nvcc -Xcompiler -dM -E -x cu -

with the output of

nvcc -Xcompiler -dM -E --use_fast_math -x cu -

and they're exactly the same, so that avenue seems to be blocked. Now, if the user compiling the code invoked NVCC with --use_fast_math -DUSING_FAST_MATH, then I could detect that; but suppose it's library code and we can't impose such a requirement on the user.

Is there some other way for code undergoing compilation to notice that --use_fast_math is on?

Note: "Noticing" can mean using preprocessor #if or #ifdef directives, using SFINAE, using compiler-builtin values or constexpr functions - whatever is available at compile time.

There are 2 best solutions below

talonmies On

The answer is almost certainly no. The fast math functions map to hardware instructions, and they are substituted during code generation inside the CUDA device code compiler. An example:

$ cat nonsense.cu

__global__ void kernel(float* in, float* out)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    out[idx] = sqrtf(cosf(in[idx]));
}

$ nvcc -v -arch=sm_60 --keep -c --use_fast_math nonsense.cu 
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/opt/cuda-10.1/bin
#$ _THERE_=/opt/cuda-10.1/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/opt/cuda-10.1/bin/..
#$ NVVMIR_LIBRARY_DIR=/opt/cuda-10.1/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/opt/cuda-10.1/bin/../lib:/opt/cuda-10.1/lib64
#$ PATH=/opt/cuda-10.1/bin/../nvvm/bin:/opt/cuda-10.1/bin:/opt/miniconda3/bin:/opt/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/cuda-10.1/bin
#$ INCLUDES="-I/opt/cuda-10.1/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/opt/cuda-10.1/bin/../targets/x86_64-linux/lib/stubs" "-L/opt/cuda-10.1/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -std=c++14 -D__CUDA_ARCH__=600 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__  "-I/opt/cuda-10.1/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -include "cuda_runtime.h" -m64 "nonsense.cu" > "nonsense.cpp1.ii" 
#$ cicc --c++14 --gnu_version=70400 --allow_managed   -arch compute_60 -m64 -ftz=1 -prec_div=0 -prec_sqrt=0 -fmad=1 -fast-math --gen_div_approx_ftz --include_file_name "nonsense.fatbin.c" -tused -nvvmir-library "/opt/cuda-10.1/bin/../nvvm/libdevice/libdevice.10.bc" --gen_module_id_file --module_id_file_name "nonsense.module_id" --orig_src_file_name "nonsense.cu" --gen_c_file_name "nonsense.cudafe1.c" --stub_file_name "nonsense.cudafe1.stub.c" --gen_device_file_name "nonsense.cudafe1.gpu"  "nonsense.cpp1.ii" -o "nonsense.ptx"
#$ ptxas -arch=sm_60 -m64  "nonsense.ptx"  -o "nonsense.sm_60.cubin" 
#$ fatbinary --create="nonsense.fatbin" -64 "--image=profile=sm_60,file=nonsense.sm_60.cubin" "--image=profile=compute_60,file=nonsense.ptx" --embedded-fatbin="nonsense.fatbin.c" 
#$ gcc -std=c++14 -E -x c++ -D__CUDACC__ -D__NVCC__  "-I/opt/cuda-10.1/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -include "cuda_runtime.h" -m64 "nonsense.cu" > "nonsense.cpp4.ii" 
#$ cudafe++ --c++14 --gnu_version=70400 --allow_managed  --m64 --parse_templates --gen_c_file_name "nonsense.cudafe1.cpp" --stub_file_name "nonsense.cudafe1.stub.c" --module_id_file_name "nonsense.module_id" "nonsense.cpp4.ii" 
#$ gcc -std=c++14 -D__CUDA_ARCH__=600 -c -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS "-I/opt/cuda-10.1/bin/../targets/x86_64-linux/include"   -m64 -o "nonsense.o" "nonsense.cudafe1.cpp" 

$ cat nonsense.ptx
//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-25769353
// Cuda compilation tools, release 10.1, V10.1.105
// Based on LLVM 3.4svn
//

.version 6.4
.target sm_60
.address_size 64

    // .globl   _Z6kernelPfS_

.visible .entry _Z6kernelPfS_(
    .param .u64 _Z6kernelPfS__param_0,
    .param .u64 _Z6kernelPfS__param_1
)
{
    .reg .f32   %f<4>;
    .reg .b32   %r<5>;
    .reg .b64   %rd<8>;


    ld.param.u64    %rd1, [_Z6kernelPfS__param_0];
    ld.param.u64    %rd2, [_Z6kernelPfS__param_1];
    cvta.to.global.u64  %rd3, %rd2;
    cvta.to.global.u64  %rd4, %rd1;
    mov.u32     %r1, %tid.x;
    mov.u32     %r2, %ctaid.x;
    mov.u32     %r3, %ntid.x;
    mad.lo.s32  %r4, %r3, %r2, %r1;
    mul.wide.s32    %rd5, %r4, 4;
    add.s64     %rd6, %rd4, %rd5;
    ld.global.f32   %f1, [%rd6];
    cos.approx.ftz.f32  %f2, %f1;
    sqrt.approx.ftz.f32     %f3, %f2;
    add.s64     %rd7, %rd3, %rd5;
    st.global.f32   [%rd7], %f3;
    ret;
}

You can see that there is no nvcc-steered preprocessor magic, just arguments passed to the device compiler (the cicc invocation above), which has produced PTX with the requisite approximate instructions (cos.approx.ftz.f32 and sqrt.approx.ftz.f32) in place. This means that, in theory, you might be able to mess around with LLVM hacks to intercept or identify the bytecode you are looking for, but I very much doubt that is what you had in mind.

einpoklum On

Here's an ugly kludge instead of an answer:

Place something like the following in your code:

#ifndef I_ENABLED_FAST_MATH_REALLY_I_DID_PLEASE_BELIEVE_ME
#warning "Foo library is possibly being compiled without `--use_fast_math`. Please enable that switch, and define I_ENABLED_FAST_MATH_REALLY_I_DID_PLEASE_BELIEVE_ME, to avoid this message."
#endif

While this won't force your users to do what you want, it will be annoying enough to at least encourage them to do it. (Naturally, they can also silence the warning just by defining the macro, without actually enabling the switch.)
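If you go the opt-in macro route, the macro can be lifted into an ordinary compile-time constant, so the rest of the library can branch on it with templates or overloads instead of scattering #ifdef everywhere. Below is a minimal sketch, assuming the macro name from the snippet above. The foo namespace, the tag types and newton_steps() are hypothetical names made up for illustration, and nothing here can check that --use_fast_math was really passed.

```cpp
#include <type_traits>

namespace foo {

// Reflects only the user's promise (the opt-in macro); it cannot verify
// that --use_fast_math was actually passed to nvcc.
#ifdef I_ENABLED_FAST_MATH_REALLY_I_DID_PLEASE_BELIEVE_ME
constexpr bool using_fast_math = true;
#else
constexpr bool using_fast_math = false;
#endif

// Hypothetical tag types used to pick an implementation at compile time.
struct precise_tag {};
struct fast_tag {};

using math_mode =
    std::conditional<using_fast_math, fast_tag, precise_tag>::type;

// Hypothetical example of a decision that depends on the mode: how many
// refinement steps some routine should run. The numbers are placeholders.
constexpr int newton_steps(precise_tag) { return 2; }
constexpr int newton_steps(fast_tag)    { return 0; }

// Resolved entirely at compile time through overload selection.
constexpr int newton_steps() { return newton_steps(math_mode{}); }

} // namespace foo
```

This is plain host C++; to call such functions from device code you would presumably mark them __host__ __device__ as well. With the macro undefined, foo::newton_steps() evaluates to 2 at compile time, and it evaluates to 0 when the macro is defined.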