Is it possible that Fortran code compiled on one Windows machine with Visual Studio 2019, on a 2018 Intel processor, gives a slightly different result when the exe is copied to another machine (one with a 2022 Intel processor)? Could you please list the possible causes of this behavior?
1 Answer
Run-time dispatch to different versions of functions based on which CPU instruction sets are available, and/or auto-parallelization across all cores, are likely candidates.
Some compiler optimizations pretend that FP math is associative. In reality, FP math is not associative; grouping the operations into different temporaries introduces different rounding error.
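For instance, a tiny Fortran program (a sketch, unrelated to the asker's code) makes the non-associativity visible with nothing more than three double-precision constants:

    ! Minimal demo that FP addition is not associative: the two groupings
    ! round differently, so the stored double-precision results differ in
    ! their last bits.
    program assoc_demo
      implicit none
      real(kind=8) :: a, b, c
      a = 0.1d0
      b = 0.2d0
      c = 0.3d0
      print *, '(a+b)+c = ', (a + b) + c   ! roughly 0.60000000000000009
      print *, 'a+(b+c) = ', a + (b + c)   ! roughly 0.59999999999999998
    end program assoc_demo

(The exact digits printed depend on the compiler's list-directed formatting, but the two results really are different bit patterns.)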
Different choices in how to find parallelism (like auto-vectorization) in loops like a dot product or sum of an array can lead to different rounding, which can make the result numerically more accurate, but different from the source order of operations¹. Such loops can only be vectorized by pretending FP math is associative.
Auto-parallelization (e.g. OpenMP) with a different number of cores could break a problem up in a different way, into different sub-totals. If your program uses all cores, that's a likely candidate.
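As a sketch of how that plays out (not the asker's code; assume it is built with /Qopenmp under ifort or -fopenmp under gfortran), an OpenMP reduction gives each thread its own partial sum, so the grouping of additions, and hence the last-bit rounding of the total, can change with the thread count:

    ! Sketch: with an OpenMP reduction, each thread accumulates its own
    ! partial sum and the partials are combined at the end, so the grouping
    ! of the additions depends on how many threads run and how the
    ! iterations are divided among them.
    program omp_reduction_demo
      use omp_lib, only: omp_get_max_threads
      implicit none
      integer, parameter :: n = 1000000
      real(kind=8) :: x(n), total
      integer :: i

      do i = 1, n
         x(i) = 1.0d0 / real(i, kind=8)   ! deterministic test data
      end do

      total = 0.0d0
      !$omp parallel do reduction(+:total)
      do i = 1, n
         total = total + x(i)
      end do
      !$omp end parallel do

      print *, omp_get_max_threads(), ' threads, sum = ', total
    end program omp_reduction_demo

Running the same binary with OMP_NUM_THREADS=1 and then with OMP_NUM_THREADS=8 can print sums that differ in the last few digits.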
Intel compilers can also make code that dispatches to different versions of a function depending on which SIMD instruction sets are available. So you could have an SSE4 version, an AVX2+FMA version, and an AVX-512 version (perhaps even using 512-bit vectors.)
Different SIMD widths lead to a different number of accumulators, if the loop uses the same number of vector registers. So that's a different set of numbers getting added together into each subtotal, e.g. for a dot product.
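To make that concrete, here is a hand-written sketch (the function names are made up) of roughly what a vectorizer or unroller turns a reduction into: several interleaved accumulators instead of one, combined at the end, so the additions are grouped differently than in the plain serial loop:

    ! Sketch: a serial dot product vs. one with 4 accumulators, roughly what
    ! an auto-vectorizer/unroller produces. Both are reasonable, but they
    ! group the additions differently and can therefore round differently.
    function dot_serial(x, y, n) result(s)
      implicit none
      integer, intent(in) :: n
      real(kind=8), intent(in) :: x(n), y(n)
      real(kind=8) :: s
      integer :: i
      s = 0.0d0
      do i = 1, n
         s = s + x(i) * y(i)
      end do
    end function dot_serial

    function dot_unrolled4(x, y, n) result(s)
      implicit none
      integer, intent(in) :: n
      real(kind=8), intent(in) :: x(n), y(n)
      real(kind=8) :: s, s1, s2, s3, s4
      integer :: i
      s1 = 0.0d0; s2 = 0.0d0; s3 = 0.0d0; s4 = 0.0d0
      do i = 1, n - mod(n, 4), 4            ! 4 independent running sums
         s1 = s1 + x(i)   * y(i)
         s2 = s2 + x(i+1) * y(i+1)
         s3 = s3 + x(i+2) * y(i+2)
         s4 = s4 + x(i+3) * y(i+3)
      end do
      s = (s1 + s2) + (s3 + s4)             ! combine the partial sums
      do i = n - mod(n, 4) + 1, n           ! scalar cleanup of the leftovers
         s = s + x(i) * y(i)
      end do
    end function dot_unrolled4

A build that targets 512-bit vectors would effectively use wider (or more) such partial sums than an SSE build of the same source, which is exactly the kind of difference that shows up in the last digits.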
Does only one of the processors have AVX-512? Or is one of them a Pentium or Celeron without AVX? If so, that's also a likely factor.
Runtime dispatch in libraries like TBB or SVML could also be a factor, not just in code generated directly by the compiler.
Intel compiler FP-math options
Intel compilers default to -fp-model=fast. See the docs for the Intel Fortran and Intel C/C++ compilers (both classic, not the LLVM-based oneAPI compilers, although presumably those turn on -ffast-math by default as well). The C++ compiler docs seem more detailed in their descriptions.

(The other mainstream compilers, like LLVM and GCC (gfortran), default to -fno-fast-math. But if you use OpenMP, you can let the compiler treat the sum or product or whatever in a specific reduction loop as associative.)

Specifically, the Intel default is fast=1, and there's an even more aggressive level of optimization, -fp-model=fast=2, that's not on by default.

See also Intel's 2018 article Consistency of Floating-Point Results using the Intel® Compiler or Why doesn't my application always give the same answer?, covering the kinds of value-unsafe optimization Intel's compilers do and how the various FP-model switches affect that. And/or slides from a 2008 talk of the same title.
Quoting some descriptions from that article:
- -fp-model=consistent: not necessarily the same order of operations as the source, but the same on every machine, and run-to-run consistency when auto-parallelizing with OpenMP. Otherwise, with dynamic scheduling of a reduction, the way subtotals are added could depend on which threads finished which work first.
- -fp-model=precise: allows value-safe optimizations only (but still including contraction of a*b+c into an FMA).
- -fp-model=strict: enables access to the FPU environment (e.g. you can change the FP rounding mode and compiler optimizations will respect that possibility), disables floating-point contractions such as fused multiply-add (FMA) instructions, and implies "precise" and "except".
This part probably doesn't lead to optimizations that vary across machines, but it's interesting.
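If consistency between the two machines matters more than the last bit of speed, one experiment (the command lines are only a sketch; myprog.f90 is a placeholder name) is to rebuild with a stricter model and compare the two executables:

    rem Windows spelling for the classic ifort compiler:
    ifort /O2 /fp:precise    myprog.f90
    ifort /O2 /fp:consistent myprog.f90

    rem Linux spelling of the same options:
    rem   ifort -O2 -fp-model=precise    myprog.f90
    rem   ifort -O2 -fp-model=consistent myprog.f90

If the rebuilt executables then agree across both machines, the value-unsafe optimizations of the default fast model were the likely cause.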
Footnote 1: different isn't always worse, numerically
Floating point math always has rounding error (except in rare cases, e.g. adding 5.0 + 3.0, or other cases where the mantissas have enough trailing zeros that no 1 bits have to get rounded away after shifting the mantissas to align them). The number you'd get from doing FP math operations in source order with source-specified precision will have rounding error.

Using multiple accumulators (from vectorizing and unrolling a reduction) is usually better numerically, a step in the direction of pairwise summation; a simple sum += a[i] is the worst way to add an array if the elements are all or mostly positive and of uniform size. Adding a small number to a large number loses a lot of precision, so having 16 or 32 different buckets to sum into means the totals aren't quite as big until you add the buckets together. See also Simd matmul program gives different numerical results.

You can jokingly call this "wrong" because it makes it harder to verify / test that a program does exactly what you think it does, and because it's not what the source says to do.
But unless you're doing stuff like Kahan summation (error-compensation) or double-double (extended precision) or other numerical techniques that depend on precise FP rounding semantics, fast-math optimizations merely give you answers that are wrong in a different way than the source, and mathematically maybe less wrong.
(Unless it also introduces some approximations like rsqrtps + a Newton iteration instead of x / sqrt(y). That may only be on with fast=2, which isn't the default, but I'm not sure. Some optimizations also might not care about turning -0.0 into +0.0.)
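For example, the Kahan summation mentioned above only works because the compensation term is not algebraically simplified away; under a fast-math model the compiler is allowed to treat (t - total) - y as exactly zero and delete it. A minimal sketch (not from the question's code):

    ! Sketch of Kahan (compensated) summation. Under a value-unsafe model
    ! like -fp-model=fast the compiler may treat FP math as associative,
    ! simplify the compensation term to zero, and silently turn this back
    ! into a plain running sum.
    function kahan_sum(x, n) result(total)
      implicit none
      integer, intent(in) :: n
      real(kind=8), intent(in) :: x(n)
      real(kind=8) :: total, comp, y, t
      integer :: i
      total = 0.0d0
      comp  = 0.0d0                 ! running compensation for lost low-order bits
      do i = 1, n
         y = x(i) - comp
         t = total + y
         comp = (t - total) - y     ! algebraically zero; numerically, the lost bits
         total = t
      end do
    end function kahan_sum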