Final edit:
Some users on the silverfrost forums directed me very helpfully, to a simplification of the code and a solution.
The issue can be replicated using the following code:
PROGRAM ML14ERROR
INTEGER :: origzn, destzn
INTEGER,PARAMETER :: MXZMA = 1713, LXTZN = 1714, MXAV = 182
INTEGER,PARAMETER :: JTMPREL = 1003, av = 1
REAL(KIND=2) :: RANDOM@
REAL,dimension (1:mxav,lxtzn,lxtzn,JTMPREL:JTMPREL):: znzndaav
DO origzn=1,lxtzn
DO destzn=1,lxtzn
znzndaav(av,origzn,destzn,JTMPREL) = RANDOM@()
END DO
END DO
DO origzn=1,mxzma
DO destzn=1,mxzma
! This is where the error occurs
znzndaav(av,origzn,lxtzn,JTMPREL)=
$ znzndaav(av,origzn,lxtzn,JTMPREL)+
$ znzndaav(av,origzn,destzn,JTMPREL)
ENDDO
ENDDO
WRITE(6,*)'No errors'
END PROGRAM
The issue only arises when MXAV>182, which suggests a memory issue. Indeed, multiplying out the dimensions: 183 * 1714 * 1714 * 4 yields >2GB, exceeding the stack size.
A solution would be to use the heap as follows (Fortan 95):
PROGRAM ML14ERROR
INTEGER :: origzn, destzn
INTEGER,PARAMETER :: MXZMA = 1713, LXTZN = 1714, MXAV = 191
INTEGER,PARAMETER :: JTMPREL = 1003, av = 1
REAL(KIND=2) :: RANDOM@
REAL,allocatable :: znzndaav(:,:,:,:)
ALLOCATE( znzndaav(1:mxav,lxtzn,lxtzn,JTMPREL:JTMPREL) )
DO origzn=1,lxtzn
DO destzn=1,lxtzn
znzndaav(av,origzn,destzn,JTMPREL) = RANDOM@()
END DO
END DO
DO origzn=1,mxzma
DO destzn=1,mxzma
! This is where the error occurs
znzndaav(av,origzn,lxtzn,JTMPREL)= &
& znzndaav(av,origzn,lxtzn,JTMPREL)+ &
& znzndaav(av,origzn,destzn,JTMPREL)
ENDDO
ENDDO
DEALLOCATE(znzndaav)
WRITE(6,*)'No errors'
END PROGRAM
Once we do this, we can allocate more than 2GB and the array works fine. The program this small section of code stems from is a few years old, and we've only just now run into the issue because a model we've built is many times larger than any before. As Fortran 77 doesn't allow ALLOCATABLE arrays, we must either reduce stack usage, or port the code - or seek another optimisation.
Edited to add:
I have now put together a git repo which contains reproducible code.
Overview
I have a program that works fine when compiled to 32-bit, but presents an access violation error when compiled and run in 64-bit.
I'm using the Silverfrost Fortran compiler, FTN95 v8.51, though this issue occurs using v8.40 and v8.50.
Sample code
! .\relocmon.inc
INTEGER JTMPREL
PARAMETER(JTMPREL=1003)
REAL znda(lxtzn,JTMPREL:JTMPREL)
REAL zndaav(1:mxav,lxtzn,JTMPREL:JTMPREL)
REAL,dimension (lxtzn,lxtzn,JTMPREL:JTMPREL) :: znznda
REAL mlrlsum(lxtzn,lxtzn)
REAL,dimension (1:mxav,lxtzn,lxtzn,JTMPREL:JTMPREL):: znzndaav
COMMON /DDMON/ znda, znznda, mlrlsum,znzndaav, zndaav
! EOF .\relocmon.inc
! .\relocmon.inc with values
INTEGER JTMPREL
PARAMETER(JTMPREL=1003)
REAL znda(1714,JTMPREL:JTMPREL)
REAL zndaav(1:191,1714,JTMPREL:JTMPREL)
REAL,dimension (1714,1714,JTMPREL:JTMPREL) :: znznda
REAL mlrlsum(1714,1714)
REAL,dimension (1:191,1714,1714,JTMPREL:JTMPREL):: znzndaav
COMMON /DDMON/ znda, znznda, mlrlsum,znzndaav, zndaav
! EOF .\relocmon.inc
! .\main.for
INCLUDE 'relocmon.inc'
REAL,save,dimension(lxtzn,lxtzn,mxav) :: ddfuncval
DO origzn=1,mxzma
IF( zonedef(origzn,JZUSE) )THEN
DO destzn=1,mxzma
IF (zonedef(destzn,JZUSE)) THEN
znznda(origzn,destzn,JTMPREL)=znda(destzn,JTMPREL)*
$ ddfuncval(origzn,destzn,av)
znznda(origzn,lxtzn,JTMPREL)=znznda(origzn,lxtzn,JTMPREL)
$ +znznda(origzn,destzn,JTMPREL)
znzndaav(av,origzn,destzn,JTMPREL)=zndaav(av,destzn,JTMPREL)*
$ ddfuncval(origzn,destzn,av)
! LINE 309 -- where error occurs
znzndaav(av,origzn,lxtzn,JTMPREL)=
$ znzndaav(av,origzn,lxtzn,JTMPREL)
$ +znzndaav(av,origzn,destzn,JTMPREL)
ENDIF
ENDDO
ENDIF
ENDDO
! EOF .\main.for
NB the function zonedef simply checks that a zone is valid for the calculation we want to undertake. This function returns a logical.
Debugging
As I mentioned initially, the 32-bit compiled version of this program works fine. When attempting to run the 64-bit version, the output of the first loop is this:
from sdbg64.exe:
Error: Access Violation reading address
0x00000002071E05A0
main.for: 309
write exception to file:
Access violation (c0000005) at address 43a1f4
Within file ml14.exe
in main in line 309, at address 2b84
RAX = 0000000000000001 RBX = 000000027fff704c RCX = 000000000285e6b8 RDX = 00000002802296cc
RBP = 0000000000400000 RSI = 000000029ba3ad6c RDI = 0000000307695374 RSP = 000000000285be70
R8 = 0000000307695374 R9 = 00000002ffff5040 R10 = 000000029ba3ad6c R11 = 000000030731f0dc
R12 = 000000027fff5584 R13 = 00000002802296cc R14 = 000000028169f3ec R15 = 0000000281660928
43a1f4) addss XMM11,[85b401b4++R14]
For the rest of this... please bear with me. I'm not a trained software engineer or fortran developer by any stretch, so I'm stabbing in the dark a little to troubleshoot.
The value for ZNZNDAAV(1,337,337,1003) is 2.241640, and this is being added to ZNZNDAAV(1,337,1714,1003). This tallies with register XMM11 as detailed in the exception output. This value is at address 000000029BA3BD60. The other value is at address 00000003071E05A0.
IIUC, in relocmon.inc we're setting COMMON /DDMON/ to contain the dimensioned array znzndaav, so if the software were working nominally, the address of the value in question would be within the /DDMON/ block. The address range for /DDMON/ is z'000000027FFF6040' - z'0000000307421150'. If my logic is correct, the violation occurs outside of this block.
It appears to me that the program is attempting to write to 00000002071E05A0 when it should be using 00000003071E05A0.
Can anyone help me determine why this would be the case? There appears to be something systematic about it - could it be mere coincidence?