Does ARMv8 come with NEON and VFPV3 built in?

I have been researching how to use OpenCV optimally on an ARMv8 system.

In the tutorials I found, the VFPV3 and NEON options are often left disabled when building OpenCV from source.

I was then told: "Typically GCC will handle extensions that match the processor. ARMv7 had different processor versions, some had VFPV3 and NEON support, thus the flags. All ARMv8, like the Xavier AGX, have those built in so GCC is smart enough to use them/compile them when encountered."

Does this mean that it is not necessary to specify VFPV3 or NEON when building OpenCV for ARMv8 systems? Are these active by default?

According to the ARM documentation (AArch64 Floating-point and NEON):

Both floating-point and NEON are required in all standard ARMv8 implementations. However, implementations targeting specialized markets may support the following combinations:

    No NEON or floating-point.
    Full floating-point and SIMD support with exception trapping.
    Full floating-point and SIMD support without exception trapping. 

That is, if the Armv8-A implementation you are using is 'standard', which is likely, it supports full floating-point and SIMD. GCC enables both by default for -march=armv8-a, so adding +simd explicitly (-march=armv8-a+simd) should not change the generated code.
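
One way to check what a given compiler invocation enables is to test the ACLE feature macros `__ARM_NEON` and `__ARM_FP` at compile time. The following is a minimal sketch, not part of the original answer; the function names `built_with_neon`, `built_with_fp` and `report_simd_support` are made up for illustration.

```c
#include <stdio.h>

/* Returns 1 if the compiler targets Advanced SIMD (NEON), else 0.
   __ARM_NEON is a predefined macro from the Arm C Language Extensions. */
int built_with_neon(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler targets hardware floating point, else 0.
   __ARM_FP is also an ACLE predefined macro. */
int built_with_fp(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}

/* Prints a short report of what this translation unit was built with. */
void report_simd_support(void)
{
    printf("NEON: %s\n", built_with_neon() ? "yes" : "no");
    printf("FP:   %s\n", built_with_fp() ? "yes" : "no");
}
```

Calling `report_simd_support()` from `main` in a binary built with the AArch64 cross compiler and plain -march=armv8-a should report both as enabled if the defaults are as described above. The same code built for a non-ARM host reports "no" for both.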

With the GCC 10.2 toolchain, the generated assembly is the same with and without the +simd extension:

op.c:

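/* Constants reconstructed from the .rodata entries in the listings below. */
static const double v1 = 1.0;
static const double v2 = 2.0;

double op(double value)
{
  double v3 = v1 + v2 + value;
  return v3;
}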

/opt/arm/10/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -march=armv8-a -S op.c
cat op.s
        .arch armv8-a
        .file   "op.c"
        .text
        .section        .rodata
        .align  3
        .type   v1, %object
        .size   v1, 8
v1:
        .word   0
        .word   1072693248
        .align  3
        .type   v2, %object
        .size   v2, 8
v2:
        .word   0
        .word   1073741824
        .text
        .align  2
        .global op
        .type   op, %function
op:
.LFB0:
        .cfi_startproc
        sub     sp, sp, #32
        .cfi_def_cfa_offset 32
        str     d0, [sp, 8]
        fmov    d1, 1.0e+0
        fmov    d0, 2.0e+0
        fadd    d0, d1, d0
        ldr     d1, [sp, 8]
        fadd    d0, d1, d0
        str     d0, [sp, 24]
        ldr     d0, [sp, 24]
        add     sp, sp, 32
        .cfi_def_cfa_offset 0
        ret
        .cfi_endproc
.LFE0:
        .size   op, .-op
        .ident  "GCC: (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.2.1 20201103"
        .section        .note.GNU-stack,"",@progbits

/opt/arm/10/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc  -march=armv8-a+simd -S op.c
cat op.s
        .arch armv8-a
        .file   "op.c"
        .text
        .section        .rodata
        .align  3
        .type   v1, %object
        .size   v1, 8
v1:
        .word   0
        .word   1072693248
        .align  3
        .type   v2, %object
        .size   v2, 8
v2:
        .word   0
        .word   1073741824
        .text
        .align  2
        .global op
        .type   op, %function
op:
.LFB0:
        .cfi_startproc
        sub     sp, sp, #32
        .cfi_def_cfa_offset 32
        str     d0, [sp, 8]
        fmov    d1, 1.0e+0
        fmov    d0, 2.0e+0
        fadd    d0, d1, d0
        ldr     d1, [sp, 8]
        fadd    d0, d1, d0
        str     d0, [sp, 24]
        ldr     d0, [sp, 24]
        add     sp, sp, 32
        .cfi_def_cfa_offset 0
        ret
        .cfi_endproc
.LFE0:
        .size   op, .-op
        .ident  "GCC: (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.2.1 20201103"
        .section        .note.GNU-stack,"",@progbits
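
The function `op` above is scalar, so both listings contain only scalar `fadd` instructions on `d` registers. To exercise NEON explicitly, one option is to use the intrinsics from `arm_neon.h`. The sketch below is an illustration under that assumption, not code from the original answer; `add_floats` is a hypothetical helper, and it falls back to a plain loop when `__ARM_NEON` is not defined.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise addition: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Process four floats per iteration in 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when NEON is not available. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Compiling this with -S and -march=armv8-a should show vector `fadd` instructions on `v` registers if NEON is enabled by default, without any extra flag.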