This WikiChip article states that Neoverse V1 has `int8` instructions that allow 256 operations per CPU clock (per core, presumably).
I'm trying to understand what these instructions are. Do they take `int8` input and accumulate the results in `int8`s or `int16`s (risking overflow or requiring saturation), or do they accumulate into `int32`?
What are these instructions? Are they listed in https://developer.arm.com/documentation/dui0801/k/A64-SIMD-Vector-Instructions/?
`smopa` for the int8 and int16 types, `bfmopa` for the BF16 type. They are documented there. The int8 version accumulates into int32.
Unfortunately, the documentation quality is mediocre. I would recommend that Arm hire a good technical writer to document their hardware.
Still, I think the instruction does something like the following C++. It is untested because I don't have hardware that supports that ISA.
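To illustrate the int8-to-int32 accumulation, here is a minimal scalar sketch of a 4-way "sum of outer products" as I understand it from Arm's description. The names `kLanes`, `Vec8`, `Tile32` and `sum_outer_products_i8` are hypothetical and chosen for illustration. The lane count is an arbitrary assumption, and predication and wraparound on overflow are not modeled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: number of 32-bit lanes per vector. Real hardware derives
// this from the vector length; 4 is used here only to keep the sketch small.
constexpr std::size_t kLanes = 4;

// One source vector: each 32-bit lane holds four consecutive int8 values.
using Vec8 = std::array<std::int8_t, 4 * kLanes>;

// Accumulator tile of 32-bit integers, kLanes x kLanes.
using Tile32 = std::array<std::array<std::int32_t, kLanes>, kLanes>;

// For every (row, col) pair, multiply the four int8 values in lane `row`
// of `a` by the four int8 values in lane `col` of `b`, sum the four
// products in 32-bit arithmetic, and add the sum to acc[row][col].
void sum_outer_products_i8(Tile32& acc, const Vec8& a, const Vec8& b) {
    for (std::size_t row = 0; row < kLanes; ++row) {
        for (std::size_t col = 0; col < kLanes; ++col) {
            std::int32_t sum = 0;
            for (std::size_t k = 0; k < 4; ++k) {
                // Widen to 32 bits before multiplying so nothing is truncated.
                sum += static_cast<std::int32_t>(a[4 * row + k]) *
                       static_cast<std::int32_t>(b[4 * col + k]);
            }
            acc[row][col] += sum;
        }
    }
}
```

With every input set to 127, a single call adds 4 * 127 * 127 = 64516 to each tile element. That already exceeds the int16 range, which is why a 32-bit accumulator matters here.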