SVE offers various gather-load intrinsics. For instance, svuint32_t m = svld1_gather_u32offset_u32(pg, base, offsets), where pg is an svbool_t, base is a const uint32_t * and offsets is an svuint32_t, loads the 32-bit value found offsets[i] bytes past base into each active lane i of m.
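To make the per-lane behaviour concrete, here is a minimal scalar sketch in portable C of what I understand the offset form to do, next to the related index form (svld1_gather_u32index_u32), which scales each lane by the element size. It is only a model, not the intrinsics themselves: the function names and the base_bytes / base_len bounds parameters are hypothetical and exist purely for illustration, and predication is ignored (every lane is treated as active).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the byte-offset gather: lane i receives the 32-bit
 * value stored at (char *)base + offsets[i].
 * base_bytes is a hypothetical parameter used only to bounds-check the model. */
void model_gather_u32offset(const uint32_t *base, size_t base_bytes,
                            const uint32_t *offsets, uint32_t *out,
                            size_t lanes)
{
    for (size_t i = 0; i < lanes; ++i) {
        assert((size_t)offsets[i] + sizeof(uint32_t) <= base_bytes);
        /* memcpy avoids assuming the byte offset is 4-byte aligned. */
        memcpy(&out[i], (const unsigned char *)base + offsets[i],
               sizeof(uint32_t));
    }
}

/* Scalar model of the index gather: lane i receives base[indices[i]],
 * i.e. the index is scaled by sizeof(uint32_t).
 * base_len is a hypothetical parameter used only to bounds-check the model. */
void model_gather_u32index(const uint32_t *base, size_t base_len,
                           const uint32_t *indices, uint32_t *out,
                           size_t lanes)
{
    for (size_t i = 0; i < lanes; ++i) {
        assert(indices[i] < base_len);
        out[i] = base[indices[i]];
    }
}
```

With data = {10, 20, 30, 40}, byte offsets {0, 8, 12} in the first model select the same elements as indices {0, 2, 3} in the second.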
Alternatively, you can use svuint32_t k = svld1_gather_u32base_u32(pg, bases), where bases is an svuint32_t, to load elements into k. Since bases is of type svuint32_t, each of its lanes holds a 32-bit memory address.
However, pointers on AArch64 (which SVE requires) are 64 bits wide. How can a 64-bit pointer fit into a 32-bit lane?
I assume the second variant can only be used when the pointers happen to fit into 32 bits. Since this can't be guaranteed in general, how practical is the second approach?
I tried reading the generated assembly, but assembly is not my strength, so it didn't get me any further.