I've come across this macro in an assembly source file and I just can't figure out how it works.
First I came across this function (hevc_deblock.h):
cglobal hevc_v_loop_filter_chroma_8, 3, 5, 7, pix, stride, tc, pix0, r3stride
    sub pixq, 2
    lea r3strideq, [3*strideq]
    mov pix0q, pixq
    add pixq, r3strideq
    TRANSPOSE4x8B_LOAD PASS8ROWS(pix0q, pixq, strideq, r3strideq)
    CHROMA_DEBLOCK_BODY 8
    TRANSPOSE8x4B_STORE PASS8ROWS(pix0q, pixq, strideq, r3strideq)
    RET
I assume that cglobal does some name mangling, so I looked it up in the other included files and found this macro, which is used inside the cglobal macro (x86util.asm):
%macro CAT_UNDEF 2
    %undef %1%2
%endmacro
%macro DEFINE_ARGS 0-*
    %ifdef n_arg_names
        %assign %%i 0
        %rep n_arg_names
            CAT_UNDEF arg_name %+ %%i, q
            CAT_UNDEF arg_name %+ %%i, d
            CAT_UNDEF arg_name %+ %%i, w
            CAT_UNDEF arg_name %+ %%i, h
            CAT_UNDEF arg_name %+ %%i, b
            CAT_UNDEF arg_name %+ %%i, m
            CAT_UNDEF arg_name %+ %%i, mp
            CAT_UNDEF arg_name, %%i
            %assign %%i %%i+1
        %endrep
    %endif
    %xdefine %%stack_offset stack_offset
    %undef stack_offset ; so that the current value of stack_offset doesn't get baked in by xdefine
    %assign %%i 0
    %rep %0
        %xdefine %1q r %+ %%i %+ q
        %xdefine %1d r %+ %%i %+ d
        %xdefine %1w r %+ %%i %+ w
        %xdefine %1h r %+ %%i %+ h
        %xdefine %1b r %+ %%i %+ b
        %xdefine %1m r %+ %%i %+ m
        %xdefine %1mp r %+ %%i %+ mp
        CAT_XDEFINE arg_name, %%i, %1
        %assign %%i %%i+1
        %rotate 1
    %endrep
    %xdefine stack_offset %%stack_offset
    %assign n_arg_names %0
%endmacro
It seems to do the name mangling and add the q at the end of the arguments. However, I don't understand why there are several lines of %undef directives, and why only the variable names with the q suffix seem to be used in the function. It also seems to append a number at the end, but for some reason I'm not seeing it in the other asm file.
What am I missing here?
The DEFINE_ARGS macro defines a number of single-line macros that are meant to be used to refer to the arguments of the function that the cglobal macro introduces. So for example, if foo is given as the name of the first argument, then DEFINE_ARGS defines fooq, food, foow, fooh, foob, foom and foomp, each as an alias for the corresponding r0 macro (r0q, r0d and so on).
The suffixes represent how the argument is supposed to be accessed. The first five suffixes (q, d, w, h and b) indicate the size: pointer (quad-word or double-word), double-word, word, byte and byte respectively. The h suffix indicates that the byte is the high part of a 16-bit value. The m suffix accesses the argument as a memory operand of unspecified size, while the mp suffix accesses it as a memory operand of pointer size.
The rNx names that these argument macros get defined as are themselves macros. They expand to the register, or for the m and mp suffixes the memory location, where the Nth argument is stored. So when building for 64-bit Windows, the macros for the first argument effectively all expand to RCX or one of its sub-registers. Note that since the Windows 64-bit calling convention passes the first argument in a register (RCX), there's no memory location that corresponds to this argument.
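As a sketch of the two layers described above, assuming a 64-bit Windows build and a first argument named foo (a made-up name): the first group approximates what the r0x macros reduce to, and the second group is what DEFINE_ARGS would add on top. These lines are written by hand for illustration; the real header generates them through further helper macros, so the exact text there will differ.

```nasm
bits 64

; Layer 1: what the r0x macros reduce to for argument 0 on 64-bit Windows
; (hand-written approximation, not copied from the header).
%define r0q  rcx        ; pointer-sized (quad-word)
%define r0d  ecx        ; double-word
%define r0w  cx         ; word
%define r0h  ch         ; high byte of the 16-bit part
%define r0b  cl         ; low byte
%define r0m  ecx        ; "memory" form: still a register, there is no stack slot
%define r0mp rcx        ; pointer-sized "memory" form: likewise a register

; Layer 2: what DEFINE_ARGS adds for a first argument named foo.
; %xdefine expands its right-hand side immediately, so fooq ends up as rcx.
%xdefine fooq  r0q
%xdefine food  r0d
%xdefine foow  r0w
%xdefine fooh  r0h
%xdefine foob  r0b
%xdefine foom  r0m
%xdefine foomp r0mp

; Usage: the argument name plus a suffix selects the operand size.
    mov     food, 1         ; assembles as: mov ecx, 1
    movzx   eax, foob       ; assembles as: movzx eax, cl
```

Running the file through nasm -E shows the expanded instructions, which is a convenient way to check what any of these names turns into.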
When building for 32-bit targets, the rNx macros for the first argument end up being defined in terms of EAX instead, with the m and mp forms referring to the argument's slot on the stack. The r0q macro in this case only accesses the 32-bit register, because the 64-bit registers aren't accessible in 32-bit code. As the first argument is passed on the stack when following the 32-bit calling conventions, the prologue code generated by the cglobal macro loads the first argument into EAX.
Apparently the code that you've seen that uses these argument macros only accesses pointer-sized arguments, so that's why you're only seeing q suffixes.
The purpose of the %undef lines at the start of the DEFINE_ARGS macro is to undefine the argument macros that the previous invocation of DEFINE_ARGS defined. Otherwise they'd remain defined in the current function. The previous function's argument names are stored in single-line macros named arg_nameN.
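For comparison, here is a hand-written approximation of the 32-bit case, followed by the kind of cleanup the %undef lines perform. The names foo and bar are made up, and the address esp + stack_offset + 4 assumes a plain stack-based call with nothing pushed since function entry; the real header derives the address from its own stack bookkeeping.

```nasm
bits 32

%assign stack_offset 0      ; assumption: nothing pushed since function entry

; Approximation of the r0x macros for argument 0 on a 32-bit target.
%define r0q  eax            ; "q" is only 32 bits wide here
%define r0d  eax
%define r0w  ax
%define r0h  ah
%define r0b  al
%define r0m  [esp + stack_offset + 4]         ; the argument's stack slot
%define r0mp dword [esp + stack_offset + 4]   ; same slot, pointer-sized

; First function: argument 0 is called foo.
%xdefine fooq      r0q
%xdefine arg_name0 foo      ; remembers the name so it can be undefined later

    mov     fooq, r0mp      ; what the prologue does: load the stack argument into eax

; Next function: the old alias is removed first, then the new one is defined.
%undef fooq                 ; without this, fooq would still expand to eax here
%undef arg_name0
%xdefine barq      r0q
%xdefine arg_name0 bar

    xor     barq, barq      ; assembles as: xor eax, eax
```

After the %undef, fooq is an ordinary unknown identifier again, so a stale use in the second function fails to assemble instead of silently touching EAX.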
Please don't follow the example set by the code you're reading. These macros essentially create a unique derivative programming language, one that's only really understood by their authors. It's also not the most efficient way of doing things. If I were writing this code I'd use C/C++ and its vector intrinsics. That would leave all the differences between 32-bit and 64-bit, Windows and Linux to the compiler, which could generate better code than these macros.