Understanding NASM Macro

I've come across this macro in an assembly source file and I just can't figure out how it works.

First I came across this function (hevc_deblock.h):

cglobal hevc_v_loop_filter_chroma_8, 3, 5, 7, pix, stride, tc, pix0, r3stride
    sub            pixq, 2
    lea       r3strideq, [3*strideq]
    mov           pix0q, pixq
    add            pixq, r3strideq
    TRANSPOSE4x8B_LOAD  PASS8ROWS(pix0q, pixq, strideq, r3strideq)
    CHROMA_DEBLOCK_BODY 8
    TRANSPOSE8x4B_STORE PASS8ROWS(pix0q, pixq, strideq, r3strideq)
    RET

I assume that cglobal does some name mangling, so I looked through the other included files and found this macro, which is used inside the cglobal macro (x86util.asm):

%macro CAT_UNDEF 2
    %undef %1%2
%endmacro

%macro DEFINE_ARGS 0-*
    %ifdef n_arg_names
        %assign %%i 0
        %rep n_arg_names
            CAT_UNDEF arg_name %+ %%i, q
            CAT_UNDEF arg_name %+ %%i, d
            CAT_UNDEF arg_name %+ %%i, w
            CAT_UNDEF arg_name %+ %%i, h
            CAT_UNDEF arg_name %+ %%i, b
            CAT_UNDEF arg_name %+ %%i, m
            CAT_UNDEF arg_name %+ %%i, mp
            CAT_UNDEF arg_name, %%i
            %assign %%i %%i+1
        %endrep
    %endif

    %xdefine %%stack_offset stack_offset
    %undef stack_offset ; so that the current value of stack_offset doesn't get baked in by xdefine
    %assign %%i 0
    %rep %0
        %xdefine %1q r %+ %%i %+ q
        %xdefine %1d r %+ %%i %+ d
        %xdefine %1w r %+ %%i %+ w
        %xdefine %1h r %+ %%i %+ h
        %xdefine %1b r %+ %%i %+ b
        %xdefine %1m r %+ %%i %+ m
        %xdefine %1mp r %+ %%i %+ mp
        CAT_XDEFINE arg_name, %%i, %1
        %assign %%i %%i+1
        %rotate 1
    %endrep
    %xdefine stack_offset %%stack_offset
    %assign n_arg_names %0
%endmacro

It seems to do the name mangling and add the q at the end of the argument names. However, I don't understand why there are several lines of %undef directives, or why only the variable names with the q suffix seem to be used in the function. It also seems to append a number at the end, but for some reason I'm not seeing it in the other asm file.

What am I missing here?

Best answer

The DEFINE_ARGS macro defines a number of single-line macros that are meant to be used to refer to the arguments of the function that the cglobal macro introduces. For example, if foo is given as the name of the first argument, then DEFINE_ARGS creates the following defines:

%xdefine fooq r0q
%xdefine food r0d
%xdefine foow r0w
%xdefine fooh r0h
%xdefine foob r0b
%xdefine foom r0m
%xdefine foomp r0mp

The suffixes represent how the argument is supposed to be accessed. The first five suffixes, q, d, w, h, and b, indicate the size: pointer (quad-word or double-word), double-word, word, byte, and byte respectively. The h suffix indicates that the byte is the high part of a 16-bit value. The m suffix accesses the argument as a memory operand of unspecified size, while the mp suffix accesses it as a memory operand of pointer size.

The rNx names that these argument macros get defined as are themselves macros. They expand to the register, or to the memory location for the m and mp suffixes, where the Nth argument is stored. So when building for 64-bit Windows, the macros for the first argument are effectively:

%define r0q rcx
%define r0d ecx
%define r0w cx
%define r0h ch
%define r0b cl
%define r0m ecx
%define r0mp rcx

Note that since the Windows 64-bit calling convention passes the first argument in a register (RCX) there's no memory location that corresponds to this argument.
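To make the two levels of expansion concrete, here is a minimal sketch that writes both sets of defines by hand instead of including the real macro files. The label example and the exact set of %define lines are assumptions made for illustration; they mimic the 64-bit Windows case described above rather than reproduce what the cglobal macro actually emits.

```nasm
; Hand-written stand-ins for the rNx macros on 64-bit Windows.
; These lines are illustrative assumptions, not copied from the real macro files.
bits 64

%define r0q rcx
%define r0d ecx
%define r0w cx
%define r0b cl

; What DEFINE_ARGS would set up for a first argument named "foo".
; %xdefine expands r0q and friends once, at definition time.
%xdefine fooq r0q
%xdefine food r0d
%xdefine foow r0w
%xdefine foob r0b

section .text
example:                    ; hypothetical label
    add   fooq, 8           ; assembles as: add rcx, 8
    mov   eax, food         ; assembles as: mov eax, ecx
    movzx edx, foow         ; assembles as: movzx edx, cx
    test  foob, 1           ; assembles as: test cl, 1
    ret
```

Running the file through the preprocessor only (nasm -E) should show the register names substituted for the foo names, which is a convenient way to inspect what any of these macros turn into.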

When building for 32-bit targets, the rNx macros for the first argument end up getting defined like this:

%define r0q eax
%define r0d eax
%define r0w ax
%define r0h ah
%define r0b al
%define r0m [esp + stack_size + 4]
%define r0mp dword [esp + stack_size + 4]

The r0q macro in this case only accesses the 32-bit register, because the 64-bit registers aren't accessible in 32-bit code. As the first argument is passed on the stack when following the 32-bit calling conventions, the prologue code generated by the cglobal macro loads the first argument into EAX.
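As a rough illustration of that prologue, the following hand-written sketch shows the kind of load involved for a function with a single stack argument. The label example32 is hypothetical, and the instructions the real cglobal macro generates may differ (for instance if it also saves registers or adjusts the stack).

```nasm
; Hypothetical hand-written equivalent of the 32-bit prologue for one argument.
; This is an assumption for illustration, not the actual macro output.
bits 32

section .text
global example32
example32:
    mov eax, [esp + 4]      ; first argument sits just above the return address
    ; ... the function body then uses EAX wherever the source says r0q or r0d ...
    ret
```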

Apparently the code that you've seen that uses these argument macros only accesses pointer-sized arguments so that's why you're only seeing q suffixes.

The purpose of the %undef lines at the start of the DEFINE_ARGS macro is to undefine the argument macros that the previous invocation of DEFINE_ARGS defined. Otherwise they'd remain defined in the current function. The previous function's argument names are stored in single-line macros named arg_nameN.
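The clean-up step can be sketched in isolation. The version below is a simplified, hand-unrolled imitation of that bookkeeping for a single argument; the names pix and src and the label second_function are chosen for illustration, and the real macro does the same thing in a %rep loop over all arguments and suffixes.

```nasm
; Simplified imitation of the DEFINE_ARGS bookkeeping for one argument.
; Everything here is an illustrative assumption, not the real macro file.
bits 64
%define r0q rcx

%macro CAT_UNDEF 2
    %undef %1%2             ; paste the two parameters into one name and undefine it
%endmacro

; "First function": argument 0 is called pix, and the name is remembered.
%xdefine pixq r0q
%xdefine arg_name0 pix

; "Second function": forget the old name, then register the new one.
CAT_UNDEF arg_name0, q      ; arg_name0 expands to pix, so this undefines pixq
%undef arg_name0
%xdefine srcq r0q
%xdefine arg_name0 src

section .text
second_function:            ; hypothetical label
    mov rax, [srcq]         ; assembles as: mov rax, [rcx]
%ifdef pixq
    %error "pixq should no longer be defined here"
%endif
    ret
```

Without the CAT_UNDEF line, pixq would still expand to rcx inside the second function, so a stale argument name from an earlier function would assemble silently instead of being caught.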

Please don't follow the example set by the code you're reading. These macros essentially create a derivative and unique programming language, one that's only really understood by the authors of the macros. It's also not the most efficient way of doing things. If I were writing this code I'd use C/C++ and its vector intrinsics. That would leave all the differences between 32-bit and 64-bit, Windows and Linux, to the compiler, which could generate better code than these macros.