What do %f, %rd mean in PTX assembly?

Hi, I'm new to CUDA programming. I got this piece of assembly code from building a program with OpenCL.

I'm wondering what the numbers and characters mean, such as %f7, %f11, %rd3, %r3, %f, %p.

I'm guessing that rd refers to a register and that the number is the register number, and that the percent sign is just how operands are written for a PTX instruction (e.g. ld.shared.f32). If my guesses are correct, what does %r3 mean? Is it a different class of register? And what about %p and %f7?

Thank you in advance.

    ld.global.f32   %f7, [%rd16];
    st.shared.f32   [%rd2], %f7;
    bar.sync    0;
    ld.shared.f32   %f8, [%rd4];
    ld.shared.f32   %f9, [%rd3];
    fma.rn.f32  %f10, %f9, %f8, %f32;
    ld.shared.f32   %f11, [%rd4+32];
    ld.shared.f32   %f12, [%rd3+4];
    fma.rn.f32  %f13, %f12, %f11, %f10;
    ld.shared.f32   %f14, [%rd4+64];
    ld.shared.f32   %f15, [%rd3+8];
    fma.rn.f32  %f16, %f15, %f14, %f13;
    ld.shared.f32   %f17, [%rd4+96];
    ld.shared.f32   %f18, [%rd3+12];
    fma.rn.f32  %f19, %f18, %f17, %f16;
    ld.shared.f32   %f20, [%rd4+128];
    ld.shared.f32   %f21, [%rd3+16];
    fma.rn.f32  %f22, %f21, %f20, %f19;
    ld.shared.f32   %f23, [%rd4+160];
    ld.shared.f32   %f24, [%rd3+20];
    fma.rn.f32  %f25, %f24, %f23, %f22;
    ld.shared.f32   %f26, [%rd4+192];
    ld.shared.f32   %f27, [%rd3+24];
    fma.rn.f32  %f28, %f27, %f26, %f25;
    ld.shared.f32   %f29, [%rd4+224];
    ld.shared.f32   %f30, [%rd3+28];
    fma.rn.f32  %f32, %f30, %f29, %f28;
    bar.sync    0;
    add.s32     %r37, %r37, 8;
    add.s32     %r36, %r36, %r11;
    add.s32     %r38, %r38, 1;
    setp.lt.s32 %p5, %r38, %r8;

[Edited]

Many thanks to Robert Crovella for the thorough answer! In case anyone wonders, this is the register declaration part at the top of my PTX file:

    .reg .pred  %p<6>;
    .reg .f32   %f<33>;
    .reg .b32   %r<39>;
    .reg .b64   %rd<19>;
    .shared .align 4 .b8 sgemm$blockA[256];
    // demoted variable
    .shared .align 4 .b8 sgemm$blockB[256];

The shared arrays have a size of 256 because I set the size to 16 * 16.

And the specific section of the reference document is here

Best answer

PTX register naming is summarized here. PTX uses virtual registers, meaning the register names are effectively variable names; they don't necessarily correspond to hardware registers in a physical device. Therefore, as indicated there, interpreting them properly requires more PTX code than the snippet you have shown, because virtual registers are formally declared before they are used. Specifically, you would normally find a set of declarations something like this:

    .reg .pred      %p<11>;
    .reg .f32       %f<3075>;
    .reg .b32       %r<54>;
    .reg .b64       %rd<10>;

at the "top" of any complete PTX code, which defines the actual virtual register names and types.
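
As a rough illustration of how such declarations define the names used later, here is a minimal Python sketch. It is not part of any NVIDIA tool, and expand_declarations is a hypothetical helper written for this example. In PTX, a parameterized name such as %f<33> declares the registers %f0 through %f32, and the sketch expands each declaration into those individual names. The input is the declaration block quoted in the question, and the regular expression only handles this simple one-declaration-per-line form.

```python
import re

# Matches simple parameterized declarations such as ".reg .f32 %f<33>;".
# Assumption: one register family per line, no other PTX declaration forms.
DECL = re.compile(r"\.reg\s+\.(\w+)\s+(%\w+)<(\d+)>\s*;")


def expand_declarations(ptx_text):
    """Return {register_name: type} for every parameterized .reg declaration."""
    registers = {}
    for reg_type, prefix, count in DECL.findall(ptx_text):
        # "%f<33>" declares %f0 .. %f32 (33 names in total).
        for i in range(int(count)):
            registers[f"{prefix}{i}"] = reg_type
    return registers


# Declarations copied from the question.
decls = """
.reg .pred  %p<6>;
.reg .f32   %f<33>;
.reg .b32   %r<39>;
.reg .b64   %rd<19>;
"""

regs = expand_declarations(decls)

# Look up the declared type of the registers the question asks about.
for name in ("%f7", "%rd3", "%r3", "%p5"):
    print(name, "->", regs[name])
```

One detail this makes visible: the last operand of fma.rn.f32 %f10, %f9, %f8, %f32 in the question is the register %f32 (the last of the 33 declared), not the .f32 type suffix.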

We can, however, rely on some conventions that the compiler frontend typically uses when generating these virtual register names to answer your questions. Treat the following as instructional rather than as a statement of the specification.

%rXY refers to one of these registers when used as an operand to an instruction, where XY is the register number, like 30. Subject to the variations below, r typically denotes a 32-bit register used to hold integer, untyped binary, or address information.

rd typically denotes a 64-bit register (the d is for "double"; in hardware this would usually mean a pair of 32-bit registers). You'll note that the usage of rd in your code mostly relates to addressing, so it makes sense that it is 64 bits.

f refers to a floating point register. (f would typically be used to refer to a 32-bit floating point register, whereas fd would typically be used to refer to a 64-bit floating point register.)

p refers to a predicate register. A predicate register can be thought of as holding a single boolean true/false quantity, such as the result of the setp.lt.s32 comparison at the end of your snippet.

Yes, the number refers to a specific register (of that type).
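
To illustrate what a predicate register holds, here is a small Python model of the loop tail from the question. It is a sketch under assumptions: the starting values of %r38 and %r8 are made up, and the predicated branch that would normally follow setp (something like @%p5 bra to a loop label) does not appear in the question, so it is assumed here. Python integers also do not wrap around the way 32-bit registers do.

```python
def setp_lt_s32(a, b):
    """Model of setp.lt.s32: the predicate is the boolean result of a < b."""
    return a < b


# Hypothetical starting values; the real ones depend on the kernel arguments.
regs = {"%r38": 0, "%r8": 3}

iterations = 0
while True:
    iterations += 1  # the loads and fma instructions of the loop body go here

    regs["%r38"] += 1                            # add.s32 %r38, %r38, 1;
    p5 = setp_lt_s32(regs["%r38"], regs["%r8"])  # setp.lt.s32 %p5, %r38, %r8;

    # Assumed continuation: "@%p5 bra <loop label>;" jumps back while %p5 is true.
    if not p5:
        break

print("iterations:", iterations, "final %p5:", p5)
```

The point of the model is only that %p5 carries a true/false value from the comparison to whatever instruction is guarded by it; it makes no claim about how the hardware implements predicates.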

None of this pertains to CUDA directly; it is part of PTX, which is documented here.