Which cache does a function pointer belong to?

In C, if I have a function pointer

int (*f_ptr) (int)

will it end up in the instruction cache or in the data cache? I wouldn't be surprised to find f_ptr in either of those caches. Is there a way to debug this under Linux, maybe with perf, to get something like a bird's-eye view of the data cache, instruction cache and translation lookaside buffer?

3 Answers

Answer 1 (score 0):

The code of the function will enter the instruction cache when the function is executed.

But I assume you are talking about the function pointer variable itself. Since it's a variable, it's going to end up in the data cache. It's only a variable containing an address; a pointer, in short.
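A minimal sketch of both kinds of accesses (the function name twice is made up for illustration):

static int twice ( int x ) { return(2*x); }  /* its code goes through the I cache when executed */

int (*f_ptr) (int) = twice;                  /* the pointer itself is ordinary data */

int main ( void )
{
    /* reading f_ptr is a load, so it goes through the data cache;      */
    /* the indirect call then fetches twice()'s instructions (I cache)  */
    return(f_ptr(21));
}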

Answer 2 (score 3):

EDIT

"It depends" is the answer. First and foremost some caches have I and D combined, some have them separate, some are configurable.

The bottom line is that a data access with caching enabled goes to the data cache, and an instruction fetch with caching enabled goes to the instruction cache.

Manipulation of the function pointer, unless optimized into a register, is a data access for the data cache. Executing the function requires that address, so either it is optimized into a register and no memory access is needed, or a data access is required to get the address; that address is then called/branched to, which makes the instructions at that address candidates for the instruction cache.

It is not uncommon for function pointers to be used for loadable modules or an abstraction layer, where you change the address once to point at the loaded module, then call through those pointers many times. Depending on your system and application, the time between calls to that function may be long enough that other data accesses evict that function pointer's address to L2 or L3 ... and eventually main/slow RAM, in which case it is no longer in any cache. When the call eventually happens, the data access happens, the address is read, and it lands in whatever layers of cache it passes through (however many you have).
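As a sketch of that loadable-module pattern (plugin.so, plugin_op, load_plugin and use_plugin are made-up names; dlopen/dlsym are just the usual POSIX way to get such an address):

#include <dlfcn.h>

static int (*plugin_op) (int);   /* written once at load time, read on every call */

int load_plugin ( void )
{
    void *h = dlopen("./plugin.so", RTLD_NOW);   /* hypothetical module */
    if(!h) return(-1);
    /* one-time data write of the function's address into the pointer */
    plugin_op = (int (*)(int))dlsym(h, "plugin_op");
    return(plugin_op ? 0 : -1);
}

int use_plugin ( int x )
{
    /* each call re-reads the pointer (data access), then branches to the */
    /* loaded code (instruction fetch at that address)                    */
    return(plugin_op(x));
}

(On glibc you may need to link with -ldl.)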

Some simple examples:

int (*f_ptr) (int);

int fun ( int x )
{
    return(f_ptr(5)+x);
}

One compiler produces:

00000000 <fun>:
   0:   e59f3018    ldr r3, [pc, #24]   ; 20 <fun+0x20>
   4:   e92d4010    push    {r4, lr}
   8:   e5933000    ldr r3, [r3]
   c:   e1a04000    mov r4, r0
  10:   e3a00005    mov r0, #5
  14:   e12fff33    blx r3
  18:   e0800004    add r0, r0, r4
  1c:   e8bd8010    pop {r4, pc}
  20:   00000000    andeq   r0, r0, r0

and, as expected, the read of the address of the pointer is a data access:

 0: e59f3018    ldr r3, [pc, #24]   ; 20 <fun+0x20>

and, as not only expected but required, the call

14: e12fff33 blx r3

results in an instruction fetch at that address. On an ARM (which the above example is) the L1 can be a combined I and D cache, so had the write to that address been recent enough and not evicted, it would have been pulled from the cache; otherwise from L2 or L3 ... or main/slow RAM.

Now do something like this:

int (*f_ptr) (int);

int ext_fun ( int );

int more_fun ( int y )
{
    return(ext_fun(y));
}

int fun ( int x )
{
    f_ptr = more_fun;
    return(f_ptr(5)+x);
}

which gives

00000000 <more_fun>:
   0:   eafffffe    b   0 <ext_fun>

00000004 <fun>:
   4:   e59f301c    ldr r3, [pc, #28]   ; 28 <fun+0x24>
   8:   e59f201c    ldr r2, [pc, #28]   ; 2c <fun+0x28>
   c:   e92d4010    push    {r4, lr}
  10:   e1a04000    mov r4, r0
  14:   e3a00005    mov r0, #5
  18:   e5832000    str r2, [r3]
  1c:   ebfffffe    bl  0 <ext_fun>
  20:   e0840000    add r0, r4, r0
  24:   e8bd8010    pop {r4, pc}
    ...

So the address is stored to f_ptr, but f_ptr itself is in fact never read nor used: the compiler knows what f_ptr holds at the call and, after inlining more_fun, calls ext_fun directly.

Now do this:

int (*f_ptr) (int);

int more_fun ( int y )
{
    return(y+7);
}

int fun ( int x )
{
    f_ptr = more_fun;
    return(f_ptr(5)+x);
}

and

00000000 <more_fun>:
   0:   e2800007    add r0, r0, #7
   4:   e12fff1e    bx  lr

00000008 <fun>:
   8:   e59f300c    ldr r3, [pc, #12]   ; 1c <fun+0x14>
   c:   e59f200c    ldr r2, [pc, #12]   ; 20 <fun+0x18>
  10:   e280000c    add r0, r0, #12
  14:   e5832000    str r2, [r3]
  18:   e12fff1e    bx  lr
    ...

and the store happens, but there is no function call at all: the compiler inlined more_fun and folded f_ptr(5)+x into x+12.

Interesting:

static int (*f_ptr) (int);

static int more_fun ( int y )
{
    return(y+7);
}

int fun ( int x )
{
    f_ptr = more_fun;
    return(f_ptr(5)+x);
}

The compiler missed an opportunity:

00000000 <more_fun>:
   0:   e2800007    add r0, r0, #7
   4:   e12fff1e    bx  lr

00000008 <fun>:
   8:   e59f300c    ldr r3, [pc, #12]   ; 1c <fun+0x14>
   c:   e59f200c    ldr r2, [pc, #12]   ; 20 <fun+0x18>
  10:   e280000c    add r0, r0, #12
  14:   e5832000    str r2, [r3]
  18:   e12fff1e    bx  lr
    ...

It could probably have gotten away with

  10:   e280000c    add r0, r0, #12
  18:   e12fff1e    bx  lr

as the rest is dead code (with everything static, nothing outside this file can observe f_ptr).

Switch compilers and the opportunity was taken:

fun:                                    @ @fun
        .fnstart
.Leh_func_begin0:
@ BB#0:                                 @ %entry
        add     r0, r0, #12
        bx      lr

There was neither a data write nor a data read of f_ptr. So you cannot universally say that f_ptr is in any cache at all.
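If you want to force the compiler to keep the data accesses in toy examples like these, one option (my variation on the example above, not something the compilers here were asked to do) is to make the pointer itself volatile:

static int (* volatile f_ptr) (int);   /* volatile binds to the pointer, not the return type */

static int more_fun ( int y )
{
    return(y+7);
}

int fun ( int x )
{
    f_ptr = more_fun;        /* the store must now really happen (data write) */
    return(f_ptr(5)+x);      /* the load must now really happen (data read),  */
                             /* followed by an indirect call                  */
}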

The poster tagged more than one instruction set, and within those we can probably agree that there will be a data access to get the function pointer's address, then a call/branch to that address will fetch instructions AT that address. But there are instruction sets that have double-indirect data instructions, and perhaps some with jump-table instructions; yes, the read of the address is a data operation, but is that how that particular processor works? Especially if it is a Harvard architecture and the jump-table instruction is relative to the instruction location (guaranteed to be right next to the instruction in .text, where it would be inefficient to re-read the same space you may already have fetched).

Short answer: the instructions for the function are instruction fetches and go through the I cache. Manipulation of the function pointer, including the read of it to perform the call, is most likely a data access and, if cached, lands in the D cache. Unless someone can think of an instruction in some instruction set that says otherwise, the f_ptr address will be in the data cache; for it to be in the I cache there would have to be a special instruction on some specific processor, and the compiler would have to have used that instruction (which leads to a race condition between manipulating the address and using it that the compiler would have to resolve with a flush, so it would probably rather implement a data read then a call than use the special instruction).

EDIT 2

A simple example of the address being optimized into a register:

static int (*f_ptr) (int);

int fun ( int x )
{
    f_ptr = (int (*)(int))0x1000;
    return(f_ptr(5)+x);
}

gives

fun:                                    @ @fun
    .fnstart
.Leh_func_begin0:
@ BB#0:                                 @ %entry
    .save   {r4, r10, r11, lr}
    push    {r4, r10, r11, lr}
    .setfp  r11, sp, #8
    add r11, sp, #8
    mov r4, r0
    mov r1, #4096
    mov r0, #5
    mov lr, pc
    bx  r1
    add r0, r0, r4
    pop {r4, r10, r11, lr}
    bx  lr

No data accesses at all: the constant address ends up in a register (r1) and is branched to directly.

Answer 3 (score 0):

The I$ (instruction-cache) is only used by the instruction-fetch logic in the front-end of the CPU. Memory operands for mov and any other instruction go in the L1 D$.

Copying a block of code with memcpy would leave (some of) it in the L1 D$ (and L2 / L3 caches). Jumping to it wouldn't even look in the D$: The instruction-fetch pipeline would start fetching it into L1 I$. Fortunately, L2 and L3 are unified (not split into code/data), so the instruction fetch would hit in L2. (Unless the block of code was so big that the start got evicted by the time the memcpy was done.)
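A rough sketch of that memcpy-then-jump scenario for x86-64 Linux (the machine-code bytes, buffer name and page size here are my own example, not from the question):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;

    memcpy(buf, code, sizeof code);            /* stores: this data sits in L1 D$ / L2 / L3 */
    mprotect(buf, 4096, PROT_READ | PROT_EXEC);

    int (*f)(void) = (int (*)(void))buf;       /* JIT-style cast, not strictly portable C */
    printf("%d\n", f());                       /* the call fetches from buf into the I$,   */
                                               /* hitting the unified L2/L3 on the way     */
    return 0;
}

On x86 the hardware keeps instruction fetch coherent with recent stores; on many other architectures you would also need an explicit cache flush (e.g. GCC/Clang's __builtin___clear_cache) before jumping to the copied code.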

Split (I$ / D$) L1, and unified other levels is a pretty universal choice for CPU designs, not just Intel / AMD x86 CPUs.

To actually answer the question, a function pointer stores an address. This will NEVER be stored in I$. The memory it points to will be fetched into the I$ if you call the pointed-to function (setting the CPU's instruction pointer to that value with a call instruction).
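As for the perf part of the question: perf's generic hardware-cache events give a rough bird's-eye view of the D-cache, I-cache and TLBs (event availability depends on the CPU; ./your_program is a placeholder):

perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-icache-load-misses,dTLB-load-misses,iTLB-load-misses ./your_program

These are whole-program counters, so they won't tell you where one particular pointer currently lives, only how much traffic hits each cache and TLB.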