How do I force a page to generate a page fault on next access?


I am trying to develop a routine using SVE. SVE provides fault-avoiding loads which do not load from memory that would lead to a fault if accessed. As the CPU does not know the reason why a page is unmapped or inaccessible, it cannot distinguish between memory that would trigger an invalid page fault and memory that would trigger a major/minor page fault (which is usually transparent to the application).

SVE code using these instructions must thus be prepared for spurious avoided faults and must retry the loads with ordinary (faulting) load instructions if it requires the data after all. For example, consider a routine operating on NUL-terminated strings. A first-fault load is used to load a chunk of the string. If a fault was avoided only after a NUL character, everything is fine. But if it was avoided before any NUL character was seen, we must retry the load with conventional load instructions, as the string has been proven to cross into the faulting page.

If such “retry on avoided fault” paths are present in the code, they must be tested. However, it does not seem obvious to me how to prepare a page to fault (with major or minor page fault) on next access. If an all-zero page is acceptable, a possibility seems to be to just map a fresh anonymous page and make use of the kernel's lazy page allocation. However, this is not guaranteed or documented to cause the desired effect.
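One way to check whether the lazy-allocation trick really produces a fault is to read the process's minor-fault counter around the first access. The following is a minimal sketch, assuming a system where getrusage() fills in ru_minflt (Linux and the BSDs do); the helper names minor_faults and fresh_page_faults are made up for illustration, and any unrelated fault taken between the two readings is counted as well.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANON with glibc in strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: minor faults taken by this process so far, or -1. */
static long
minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: map one fresh anonymous page, read its first
 * byte, and return how many minor faults were counted around that read.
 * Returns -1 on error.
 */
static long
fresh_page_faults(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    long before, after;
    volatile char *p;

    p = mmap(NULL, pagesz, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    before = minor_faults();
    (void)p[0];                 /* first touch of the page */
    after = minor_faults();

    munmap((void *)p, pagesz);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

A unit test could treat a result of 0 as "the page did not fault" and report the run as inconclusive instead of passing silently.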

For arbitrary pages, the madvise system call has the MADV_PAGEOUT option, which seems like it might give the desired effect, but the man page does not document whether the effect takes place immediately, and states that it might not affect certain pages. It is also unclear whether the call works in the absence of swap space. Success or failure does not seem to be reported unambiguously, so it is unclear whether a unit test can rely on this call. It would be quite bad for the unit test to silently pass because the page was not actually unmapped when it ran.
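Since madvise() itself does not say whether the page really went away, one option is to ask mincore() afterwards. The sketch below assumes Linux with MADV_PAGEOUT available (5.4 or later); try_pageout is a hypothetical helper name. The check is conservative: mincore() can still report a page as resident while it sits in the swap or page cache, so a result of 0 only means "do not rely on a fault here".

```c
#define _DEFAULT_SOURCE         /* for madvise(), mincore() and MAP_ANON with glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to page out the page containing
 * addr, then query its residency.
 * Returns 1 if the page is reported as no longer resident, 0 if it is
 * still resident, and -1 if the request or the query failed or
 * MADV_PAGEOUT is not available at compile time.
 */
static int
try_pageout(void *addr)
{
#ifdef MADV_PAGEOUT
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *page = (void *)((uintptr_t)addr & ~(uintptr_t)(pagesz - 1));
    unsigned char vec;

    if (madvise(page, pagesz, MADV_PAGEOUT) != 0)
        return -1;
    if (mincore(page, pagesz, &vec) != 0)
        return -1;
    return (vec & 1) == 0;      /* bit 0 set means "resident" */
#else
    (void)addr;
    return -1;
#endif
}
```

A test would then run the fault-dependent case only when try_pageout() returns 1 and otherwise mark it as skipped.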

What is the recommended course of action?

Also interested in responses for other operating systems (such as FreeBSD) and in particular also hardware-specific approaches, that may or may not be specific to ARM.


There are 2 best solutions below

0
On BEST ANSWER

The approach I ended up using was to allocate a page without read/write permissions. On first access, a SIGSEGV would occur, ensuring that the access failed. In the signal handler, I then permit access to the page and return, resuming the code under test.

While the performance of this code is likely worse than that of the approach suggested by Michael Karcher, the code clearly and obviously reaches the desired result, which is more important to me.

Here is some example code, testing a custom strtol implementation:

#include <signal.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 16384
#endif

long mystrtol(const char *restrict, char **restrict, int);

/* a signal handler that makes testpage writable and then returns */
static void *testpage;
static void
maptestpage(int sig)
{
    mprotect(testpage, PAGE_SIZE, PROT_READ|PROT_WRITE);
}

/*
 * Call mystrtol() on the given input with a page fault after the given
 * number of characters.  Print an error if the return value is not
 * equal to what strtol() says it should be.
 */
static void
test_mystrtol(const char *str, size_t off)
{
    struct sigaction sa;
    long num;
    int res;
    char *data, *cpy, *endptr;

    data = mmap(NULL, 2*PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        return;
    }

    cpy = data + PAGE_SIZE - off;
    strcpy(cpy, str);
    mprotect(data + PAGE_SIZE, PAGE_SIZE, PROT_NONE);
    testpage = data + PAGE_SIZE;

    sa.sa_handler = maptestpage;
    sa.sa_flags = SA_RESETHAND;
    sigfillset(&sa.sa_mask);
    res = sigaction(SIGSEGV, &sa, NULL);
    if (res != 0) {
        perror("sigaction");
        goto end;
    }

    num = mystrtol(cpy, &endptr, 10);
    signal(SIGSEGV, SIG_DFL);

    if (num != ...) {
        /* ... */
    }

end:    munmap(data, 2*PAGE_SIZE);
}
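A variation on the code above lets the test confirm that the fault really happened, so that it cannot pass silently. This is only a sketch: it assumes that the protection fault is delivered as SIGSEGV with si_addr set to the faulting address, and the names arm_guard, disarm_guard, and on_segv are made up for illustration.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANON and siginfo_t with glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guard;                         /* start of the protected page */
static size_t guardsz;                      /* its length in bytes */
static volatile sig_atomic_t guard_hits;    /* faults seen inside it */

/* Count the fault and open the page up; let any other crash proceed. */
static void
on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t a = (uintptr_t)si->si_addr, g = (uintptr_t)guard;

    (void)sig;
    (void)ctx;
    if (a < g || a >= g + guardsz) {
        signal(SIGSEGV, SIG_DFL);           /* unrelated fault: crash on retry */
        return;
    }
    guard_hits++;
    mprotect(guard, guardsz, PROT_READ|PROT_WRITE);
}

/* Hypothetical helper: protect the page and install the handler. */
static int
arm_guard(void *page, size_t len)
{
    struct sigaction sa;

    guard = page;
    guardsz = len;
    guard_hits = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return mprotect(page, len, PROT_NONE);
}

/* Hypothetical helper: restore the default handler, return the fault count. */
static int
disarm_guard(void)
{
    signal(SIGSEGV, SIG_DFL);
    return (int)guard_hits;
}
```

Calling arm_guard(data + PAGE_SIZE, PAGE_SIZE) before the routine under test and checking that disarm_guard() returns at least 1 afterwards shows that the retry path was exercised.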
3
On

I tried the following code, both locally in WSL2 (which basically is a real Linux kernel running in a Hyper-V VM) and on godbolt.org:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <stdio.h>

int main(void)
{
    struct timespec tsa, tsb;
    int my_fd = open("/proc/self/exe", O_RDONLY);
    char* mapping = mmap(NULL, 8192, PROT_READ, MAP_SHARED, my_fd, 0);
    volatile int x = mapping[0] + mapping[4096];
    munmap(mapping + 4096, 4096);
    mmap(mapping + 4096, 4096, PROT_READ, MAP_SHARED | MAP_ANON, -1, 0);
    volatile int y = mapping[4096];
    munmap(mapping + 4096, 4096);
    mmap(mapping + 4096, 4096, PROT_READ, MAP_SHARED, my_fd, 4096);
    clock_gettime(CLOCK_MONOTONIC, &tsa);  // prefetch clock_gettime
    clock_gettime(CLOCK_MONOTONIC, &tsa);
    volatile int x2 = mapping[0];
    clock_gettime(CLOCK_MONOTONIC, &tsb);
    printf("first page (no fault): %ld ns\n", (tsb.tv_nsec + 1000000000 - tsa.tv_nsec) % 1000000000);
    clock_gettime(CLOCK_MONOTONIC, &tsa);
    volatile int y2 = mapping[4096];
    clock_gettime(CLOCK_MONOTONIC, &tsb);
    printf("second page (fault?): %ld ns\n", (tsb.tv_nsec + 1000000000 - tsa.tv_nsec) % 1000000000);
}

The code creates a two-page mapping of its own executable and touches both pages to prime the filesystem cache (which is likely hot anyway for /proc/self/exe). It then replaces the second page with an anonymous mapping and afterwards restores the file mapping. The first access to the restored second page is very slow, indicating that the page is mapped back from the filesystem cache lazily, in the page-fault handler on first use, rather than during the mmap call.
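Timing is a noisy signal, so the same remapping idea can also be checked with the kernel's fault counter. The sketch below is an assumption-laden variant, not the code measured above: it replaces the mapping in place with a single MAP_FIXED call instead of the munmap/mmap pair, and refault_after_remap is a hypothetical helper name. The file named by path must be at least one byte long.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first page of a file, touch it, map the
 * same file page over it again with MAP_FIXED, and touch it once more.
 * Returns the number of minor faults counted around the second touch,
 * or -1 on error.
 */
static long
refault_after_remap(const char *path)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    struct rusage before, after;
    volatile char *p;
    long n = -1;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    p = mmap(NULL, pagesz, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    (void)p[0];                 /* prime the mapping and the page cache */

    /* Replace the mapping in place, then count faults around one read. */
    if (mmap((void *)p, pagesz, PROT_READ, MAP_SHARED|MAP_FIXED, fd, 0) != MAP_FAILED &&
        getrusage(RUSAGE_SELF, &before) == 0) {
        (void)p[0];
        if (getrusage(RUSAGE_SELF, &after) == 0)
            n = after.ru_minflt - before.ru_minflt;
    }

    munmap((void *)p, pagesz);
    close(fd);
    return n;
}
```

On Linux, refault_after_remap("/proc/self/exe") returning 1 or more would agree with the timing result; a return value of 0 would mean the access did not fault and a test relying on it should not be trusted.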

The output at godbolt looks something like this:

first page (no fault): 76 ns
second page (fault?): 1032 ns