I'm writing a macOS x86_64 application that cannot currently be compiled as a universal binary due to library dependencies that are x86_64 only.

This app needs to interact with other processes that are running as arm64 on Apple Silicon Macs, and in doing so it needs to get the value of the system clock in raw ticks, as would be returned by mach_absolute_time().

However, on an Apple Silicon Mac, mach_absolute_time() behaves differently when called from a native arm64 app than when called from an x86_64 app under Rosetta 2.

On an Intel Mac, mach_absolute_time() returns the system clock in nanoseconds, and mach_timebase_info() returns a 1:1 ratio of nanoseconds to clock ticks.

On an Apple Silicon Mac, the system clock no longer ticks in nanoseconds, and consequently mach_timebase_info() does not return a 1:1 ratio. (On my M1 Mac Mini, I get a ratio of 125:3.)

However, an x86_64 app running under Rosetta 2 gets the same values it would've gotten on an Intel processor: a 1:1 ratio, with mach_absolute_time() returning a value in nanoseconds.
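
For illustration, here's a minimal check that shows the difference (the values in the comments are what I observe in each mode):

#include <mach/mach_time.h>
#include <stdio.h>

int main(void)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);
    // Native arm64:         numer = 125, denom = 3
    // x86_64 under Rosetta: numer = 1,   denom = 1
    printf("timebase = %u / %u\n", tb.numer, tb.denom);
    printf("mach_absolute_time() = %llu\n",
           (unsigned long long)mach_absolute_time());
    return 0;
}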

This is a problem for me because I need my x86_64 app to get the real value of mach_absolute_time() as though it were called from an arm64 process.

So far, I haven't found a way to do this. Every clock-related function I'm aware of returns the "fake" values when called under Rosetta 2. The only solution I can think of is to bundle a universal-binary executable into my app, run it from my app, have it fetch the timebase info while running natively as arm64, and pass the value back. But that's a heavier-weight solution than I'd like.

Is there a way to get the real system clock timebase in an x86_64 app running under Rosetta 2?

Accepted answer:

As you've observed, the system clock on Apple Silicon uses one tick per 41.667 nanoseconds (125/3) compared to the 1 tick per nanosecond on x86 architectures. And for compatibility, Rosetta uses the old 1:1 value.

While investigating this discrepancy to solve a different problem, I found this blog post, which describes it in detail. The author has also published a free utility, Mints, that lets you investigate it. I just tried it out on my M2 Max, both as a native app and under Rosetta, and got these outputs:

Running native on Apple Silicon:

Timebase numerator = 125
       denominator = 3
            factor = 41.666666666666664
Mach Absolute Time (raw)  = 42216039415365
Mach Absolute Time (corr) = 1759001642306875

And a short time later on Rosetta:

Running as Intel code:

Timebase numerator = 1
       denominator = 1
            factor = 1.0
Mach Absolute Time (raw)  = 1759003916431000
Mach Absolute Time (corr) = 1759003916431000

The TL;DR of this comparison is that both Mach Absolute Times start at zero, so they are always at a constant ratio to each other. To get the arm64 MAT from the x86/Rosetta MAT, you simply divide by 125/3 (i.e., multiply by 3/125), at least on M1 and M2.

To future-proof your code in case Apple changes it again, you should determine the ratio programmatically. On arm64 you can retrieve it, as you've indicated, from the structure filled in by mach_timebase_info().

Given that you can determine the ratio accurately on arm64, I'd recommend converting all your values to nanoseconds to match the x86 output. This is the simplest approach: get the mach_timebase_info() ratio once at startup, and then always multiply your mach_absolute_time() values by it.
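
For example, a minimal sketch of that approach (the helper name is just illustrative):

#include <mach/mach_time.h>
#include <stdint.h>

// Convert mach_absolute_time() ticks to nanoseconds using the timebase
// reported to this process (125/3 native on Apple Silicon, 1/1 under
// Rosetta or on Intel).
static uint64_t machTicksToNanos(uint64_t ticks)
{
    static mach_timebase_info_data_t tb; // cached after the first call
    if (tb.denom == 0) {
        mach_timebase_info(&tb);
    }
    // Note: ticks * numer can overflow for very large tick counts; use a
    // 128-bit intermediate if that matters for your use case.
    return ticks * tb.numer / tb.denom;
}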

Perhaps confirming this suggestion, the documentation for mach_absolute_time suggests a nanosecond approach:

Prefer to use the equivalent clock_gettime_nsec_np(CLOCK_UPTIME_RAW) in nanoseconds.

This is documented on the clock_gettime manpage:

     CLOCK_UPTIME_RAW   clock that increments monotonically, in the same
                        manner as CLOCK_MONOTONIC_RAW, but that does not
                        increment while the system is asleep.  The returned
                        value is identical to the result of
                        mach_absolute_time() after the appropriate
                        mach_timebase conversion is applied.
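
A minimal usage sketch of that call:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    // Monotonic uptime in nanoseconds; consistent whether or not the
    // process is running under Rosetta.
    uint64_t now_ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
    printf("uptime: %llu ns\n", (unsigned long long)now_ns);
    return 0;
}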

One other thing to note: some data fields in macOS use the "native" tick value, notably the per-process user and kernel times, proc_taskinfo->pti_total_user and proc_taskinfo->pti_total_system. Under Rosetta, the ratio above doesn't help resolve this disparity, since mach_timebase_info() reports 1:1.
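
As an illustration (not part of the original note), those fields can be read with proc_pidinfo(); getpid() is used here just for demonstration:

#include <libproc.h>
#include <stdio.h>
#include <sys/proc_info.h>
#include <unistd.h>

int main(void)
{
    struct proc_taskinfo pti;
    int ret = proc_pidinfo(getpid(), PROC_PIDTASKINFO, 0, &pti, sizeof(pti));
    if (ret == sizeof(pti)) {
        // Per the note above, these are "native" mach ticks, even under Rosetta.
        printf("user ticks:   %llu\n", (unsigned long long)pti.pti_total_user);
        printf("system ticks: %llu\n", (unsigned long long)pti.pti_total_system);
    }
    return 0;
}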

But there is another source for this ratio that I've found, which appears robust to Rosetta: the IO Registry. In the device tree entry for each CPU there is a timebase-frequency value (along with many other clock frequencies consistent with it) that works out to 1000000000 on x86 and 24000000 on arm64. Since the device tree is captured at boot time, fetching it, even under Rosetta, reveals the original values.

That ratio (1000/24) is exactly equal to 125/3, so if you choose not to convert to nanoseconds as above and you are on Rosetta, you should be able to take your mach_absolute_time() value and divide it by 1000000000 / timebase-frequency to get your desired "native" absolute time.

If you're scripting, you could fetch the value from the command line. (The bytes are little-endian; <00ca9a3b>, for example, decodes to 0x3b9aca00 = 1000000000.) On x86:

➜  ~ ioreg -c IOPlatformDevice | grep timebase
    | |     "timebase-frequency" = <00ca9a3b>

On arm64, even with Rosetta:

➜  ~ ioreg -c IOPlatformDevice | grep timebase
    | |     "timebase-frequency" = <00366e01>

Programmatically, I've implemented this in Java using JNA here. Below is some (untested) C code that should fetch the value you need; error handling is kept minimal, and other languages are left as an exercise for the reader:

#include <CoreFoundation/CoreFoundation.h>
#include <IOKit/IOKitLib.h>
#include <string.h>

// Fetch the boot-time timebase-frequency from the IO Registry for a cpuN
// device. Returns 1000000000 on x86, 24000000 on arm64, or 0 on failure.
uint32_t fetchTimebaseFrequency(void)
{
    uint32_t timebase = 0;
    kern_return_t status;
    CFDictionaryRef matching = NULL;
    CFDataRef timebaseRef = NULL;
    io_iterator_t iter = 0;
    io_registry_entry_t entry = 0;
    io_name_t name;

    matching = IOServiceMatching("IOPlatformDevice");
    if (matching == NULL) {
        return 0;
    }
    // This call releases matching, so we don't have to.
    // (Use kIOMasterPortDefault instead on macOS 11 and earlier.)
    status = IOServiceGetMatchingServices(kIOMainPortDefault, matching, &iter);
    if (status != KERN_SUCCESS) {
        return 0;
    }
    while ((entry = IOIteratorNext(iter)) != 0) {
        status = IORegistryEntryGetName(entry, name);
        if (status != KERN_SUCCESS) {
            IOObjectRelease(entry);
            continue;
        }
        // Don't match "cpu" itself, but do match "cpu0", "cpu1", etc.
        if (strlen(name) > 3 && strncmp(name, "cpu", 3) == 0) {
            break;
        }
        IOObjectRelease(entry);
    }
    if (entry == 0) {
        // Didn't find a cpuN entry.
        IOObjectRelease(iter);
        return 0;
    }
    timebaseRef = IORegistryEntryCreateCFProperty(
        entry, CFSTR("timebase-frequency"), kCFAllocatorDefault, 0);
    if (timebaseRef != NULL && CFGetTypeID(timebaseRef) == CFDataGetTypeID()
            && CFDataGetLength(timebaseRef) >= 4) {
        // The registry stores the value as little-endian bytes.
        CFDataGetBytes(timebaseRef, CFRangeMake(0, 4), (UInt8 *)&timebase);
    }
    if (timebaseRef != NULL) {
        CFRelease(timebaseRef);
    }
    IOObjectRelease(entry);
    IOObjectRelease(iter);

    // timebase should now be 1000000000 on x86, 24000000 on arm64.
    return timebase;
}
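
As a follow-up sketch (a hypothetical helper, not from the original answer), the fetched frequency can then be used to scale a Rosetta-side mach_absolute_time() value into native ticks:

#include <mach/mach_time.h>
#include <stdint.h>

// Convert the nanosecond-based value that mach_absolute_time() returns under
// Rosetta into "native" arm64 ticks, given the timebase-frequency fetched
// from the IO Registry above (24000000 on Apple Silicon; on x86, where the
// frequency is 1000000000, this is a no-op).
uint64_t rosettaToNativeTicks(uint64_t rosettaMachTime, uint32_t timebaseFrequency)
{
    // Use a 128-bit intermediate so the multiplication can't overflow.
    return (uint64_t)(((__uint128_t)rosettaMachTime * timebaseFrequency) / 1000000000u);
}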

Summary:

  • The easiest thing to do, if you control both the Rosetta and native processes, is to always convert everything to nanoseconds: don't use mach_absolute_time(); use clock_gettime_nsec_np(CLOCK_UPTIME_RAW), which is consistent whether or not Rosetta is in use.
  • If you are on Rosetta, do not have control of the native arm64 processes, and must find the correction factor yourself, fetch timebase-frequency from the IO Registry and apply the correction: multiply by timebase-frequency / 1 billion = 24 million / 1000 million = 24/1000 = 3/125.

Another answer:

I've found a way to do it, but it's not perfect. The following function, when executed by an x86_64 app running under Rosetta, will return the same time that mach_absolute_time() would return for an arm64 app:

#include <errno.h>
#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/sysctl.h>

uint64_t machAbsoluteTimeFromSysctl(void)
{
    // kern.monotonicclock_usecs returns two 64-bit values; per the
    // observation above, the second matches the native mach_absolute_time().
    uint64_t vals[2];
    size_t size = sizeof(vals);

    if (sysctlbyname("kern.monotonicclock_usecs", vals, &size, NULL, 0) == -1) {
        printf("Error: sysctl kern.monotonicclock_usecs failed with error: %d\n", errno);
        return mach_absolute_time();
    }

    return vals[1];
}
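
For reference, a quick usage sketch (a hypothetical main appended after the function above) that prints the sysctl-derived value next to what Rosetta reports:

int main(void)
{
    uint64_t rosettaTime = mach_absolute_time();         // 1:1 "Intel" value under Rosetta
    uint64_t nativeTime  = machAbsoluteTimeFromSysctl(); // native arm64 tick value
    printf("mach_absolute_time():         %llu\n", (unsigned long long)rosettaTime);
    printf("kern.monotonicclock_usecs[1]: %llu\n", (unsigned long long)nativeTime);
    return 0;
}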

Since this uses sysctl, it takes a non-trivial amount of time to get the timestamp (though still less than a millisecond), which is unfortunate for a timing function, but it seems good enough for my purposes.

I'm going to leave this answer unmarked as the accepted one, though, in case someone else can suggest a better method.