The Lua registry gets corrupted and stores things at 0x3FF00000?


So. This one is a doozy. I'm using the Lua registry (LUA_REGISTRYINDEX) to store Lua callback functions that I later want to call from C. It's pretty straightforward; here are my utility functions for storing and retrieving Lua callbacks:


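#include <stdbool.h>
#include <lua.h>
#include <lauxlib.h>

/* Stores the function on top of the stack in the registry and writes its
   reference to *storage (0 means "no callback"). */
bool store_function(lua_State *L, int *storage)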
{
    if(*storage) {
        luaL_unref(L, LUA_REGISTRYINDEX, *storage);
    }

    if(lua_type(L, -1) == LUA_TFUNCTION)
    {
        *storage = luaL_ref(L, LUA_REGISTRYINDEX);
        return true;
    }
    else if(lua_type(L, -1) == LUA_TNIL)
    {
        *storage = 0;
        return false;
    }
    else
    {
        luaL_error(L, "Invalid function");
        return false;
    }
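}

/* Pushes the stored function onto the stack; returns false (and pushes
   nothing) if there is none. */
bool get_function(lua_State *L, int storage)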
{
    if(storage == 0)
    {
        return false;
    }
    lua_rawgeti(L, LUA_REGISTRYINDEX, storage);
    if(lua_isnil(L, -1))
    {
        lua_pop(L, 1);
        return false;
    }
    return true;
}

(storage points to a global int where I store the registry reference for the corresponding callback.)

Now, this works great almost always. However, SOMETIMES, my callbacks get mixed up and the wrong one is called, leading to hilarious bugs.

Since it's so intermittent, it's taken months to pin down. I've added debug prints all over my own code and Lua's standard library (actually LuaJIT, but the lauxlib code is mostly copy-paste from Lua 5.1's).

[Screenshot: console with weird indexes]

I started printing the registry indexes at which my callbacks end up, and noticed that they're usually at index 12, 13, 14, and thereabouts, EXCEPT when this bug happens, at which point the index is 1072693248. That's 0x3FF00000 in hex, or 00111111111100000000000000000000 in binary. That can't be a coincidence.

Looking at the source for luaL_ref, it's clear that it should only use sequential integers (plus reusing old slots), and I doubt my registry contains over a billion objects.
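To make that expectation concrete, here is a minimal array-backed sketch of the bookkeeping that luaL_ref and luaL_unref do in Lua 5.1's lauxlib.c. It does not use the Lua API at all: RefTable, model_init, model_ref, model_unref and MODEL_CAP are made-up names for illustration, the fixed capacity is an assumption, and payloads are plain longs instead of Lua values.

```c
#include <assert.h>
#include <string.h>

#define MODEL_CAP 64      /* hypothetical capacity; a real Lua table grows on demand */
#define FREELIST_SLOT 0   /* lauxlib keeps the head of the free list at t[0] */

typedef struct {
    long slot[MODEL_CAP]; /* live slot: payload; free slot: index of the next free slot */
    int len;              /* stands in for lua_objlen(L, t) */
} RefTable;

static void model_init(RefTable *t) {
    memset(t, 0, sizeof *t);
}

/* Mirrors luaL_ref: reuse the head of the free list, else append a new slot. */
static int model_ref(RefTable *t, long payload) {
    int ref = (int)t->slot[FREELIST_SLOT];
    if (ref != 0) {
        assert(ref > 0 && ref <= t->len);
        t->slot[FREELIST_SLOT] = t->slot[ref]; /* pop the head of the free list */
    } else {
        assert(t->len + 1 < MODEL_CAP);
        ref = ++t->len;                        /* create a new reference */
    }
    t->slot[ref] = payload;
    return ref;
}

/* Mirrors luaL_unref: push the released slot onto the free list. */
static void model_unref(RefTable *t, int ref) {
    assert(ref > 0 && ref <= t->len);
    t->slot[ref] = t->slot[FREELIST_SLOT];
    t->slot[FREELIST_SLOT] = ref;
}
```

On a freshly initialised RefTable, three model_ref calls return 1, 2 and 3. After model_unref on 2, the next model_ref returns 2 again and the one after that returns 4, so under normal operation the references stay small.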

More tracing. I added this patch to luajit:

diff --git a/src/lib_aux.c b/src/lib_aux.c
index 2682a38..71b68dc 100644
--- a/src/lib_aux.c
+++ b/src/lib_aux.c
@@ -276,6 +276,7 @@ LUALIB_API void luaL_buffinit(lua_State *L, luaL_Buffer *B)
 LUALIB_API int luaL_ref(lua_State *L, int t)
 {
   int ref;
+  const char* how = "free element";
   t = abs_index(L, t);
   if (lua_isnil(L, -1)) {
     lua_pop(L, 1);  /* remove from stack */
@@ -288,9 +289,11 @@ LUALIB_API int luaL_ref(lua_State *L, int t)
     lua_rawgeti(L, t, ref);  /* remove it from list */
     lua_rawseti(L, t, FREELIST_REF);  /* (t[FREELIST_REF] = t[ref]) */
   } else {  /* no free elements */
+    how = "created ref";
     ref = (int)lua_objlen(L, t);
     ref++;  /* create new reference */
   }
+  printf("refSetting in table %d (%s): at ref %d (%s)\n", t, t == LUA_REGISTRYINDEX ? "registry" : "user", ref, how);
   lua_rawseti(L, t, ref);
   return ref;
 }
diff --git a/src/lj_api.c b/src/lj_api.c
index d17a575..bb7b9dd 100644
--- a/src/lj_api.c
+++ b/src/lj_api.c
@@ -996,6 +996,13 @@ LUA_API void lua_rawset(lua_State *L, int idx)

 LUA_API void lua_rawseti(lua_State *L, int idx, int n)
 {
+  if (idx == -10000) {
+    printf("rawseti in registry[%d] = %d / %.2f\n", n, lua_tointeger(L, -1), lua_tonumber(L, -1));
+    if (lua_tointeger(L, -1) == 1072693248) {
+      printf("\n\noh shit!\n\n");
+      lua_assert(lua_tointeger(L, -1) != 1072693248);
+    }
+  }
   GCtab *t = tabV(index2adr(L, idx));
   TValue *dst, *src;
   api_checknelems(L, 1);

A later version of the patch also prints a stack trace when this happens. Fast forward a month, and it triggers:

[Screenshots: stack 2, stack 3]

My interpretation is that the table's freelist index (the table stores the next available-for-reuse index at its own index 0) ends up holding garbage, and then the whole registry just stops working.
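As an illustration only, here is a toy model of one generic way a free list of this shape can end up with a bad head: releasing the same reference twice. Whether that is what happens in this process is not established, and it does not by itself explain the specific value 1072693248. The model follows the luaL_ref/luaL_unref bookkeeping in Lua 5.1's lauxlib.c but uses no Lua API; RefTable, model_init, model_ref, model_unref and MODEL_CAP are hypothetical names, and payloads are plain longs so that a stale head is visible as a number.

```c
#include <assert.h>
#include <string.h>

#define MODEL_CAP 64      /* hypothetical capacity for the toy table */
#define FREELIST_SLOT 0   /* head of the free list lives at t[0] */

typedef struct {
    long slot[MODEL_CAP]; /* live slot: payload; free slot: next free index */
    int len;              /* stands in for lua_objlen(L, t) */
} RefTable;

static void model_init(RefTable *t) {
    memset(t, 0, sizeof *t);
}

/* Same steps as luaL_ref: take the free-list head if there is one. */
static int model_ref(RefTable *t, long payload) {
    int ref = (int)t->slot[FREELIST_SLOT];
    if (ref != 0) {
        assert(ref > 0 && ref <= t->len);
        t->slot[FREELIST_SLOT] = t->slot[ref]; /* new head = whatever slot[ref] holds */
    } else {
        assert(t->len + 1 < MODEL_CAP);
        ref = ++t->len;
    }
    t->slot[ref] = payload;
    return ref;
}

/* Same steps as luaL_unref: no check that ref is currently live. */
static void model_unref(RefTable *t, int ref) {
    assert(ref > 0 && ref <= t->len);
    t->slot[ref] = t->slot[FREELIST_SLOT];
    t->slot[FREELIST_SLOT] = ref;
}
```

In this model, calling model_unref twice on reference 1 makes slot 1 point at itself. The next two model_ref calls then both return 1, so two owners share one slot, and the second call copies the first owner's payload into the free-list head.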

Is it memory heap corruption? A bug in my Lua? A bug in my C? A bug in ten-year-old Lua lauxlib? I dunno.

And what is 0x3FF00000? It's the upper 32 bits (sign and exponent) of a double holding 1.0, whose full bit pattern is 0x3FF0000000000000. But why is that in there?! It's also used to decode UTF-16 in cJSON, another C library in the same process.
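The bit-pattern claim is easy to check with a small sketch. It assumes the platform's double is a 64-bit IEEE 754 binary64 value (which the static_assert only partly verifies), and high_word and low_word are helper names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is 64 bits wide and encoded as IEEE 754 binary64. */
static_assert(sizeof(double) == sizeof(uint64_t), "expected a 64-bit double");

/* Returns the upper 32 bits (sign, exponent, top of the mantissa) of x. */
static uint32_t high_word(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits); /* copy the bytes; avoids aliasing problems */
    return (uint32_t)(bits >> 32);
}

/* Returns the lower 32 bits (rest of the mantissa) of x. */
static uint32_t low_word(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu);
}
```

With these helpers, high_word(1.0) is 0x3FF00000 (1072693248 in decimal) and low_word(1.0) is 0. That is consistent with something reading half of a stored 1.0 as a 32-bit integer, but it does not prove it.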

My code:

ANY clues and guesses would be deeply appreciated.
