Lua : copying a table efficiently (deep copy)

5.5k Views Asked by At

I try to make efficiently a copy of a lua table. I have written the following function copyTable() that works well (see below). But I imagined I could have something more efficient using the "passing by value" mechanism of the functions. I made a few tests to explore this mechanism :

function nop(x)
  return x
end

function noop(x)
  x={}
  return x
end

function nooop(x)
  x[#x+1]=4
  return x
end

function copyTable(datatable)
  local tblRes={}
  if type(datatable)=="table" then
    for k,v in pairs(datatable) do tblRes[k]=copyTable(v) end
  else
    tblRes=datatable
  end
  return tblRes
end

tab={1,2,3}
print(tab)            -->table: 0x1d387e0 tab={1,2,3}
print(nop(tab))       -->table: 0x1d387e0 tab={1,2,3}
print(noop(tab))      -->table: 0x1e76f90 tab={1,2,3}
print(nooop(tab))     -->table: 0x1d387e0 tab={1,2,3,4}
print(tab)            -->table: 0x1d387e0 tab={1,2,3,4}
print(copyTable(tab)) -->table: 0x1d388d0

We can see that the reference to the table is transferred unchanged through the functions (when I just read it or add things) except within noop() where I try a radical modification of the existing.

I read Bas Bossink and the answer made by Michael Anderson in this Q/A. Regarding the passing or tables as arguments, they emphasized the difference between "arguments passed by ref" and "arguments passed by values and tables are references" with examples where this difference appears.

But what does that mean precisely ? Do we have a copy of the reference, but what difference does that make with a passing through ref since the data pointed and therefore manipulated is still the same, not copied ? Is the mechanism in noop() specific when we try to affect nil to the table, specific to avoid the deletion of the table or in which cases does it trigger (we can see with nooop() that it is not always the case when the table is modified) ?

My question : how the mechanism of passing tables really works ? Is there a way to make a more efficient way to copy the data of a table without the burden of my copyTable ?

2

There are 2 best solutions below

2
On BEST ANSWER

The rules of argument passing in Lua is similarly to C: everything is pass by value, but tables and userdata are passed around as pointers. Passing a copy of a reference does not appear so different in usage, but it is completely different than passing by reference.

For example, you brought this part up specifically.

function noop(x)
  x={}
  return x
end
print(noop(tab))      -->table: 0x1e76f90 tab={1, 2, 3}

You are assigning the value for the new table[1] into variable x (x now holds a new pointer value). You didn't mutate the original table, the tab variable still holds the pointer value to the original table. When you return from noop you are passing back the value of the new table, which is empty. Variables hold values, and a pointer is a value, not a reference.

Edit:

Missed your other question. No, if you want to deep-copy a table, a function similar to what you wrote is the only way. Deep copies are very slow when tables get large. To avoid performance issues, you might use a mechanism like "rewind tables", which keep track of changes made to them so they can be undone at later points in time (very useful in recursive with backtrack contexts). Or if you just need to keep users from screwing with table internals, write a "freezable" trait.

[1] Imagine the {} syntax is a function that constructs a new table and returns a pointer to the new table.

1
On

If you are sure that those 3 assumptions (A) are valid for "tab" (the table being copied):

  1. There are no table keys

    t1 = {}
    tab = {}
    tab[t1] = value
    
  2. There are no repeated table values

    t1 = {}
    tab = {}
    tab.a = t1
    tab.b = t1
    -- or
    -- tab.a.b...x = t1
    
  3. There are no recursive tables:

    tab = {}
    tab.a = tab
    -- or
    -- tab.a.b...x = tab
    

Then the code you provided is the smallest and almost as efficient as possible.

If A1 doesn't hold (i.e. you have table keys), then you must change your code to:

function copyTable(datatable)
  local tblRes={}
  if type(datatable)=="table" then
    for k,v in pairs(datatable) do 
      tblRes[copyTable(k)] = copyTable(v) 
    end
  else
    tblRes=datatable
  end
  return tblRes
end

If A2 doesn't hold (i.e. you have repeated table values), then you could change your code to:

function copyTable(datatable, cache)
  cache = cache or {}
  local tblRes={}
  if type(datatable)=="table" then
    if cache[datatable] then return cache[datatable]
    for k,v in pairs(datatable) do 
      tblRes[copyTable(k, cache)] = copyTable(v, cache) 
    end
    cache[datatable] = tblRes
  else
    tblRes=datatable
  end
  return tblRes
end

This approach only pays off, though, if you have lots of repeated large tables. So, it is a matter of evaluating which version is faster for your actual production scenario.

If A3 doesn't hold (i.e. you have recursive tables), then your code (and both adjustments above) will enter an infinite recursive loop and eventually throw a stack overflow.

The simplest way to handle that is keeping a backtrack and throwing an error if table recursion happens:

function copyTable(datatable, cache, parents)
  cache = cache or {}
  parents = parents or {}
  local tblRes={}
  if type(datatable)=="table" then
    if cache[datatable] then return cache[datatable]
    assert(not parents[datatable])
    parents[datatable] = true
    for k,v in pairs(datatable) do 
      tblRes[copyTable(k, cache, parents)]
        = copyTable(v, cache, parents) 
    end
    parents[datatable] = false
    cache[datatable] = tblRes
  else
    tblRes=datatable
  end
  return tblRes
end

My solution for a deepcopy function which handles recursive tables, preserving the original structure may be found here: https://gist.github.com/cpeosphoros/0aa286c6b39c1e452d9aa15d7537ac95