so I was playing around with tables a bit and found this strange behavior:
folder1 = {1, 2, 3}
table.insert(folder1, "test") --4
folder1[5] = "test2"
print("#folder1: " .. #folder1) --yields 5
folder2 = {1, 2, 3}
table.insert(folder2, "test") --4
folder2[6] = "test2"
print("#folder2: " .. #folder2) --yields 6
folder3 = {1, 2, 3}
table.insert(folder3, "test") --4
folder3[7] = "test2"
print("#folder3: " .. #folder3) --yields 4
the first case is correct since it's an array with no gap. The third case is understandable since #table returns the last index before a nil index. But what happens in case two? I was expecting to receive the index 4 here.
Also it only seems to happen with a table.insert
before. If I remove the insert command, the output is as expected again:
folder1 = {1, 2, 3}
folder1[4] = "test2"
print("#folder1: " .. #folder1) --yields 4
folder2 = {1, 2, 3}
folder2[5] = "test2"
print("#folder2: " .. #folder2) --yields 3
folder3 = {1, 2, 3}
folder3[6] = "test2"
print("#folder3: " .. #folder3) --yields 3
Can someone explain this to me?
I was reading into the Lua documentation. the function rawlen(table)
was looking like the function that would return the true size of the table, but I figured out, that rawlen
also just returns the last index before a non-nil index.
Let's unpack what's happening here.
First example: After the operations,
folder1 = {1, 2, 3, "test", 5}
.This is a proper sequence, all entries from key
1
to5
are non-nil, so the length is5
, just as expected.Second example: After the operations,
folder2 = {1, 2, 3, "test", nil, "test2"}
. This is not a proper "sequence", it has holes. Your explanation isbut this definition is wrong;
#table
returns any indexi
such thatt[i] ~= nil
andt[i+1] == nil
. It is not guaranteed to be the last or the first index, and in practice it often won't be for performance reasons 1.In the second example, both
4
and6
are valid values for#folder2
. You are getting6
, so everything is fine here.Third example: After the operations,
folder3 = {1, 2, 3, "test", nil, nil, "test2"}
. Both4
and7
are valid values for#folder3
. The larger gap leads PUC Lua to evaluate#folder3
to7
, which has to do with the implementation details. 1Your second set of examples can be explained analogeously:
#{1, 2, 3, "test2"}
is4
, as expected#{1, 2, 3, nil, "test2"}
may be either3
or5
; you get3
, which is fine#{1, 2, 3, nil, nil, "test2"}
may be either3
or6
; you get ``, which is fine1 Lua tables are implemented using an array and a hash part. The "array" part entries won't always land in the array part, but Lua wants
{[3] = 3, [2] = 2, [1] = 1}
and{1, 2, 3}
to be the same from the programmer's point of view. Thus Lua needs to define#t
for hash tables; this means it has to find the length somehow, it can't just rely on an internal length field. Even for arrays this won't work, since the programmer may set entries tonil
in the middle of the array, and later on, at the end of the array.If Lua had to return the first or last boundary index, it would either have to do additional bookkeeping with significant overhead, or it would have to use a linear search. Both aren't viable options.
What Lua does instead is quite clever: It uses a binary search to find any boundary: If
t[i]
isnil
, you know that there will be a boundaryj < i
such thatt[j + 1]
is anil
. Ift[i]
is notnil
, you know that there will be a boundaryj >= i
such thatt[j + 1]
isnil
- you can halve the search space for the boundary in each step. To find an initial upper bound oni
, you can just check powers of twoi = 2^n
until you hit anil
. Furthermore, you can use the max safe integer as an upper bound.Here is a reimplementation of the table length operator in pure Lua I did a while ago, with some tests included to give you an idea:
Lua can get tighter upper bounds using less time since it knows the capacity of the hash and array part and can use that as an upper bound; it won't have to do the "try powers of two" trick.