What factors determine the memory used in lambda functions?

=SUM(SEQUENCE(10000000))

The formula above can sum up to 10 million virtual array elements. We know that 10 million is the limit according to this question and answer. Now, if the same sum is implemented with the LAMBDA helper function REDUCE:

=REDUCE(,SEQUENCE(10000000),LAMBDA(a,c,a+c))

We get,

Calculation limit was reached when trying to calculate this formula

Official documentation says

This can happen in 2 cases:

  • The computation for the formula takes too long.
  • It uses too much memory.

To resolve it, use a simpler formula to reduce complexity.

So, it says the reason is space and time complexity. But exactly how much memory has to be used to trigger this error? How is this determined?

In the REDUCE function above, the limit was at around 66k for a virtual array:

=REDUCE(,SEQUENCE(66660),LAMBDA(a,c,a+c))

However, if we remove the addition and make the lambda return only the current value c, the allowed virtual array size seems to increase to about 190k:

=REDUCE(,SEQUENCE(190000),LAMBDA(a,c,c))

Beyond that, it throws an error. So, what factors determine the memory limit here? I think it's a memory limit rather than a time limit, because the error appears within a few seconds.


There are 3 best solutions below

4
On

Edit 2022/10/26

It seems the Google Sheets team has just increased the maximum limit tenfold:

1999992, up from 199992

My original formula assumed 199992 cells, but as you can see, the logic "behind" the limit changes and may change again in the future.

LAMBDA+Friends Limit

The maximum number of rows you can use in the formula (guess):

Limit = 1999992/(1 + inside_lambdas) - outside_lambdas


inside_lambdas and outside_lambdas are the numbers of functions and parameters inside and outside the lambda. Each of the following counts as 1:

  • + / * -
  • 5, A1, "text",
  • MOD, AVERAGE, etc.
  • {"array element"}

The limit applies to the cells processed by the "lambda+" formula: REDUCE, BYROW, etc.
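The guessed formula above can be written out as a small Python sketch for quick what-if checks. The function name and the choice to round down to a whole number of rows are assumptions made for illustration; the counts for inside_lambdas and outside_lambdas still have to be tallied by hand as described above.

```python
def guessed_row_limit(inside_lambdas, outside_lambdas, base=1999992):
    """Evaluate the guess: base / (1 + inside_lambdas) - outside_lambdas.

    Rounding down to a whole row count is an assumption of this sketch,
    not something the answer states.
    """
    if inside_lambdas < 0 or outside_lambdas < 0:
        raise ValueError("counts must be non-negative")
    return base // (1 + inside_lambdas) - outside_lambdas


if __name__ == "__main__":
    # Nothing counted inside or outside the lambda: the guess is the base itself.
    print(guessed_row_limit(0, 0))
    # Hypothetical counts: one item inside the lambda, two outside.
    print(guessed_row_limit(1, 2))
```

Passing base=199992 gives the estimate under the limit that applied before the October 2022 change.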

My tests are here:

Lambda Limits \ Sample Sheet


Steps to fix:

  1. Do not use LAMBDA if possible :(
  2. Do most of the calculations outside the lambda if possible
  3. Split formulas across multiple cells, keeping the limit in mind. When you copy a formula, each copy has its own limit.
  4. Ask Google to fix this. In Sheets, use the menu Help > Help Sheets Improve
  5. Write to support if you have a paid account.
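As a rough illustration of step 3, the following Python sketch splits a long row range into chunks that each stay under a chosen limit and prints one REDUCE formula per chunk. The helper names are hypothetical, and the generated formulas are only an example of the shape such a split could take: the extra SEQUENCE arguments may themselves count against the limit, so choose a limit with some margin and check the result in your own sheet.

```python
def split_rows(total_rows, limit):
    """Return (start, count) pairs covering rows 1..total_rows, each count <= limit."""
    if total_rows < 0 or limit <= 0:
        raise ValueError("total_rows must be >= 0 and limit must be > 0")
    chunks = []
    start = 1
    while start <= total_rows:
        count = min(limit, total_rows - start + 1)
        chunks.append((start, count))
        start += count
    return chunks


def reduce_formulas(total_rows, limit):
    """Build one illustrative REDUCE formula per chunk (partial sums to add up later)."""
    return [
        f"=REDUCE(,SEQUENCE({count},1,{start}),LAMBDA(a,c,a+c))"
        for start, count in split_rows(total_rows, limit)
    ]


if __name__ == "__main__":
    # One formula per cell; a further cell would add up the partial results.
    for formula in reduce_formulas(150000, 66000):
        print(formula)
```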

Final notes:

  • my formula for the limit is a guess; it works for my examples and tests. Please try it and comment on this answer if you find an error.

  • the formula does not answer how long variable names affect the limit (=ROWS(BYROW(SEQUENCE(99994), LAMBDA(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, AVERAGE(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx))))) Need more tests to figure out the correct effect on the limit. As this does not break: =ROWS(BYROW(SEQUENCE(199992), LAMBDA(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, AVERAGE(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)))), my suggestion is this is the max. length of the variable name, and it does not change the cells limit.

  • The Google Sheets team may change the logic "behind" the formula, so all of these tests may become invalid over time.

0
On

If you're affected by this issue, you can send feedback to Google:

  1. Open a spreadsheet, preferably one where you bumped into the issue.
  2. Replace any sensitive information with anonymized but realistic-looking data. Remove any sensitive information that is not needed to reproduce the issue.
  3. Choose Help > Report a Problem or Help > Help Sheets Improve. If you are on a paid Google Workspace Domain, see Contact Google Workspace support.
  4. Explain why the calculation limit is an issue for you.
  5. Request:
    • Justice: Removing arbitrary limits on lambda functions
    • Equality: Avoiding discrimination against lambda functions
    • Transparency: Documenting the said discrimination in more clarity and detail
  6. Include a link to this Stack Overflow answer post.

Update Oct '22 (Credit to MaxMarkhov)

The limit is now 10x higher, at just under 2 million (1999992). This is still less than 1/5th of the 10 million virtual array limit of non-lambda formulas, but much better than before. Also, the limit of non-lambda formulas doesn't shrink with the number of operations, whereas the limit of lambda helper formulas still does. So, even though it's 10x higher, that just means ~5 extra operations inside the lambda (see table below).


A partial answer

We know for a fact that the following factors decide the calculation limit (drum roll):

  • Number of operations
  • (Nested)LAMBDA() function calls

The base number for 1 operation seems to be 199992 (=REDUCE(,SEQUENCE(199992),LAMBDA(a,c,c))). For a zero-op or no-op (=REDUCE(,SEQUENCE(10000000),LAMBDA(a,c,0))), the memory limit is much higher, but you'll still run into the time limit. We also know the number of operations is a factor, because:

  • =REDUCE(,SEQUENCE(66664/1),LAMBDA(a,c,a+c)) fails
  • =REDUCE(,SEQUENCE(66664),LAMBDA(a,c,a+c)) works.
  • =REDUCE(,SEQUENCE(66664),LAMBDA(a,c,a+c+0)) fails

Note that the size of operands doesn't matter. If =REDUCE(,SEQUENCE(39998),LAMBDA(a,c,a+c+0)) works, =REDUCE(,SEQUENCE(39998),LAMBDA(a,c,a+c+100000)) will also work.

For each additional operation inside the lambda function, the maximum allowed array size falls by a factor of 2n-1, where n is the number of operations (credit to @OlegValter for actually figuring out there's a factor multiple here):

Maximum sequence | Number of operations (inside lambda) | Reduction (from 199992) | Formula
199992 | 1 | 1 | REDUCE(,SEQUENCE(199992),LAMBDA(a,c,c))
66664 | 2 | 1/3 | REDUCE(,SEQUENCE(66664),LAMBDA(a,c,a+c))
39998 | 3 | 1/5 | REDUCE(,SEQUENCE(39998),LAMBDA(a,c,a+c+10000))
28570 | 4 | 1/7 | REDUCE(,SEQUENCE(28570),LAMBDA(a,c,a+c+10000+0))
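The pattern in the table can be checked with a few lines of Python. This is only a model of the observed numbers, assuming the 2n-1 divisor keeps holding and that the result is rounded down; the function names are made up for this sketch, and other factors (operations outside the lambda, nested LAMBDA calls) are ignored.

```python
def max_sequence(operations, base=199992):
    """Largest array size for a given number of operations inside the lambda,
    assuming the observed base / (2n - 1) pattern, rounded down."""
    if operations < 1:
        raise ValueError("operations must be at least 1")
    return base // (2 * operations - 1)


def max_operations(size, base=199992):
    """Largest number of operations for which an array of this size still fits
    under the same assumed pattern (0 if even one operation is too many)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    n = 0
    while max_sequence(n + 1, base) >= size:
        n += 1
    return n


if __name__ == "__main__":
    # Reproduces the first column of the table for 1 to 4 operations.
    print([max_sequence(n) for n in range(1, 5)])
    # Operations that fit for a 199992-element array under the old and new base.
    print(max_operations(199992), max_operations(199992, base=1999992))
```

Under this model, an array of 199992 elements fits 1 operation with the old base and 5 operations with the new base of 1999992.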

Operations outside the LAMBDA functions also count. For example, =REDUCE(,SEQUENCE(199992/1),LAMBDA(a,c,c)) will fail due to the extra /1 operation, but you only need to reduce the array size linearly by 1 or 2 per operation, i.e., =REDUCE(,SEQUENCE(199990/1),LAMBDA(a,c,c)) will work.

In addition, LAMBDA function calls themselves cost more. So, refactoring your code with an extra LAMBDA doesn't relieve the memory limit; it tightens it further. For example, if your code uses LAMBDA(a,c,(a-1)+(a-1)) and you add another lambda like this: LAMBDA(a,c,LAMBDA(aminus,aminus+aminus)(a-1)), it errors out with far fewer array elements than before (~20% fewer). A LAMBDA call is much more expensive than repeating the expression.

There are many other factors at play, especially with other LAMBDA functions. Google might change their mind about these arbitrary limits later. But this gives a start.


Possible workarounds:

  • LAMBDA itself isn't restricted. You can nest as much as you want to. Only LAMBDA Helper Functions are restricted. (Credit to player0)

  • Named functions that don't use LAMBDA helper functions themselves aren't subject to the same restrictions. But they're subject to maximum recursion restrictions.

  • Another workaround is to avoid using the lambda as an array formula and use the autofill or drag-fill feature instead, by making the lambda function return only one value per call. Note that this might actually make your sheet slow. But apparently, Google is OK with that: multiple individual calls instead of a single array call. For example, I've written a permutations function here to get a list of all permutations. While it complains about "memory limit" for an array with more than 6 items, it works easily with autofill/drag-fill/copy+paste and relative ranges.

4
On

not even an answer

by brute-forcing a few ideas it looks like there are more hidden variables than previously thought. it is probably safe to say that the upper limit is a result of "running out of memory" especially when calculation time does not play any role. the thing is that there are factors even outside of LAMBDA that affect the computational capabilities of the formula. here is a brief summary of the issue in layman's terms:

WHY WERE/ARE LAMBDA'S MINIONS STUPID?!


UPDATE: limit boundaries were moved 10-fold higher, so none of the below testing formulae limits represent the actual up-to-date state, however, lambda minions are still not limitless!


let's imagine a memory buffer from the 1999 era with a limited size of 30 units that kicks in only when you use LAMBDA with friends (MAP, SCAN, BYCOL, BYROW, REDUCE, MAKEARRAY). keep in mind that in google sheets, when we use any other formula, the limiting factor is usually the cell count limit.


example 1
output capability: 199995 cells!
reduction from 199995: 1/1
(meh, but ok)


example 2
output capability: 49998 cells!
reduction from 199995: 1/~4
(*double-checking the calendar if the year is really 2022*)


example 3
output capability: 995 cells!
reduction from 199995: 1/201 !!
(*remembering this company built a quantum computer*)



further testing

establishing the baseline:
all formulae below are maxed out, so they work as "one step before erroring out". please note that the numbers directly represent row (not cell) processing abilities

starting with a simple:

=ROWS(BYROW(SEQUENCE(99994), LAMBDA(x, AVERAGE(x))))

by adding one more x the following would error out so even the length of strings matters:

=ROWS(BYROW(SEQUENCE(99994), LAMBDA(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, AVERAGE(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx))))

doubling the array brings no issues:

=ROWS(BYROW({SEQUENCE(99994), SEQUENCE(99994)}, LAMBDA(x, AVERAGE(x))))

but additional "stuff" will reduce the output by 1:

=ROWS(BYROW({SEQUENCE(99993), SEQUENCE(99993, 1, 5)}, LAMBDA(x, AVERAGE(x))))

interestingly this one runs with no problem so now even the complexity of input matters (?):

=ROWS(BYROW(SEQUENCE(99994, 6, 0, 5), LAMBDA(x, AVERAGE(x))))

and with this one, it seems that even choice of formula selection matters:

=ROWS(BYROW(RANDARRAY(99996, 2), LAMBDA(x, AVERAGE(x))))

but what if we move from virtual input to real input... A1 cell being set to =RANDARRAY(105000, 3) we can have:

=ROWS(BYROW(A1:B99997, LAMBDA(x, AVERAGE(x))))

again, it's not a matter of cells because even with 8 columns we can get the same:

=ROWS(BYROW(A1:H99997, LAMBDA(x, AVERAGE(x))))

not bad, however, indirecting the range will put us back to 99995:

=ROWS(BYROW(INDIRECT("A1:B"&99995), LAMBDA(x, AVERAGE(x))))

another fact is that LAMBDA as a standalone function runs flawlessly even with an array 105000×8 (that's solid 840K cells)

=LAMBDA(x, AVERAGE(x))(A1:H105000)

so is this really the memory issue of LAMBDA (?) or the factors that determine the memory used in LAMBDA are limits of unknown origin bestowed upon LAMBDA by individual incapabilities of:

  • MAP
  • SCAN
  • BYCOL
  • BYROW
  • REDUCE
  • MAKEARRAY

and their unoptimized memory demands shaken by a vast variety of yet unknown variables within our spacetime