What is the most efficient way to remove rows containing any null values, for a table with many rows?
Additionally, why do I get the error
ERROR: 'time (attempt to use variable time without defining/assigning first
even though I have nothing named as ‘time’ when using the functional select
?[t;{(not;(null;x))} each cols t; 0b; ()]
given in this S/Oanswer?
Is your table in memory or splayed on disk?
If your table is in memory, then a simple form might also be the fastest.
If the table is splayed on disk, the game changes. In a splayed table each column is stored as a file. The second argument of the Select operator is a list of constraints; each one a parse tree. Whereas
null flip tabove tests every cell oft, Select applies is constraints successively: only rows that have passed previous constraints are tested for the next. If no rows satisfy the first constraint/s subsequent column files may never need to be read at all.From this follows the rule: list your constraints in descending order of restrictiveness – if nulls are more common in certain columns, test them first.
Regarding an in-memory test, note that
not anycan be a good deal faster thanall not, depending on whether nulls are common.