Lua split string considering nil entry

str = "cat,dog,,horse"
for word in string.gmatch(str, "([^,'',%s]+)") do
    print(word)
end

This code outputs the following.

cat
dog
horse

I want empty entries to be reported as nil as well, so that the output is:

cat
dog
nil
horse

How can this be done? Could someone please point me in the right direction?

There are 2 best solutions below

Luatic On BEST ANSWER

A few things:

  • nil ~= "". You probably want the empty string rather than nil here. It is however trivial to convert one into the other, so I'll be using the empty string in the following code.
  • You don't need the parentheses around the gmatch pattern. If there are no "captures" (parentheses), the entire pattern is implicitly captured.
  • I'm rather confused about the intent of your pattern. You're matching sequences of one or more characters that are not whitespace, commas, or single quotes; that is, you're splitting on whitespace, commas, and single quotes alike. For some reason, you also have ' and , twice in the character class; once suffices. I'll assume you want to split by , only.
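
To illustrate the first point, here is a minimal sketch of converting the empty string to nil. The helper name empty_to_nil is made up for this example and is not part of Lua's standard library; the fields table simply stands in for the output of a splitter that keeps empty entries.

```lua
-- Hypothetical helper (not a standard function): map "" to nil and
-- pass every other value through unchanged.
local function empty_to_nil(s)
    if s == "" then
        return nil
    end
    return s
end

-- Stand-in for the fields a splitter that keeps empty entries would yield.
local fields = { "cat", "dog", "", "horse" }
for i = 1, #fields do
    -- print shows a nil value as the text "nil".
    print(empty_to_nil(fields[i]))
end
```

Doing the conversion at the point of use, rather than storing nil in a table, avoids creating holes in the table.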

The issue is that currently your pattern uses the + (one or more) quantifier when you want * (zero or more). Just using * works completely fine on Lua 5.4:

Lua 5.4.4  Copyright (C) 1994-2022 Lua.org, PUC-Rio
> local str = "cat,dog,,horse"; for word in str:gmatch"[^,]*" do print(word) end
cat
dog

horse

However, the same code misbehaves on LuaJIT: it produces extra empty strings (an empty match right after each non-empty one) rather than only producing an empty string for two consecutive delimiters. This could be seen as "technically correct", since the empty string is a match for *, but I see it as a violation of the greediness of *. One solution is to append a delimiter to the string, require each match to end with a delimiter, and capture everything before it:

LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
JIT: ON SSE2 SSE3 SSE4.1 AMD BMI2 fold cse dce fwd dse narrow loop abc sink fuse
> local str = "cat,dog,,horse"; for word in (str .. ","):gmatch("(.-),") do print(word) end
cat
dog

horse
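
If the fields are needed in a table rather than printed, the same trick can be wrapped in a function. This is only a sketch: split_on_comma is a name made up here, and the delimiter is hard-coded to , so that no pattern escaping is needed.

```lua
-- Hypothetical helper: split on "," and keep empty fields as "".
local function split_on_comma(str)
    local fields = {}
    -- Appending a delimiter guarantees that every field, including the
    -- last one, is followed by a comma. Each match consumes that comma,
    -- so no match is ever empty overall.
    for field in (str .. ","):gmatch("(.-),") do
        fields[#fields + 1] = field
    end
    return fields
end

local fields = split_on_comma("cat,dog,,horse")
for i = 1, #fields do
    print(i, fields[i])
end
```

Because every match includes the trailing comma, this does not rely on how a given implementation handles empty matches.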

A third option would be to split manually using repeated calls to string.find. Here's the utility I wrote myself for that:

function spliterator(str, delim, plain)
    assert(delim ~= "")
    local last_delim_end = 0

    -- Iterator of possibly empty substrings between two matches of the delimiter
    -- To exclude empty strings, filter the iterator or use `:gmatch"[...]+"` instead
    return function()
        if not last_delim_end then
            return
        end

        local delim_start, delim_end = str:find(delim, last_delim_end + 1, plain)
        local substr
        if delim_start then
            substr = str:sub(last_delim_end + 1, delim_start - 1)
        else
            substr = str:sub(last_delim_end + 1)
        end
        last_delim_end = delim_end
        return substr
    end
end

Usage for this example would be:

for word in spliterator("cat,dog,,horse", ",") do print(word) end

Whether you add this to the string table, keep it in a local variable, or put it in a string utility module that you require is up to you.
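
As a sketch of the first of those options, and of what the plain parameter is for, the block below repeats the iterator in condensed form so that it runs on its own, then attaches it to the string table. The method name string.spliterator is just a choice made for this illustration. Passing true for plain makes str:find treat the delimiter literally, which matters for delimiters such as . that are magic characters in Lua patterns.

```lua
-- Condensed copy of the iterator above, so this block is self-contained.
local function spliterator(str, delim, plain)
    assert(delim ~= "")
    local last_delim_end = 0
    return function()
        if not last_delim_end then
            return
        end
        local delim_start, delim_end = str:find(delim, last_delim_end + 1, plain)
        -- If no further delimiter is found, delim_start is nil and the
        -- substring runs to the end of the string (index -1).
        local substr = str:sub(last_delim_end + 1, delim_start and delim_start - 1 or -1)
        last_delim_end = delim_end
        return substr
    end
end

-- Assumption for illustration: expose it as a method on all strings.
string.spliterator = spliterator

local parts = {}
-- plain = true: "." is taken as a literal dot, not as "any character".
for part in ("a.b..c"):spliterator(".", true) do
    parts[#parts + 1] = part
end
print(#parts, parts[3] == "") -- prints 4 and true
```

Without plain, the delimiter "." would be interpreted as a pattern that matches any character, so every character of the input would be treated as a delimiter.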

dt192 On

I would do this:

  str = "cat,dog,,horse"
  for word in string.gmatch(str..',', "([^,]*),") do
    print(word == '' and 'nil' or word)
  end