lpeg grammar to parse comma separated groups that may have internal groups

I need to parse comma-separated groups (enclosed in brackets) that may themselves contain nested groups. Only the outermost groups should be separated.

I have a function that does this:

function lpeg.commaSplit(arg)
    local P,C,V,sep = lpeg.P, lpeg.C, lpeg.V, lpeg.P(",")
    local p = P{
        "S";
        S = lpeg.T_WSpace * C(V"Element") * (lpeg.T_WSpace * sep * lpeg.T_WSpace * C(V"Element"))^0 * lpeg.T_WSpace,
        Element = (V"Group")^0 * (1 - lpeg.T_Group - sep)^0 * (V"Group" * (1 - lpeg.T_Group - sep)^0)^0 * (1 - sep)^0,
        Group = lpeg.T_LGroup * ((1 - lpeg.T_Group) + V"Group")^0 * lpeg.T_RGroup
    }^-1
    return lpeg.match(lpeg.Ct(p), arg)

end

The problem is removing the extra outer brackets that may enclose the whole list of groups.

Here is a test string:

[[a,b,[c,d]],[e,[f,g]]]

should parse to

[a,b,[c,d]] & [e,[f,g]]

Notice the internal groups are left alone. A simple removal of the extra brackets on the end does not work since you'll end up with a string like a,b,[c,d]],[e,[f,g].

Any ideas how to modify the lpeg grammar to allow for the outside groups?


I am not an expert at writing LPeg grammars, so I found this an interesting exercise.

I couldn't get your grammar to work, so I wrote my own, split into smaller rules that are easier to understand and that let me place the captures where I needed them.

I think the result is decent, at least empirically. It works on your test case, but I have not checked it on more deeply nested groups or other inputs. The post-processing of the captures is a bit ad hoc.

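-- Recent LPeg versions do not set a global, so keep the module in a local
local lpeg = require "lpeg"

-- Guesswork: the question does not show how these patterns are defined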
lpeg.T_WSpace = lpeg.P" "^0
lpeg.T_LGroup = lpeg.P"["
lpeg.T_RGroup = lpeg.P"]"
lpeg.T_Group = lpeg.S"[]"

function lpeg.commaSplit(arg)
  local P, C, Ct, V, sep = lpeg.P, lpeg.C, lpeg.Ct, lpeg.V, lpeg.P","
  local grammar =
  {
    "S";
    S = lpeg.T_WSpace * V"Group" * lpeg.T_WSpace,
    Group = Ct(lpeg.T_LGroup * C(V"Units") * lpeg.T_RGroup),
    Units = V"Unit" *
        (lpeg.T_WSpace * sep * lpeg.T_WSpace * V"Unit")^0,
    Unit = V"Element" + V"Group",
    Element = (1 - sep - lpeg.T_Group)^1,
  }
  return lpeg.match(Ct(P(grammar)^-1), arg)
end

local test = "[[a,b,[c,d]],[e,[f,g]]]"
local res = lpeg.commaSplit(test)
print(dumpObject(res))
print(res[1], res[1][1], res[1][2])
local groups = res[1]
local finalResult = {}
for n, v in ipairs(groups) do
  if type(v) == 'table' then
    finalResult[#finalResult+1] = "[" .. v[1] .. "]"
  end
end
print(dumpObject(finalResult))

dumpObject is just a table-dump helper of my own (not shown here). The output of this code is as follows:

local T =
{
  {
    "[a,b,[c,d]],[e,[f,g]]",
    {
      "a,b,[c,d]",
      {
        "c,d"
      }
    },
    {
      "e,[f,g]",
      {
        "f,g"
      }
    }
  }
}

table: 0037ED48 [a,b,[c,d]],[e,[f,g]]   table: 0037ED70

local T =
{
  "[a,b,[c,d]]",
  "[e,[f,g]]"
}
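
Since dumpObject is not shown, the snippet above will not run as posted. A minimal stand-in, assuming only array-style tables of strings and nested tables need to be printed, might look like the sketch below. It is a hypothetical helper, not the one used to produce the output above, and its formatting only approximates it.

```lua
-- Hypothetical stand-in for the answer's dumpObject helper.
-- Assumption: only the array part of a table is dumped (ipairs order).
local function dumpObject(value, indent)
  indent = indent or ""
  if type(value) == "string" then
    return indent .. string.format("%q", value)
  elseif type(value) ~= "table" then
    return indent .. tostring(value)
  end
  local lines = { indent .. "{" }
  for i, item in ipairs(value) do
    -- Separate items with a comma, but not after the last one
    local suffix = (i < #value) and "," or ""
    lines[#lines + 1] = dumpObject(item, indent .. "  ") .. suffix
  end
  lines[#lines + 1] = indent .. "}"
  return table.concat(lines, "\n")
end

print(dumpObject({ "[a,b,[c,d]]", "[e,[f,g]]" }))
```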

Personally, I wouldn't pollute the lpeg table with my stuff, but I kept your style here.
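
As a rough illustration of that point, here is a sketch that keeps every pattern in a local and captures the top-level groups directly, so no post-processing loop is needed. The names ws, sep, group, outer and commaSplit are made up for this example, and it assumes that every top-level item is itself a bracketed group and that whitespace means plain spaces only.

```lua
local lpeg = require "lpeg"
local P, S, C, Ct, V = lpeg.P, lpeg.S, lpeg.C, lpeg.Ct, lpeg.V

local ws      = P" "^0            -- optional spaces (assumption: spaces only)
local sep     = ws * P"," * ws    -- comma with optional surrounding spaces
local bracket = S"[]"

-- A balanced bracket group; nested groups are matched but not captured.
local group = P{
  "Group";
  Group = P"[" * (V"Group" + (1 - bracket))^0 * P"]",
}

-- The outer wrapper: "[" group ("," group)* "]", capturing each top-level group.
local outer = ws * P"[" * ws
            * C(group) * (sep * C(group))^0
            * ws * P"]" * ws * P(-1)

-- Returns a table of top-level groups, or nil if the input does not match.
local function commaSplit(s)
  return lpeg.match(Ct(outer), s)
end

local res = commaSplit("[[a,b,[c,d]],[e,[f,g]]]")
print(res[1], res[2])
```

Because the nested groups are matched by a capture-free rule, each top-level group comes back as a single string with its inner brackets intact.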

I hope this is useful, or at least a starting point you can build on.