I need to parse comma-separated groups (enclosed in brackets) that may themselves contain nested groups. Only the outermost groups should be separated.
I have a function that does this:
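    local lpeg = require "lpeg"

    -- lpeg.T_WSpace, lpeg.T_Group, lpeg.T_LGroup and lpeg.T_RGroup are my own
    -- patterns, defined elsewhere.
    function lpeg.commaSplit(arg)
        local P, C, V, sep = lpeg.P, lpeg.C, lpeg.V, lpeg.P(",")
        local p = P{
            "S";
            -- Comma-separated elements, each captured, with T_WSpace around them.
            S = lpeg.T_WSpace * C(V"Element") * (lpeg.T_WSpace * sep * lpeg.T_WSpace * C(V"Element"))^0 * lpeg.T_WSpace,
            -- One element: plain text and/or groups, up to the next top-level comma.
            Element = (V"Group")^0 * (1 - lpeg.T_Group - sep)^0 * (V"Group" * (1 - lpeg.T_Group - sep)^0)^0 * (1 - sep)^0,
            -- A bracketed group, which may contain nested groups.
            Group = lpeg.T_LGroup * ((1 - lpeg.T_Group) + V"Group")^0 * lpeg.T_RGroup
        }^-1
        return lpeg.match(lpeg.Ct(p), arg)
    end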
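The `lpeg.T_*` helper patterns are not shown in this post. For anyone who wants to run the function, the following is one plausible set of definitions. These are assumptions inferred from the names and from how the grammar uses them, not the actual definitions:

```lua
local lpeg = require "lpeg"

-- Assumed definitions (hypothetical): the real ones are not shown in the post.
lpeg.T_WSpace = lpeg.S(" \t\r\n")^0  -- optional run of whitespace
lpeg.T_LGroup = lpeg.P("[")          -- opening bracket of a group
lpeg.T_RGroup = lpeg.P("]")          -- closing bracket of a group
lpeg.T_Group  = lpeg.S("[]")         -- either bracket
```

Note that `T_WSpace` has to be able to match the empty string, otherwise the `S` rule would demand whitespace around every element.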
The problem is removing the extra pair of brackets that may enclose the whole list of groups.
Here is a test string:
[[a,b,[c,d]],[e,[f,g]]]
should parse to
[a,b,[c,d]] & [e,[f,g]]
Notice that the internal groups are left alone. Simply stripping the extra brackets from both ends does not work, since you end up with a string like a,b,[c,d]],[e,[f,g].
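To make the failure concrete, here is a minimal sketch in plain Lua (no LPeg needed). The `isBalanced` helper is a hypothetical name introduced only for this illustration:

```lua
-- Returns true when every "[" has a matching "]" and the depth never goes negative.
local function isBalanced(s)
  local depth = 0
  for ch in s:gmatch("[%[%]]") do
    depth = depth + (ch == "[" and 1 or -1)
    if depth < 0 then return false end
  end
  return depth == 0
end

local input = "[[a,b,[c,d]],[e,[f,g]]]"
local stripped = input:sub(3, -3)  -- naively drop two brackets from each end

print(stripped)              -- a,b,[c,d]],[e,[f,g]
print(isBalanced(input))     -- true
print(isBalanced(stripped))  -- false
```

The stripped string closes a bracket it never opened, so the bracket handling has to happen inside the grammar rather than as a blind trim.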
Any ideas on how to modify the LPeg grammar to handle the outer brackets?
As I am not an expert at writing LPeg grammars, I found this an interesting exercise.
I couldn't manage to use your grammar, so I went ahead and wrote my own, built from smaller chunks that are easier to understand and where I could place the captures I needed.
I think I got a decent empirical result. It works on your test case, but I don't know whether groups can be nested more deeply, etc. The post-processing of the capture is a bit ad hoc...
dumpObject is just a table dump of my own. The output of this code is as follows:
Personally, I wouldn't pollute the lpeg table with my stuff, but I kept your style here.
I hope this will be useful, or at least a starting point for you to build on.
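For readers who want something concrete to experiment with, here is a minimal sketch of one way to lay out such a grammar in small named rules. It is an illustration under assumptions, not necessarily the grammar described above: the function name `splitOuterGroups` and the rule names are hypothetical, brackets are assumed to be `[` and `]`, and it is kept as a local function instead of being added to the `lpeg` table.

```lua
local lpeg = require "lpeg"
local P, C, Ct, V, S = lpeg.P, lpeg.C, lpeg.Ct, lpeg.V, lpeg.S

local ws    = S(" \t\r\n")^0  -- optional whitespace
local open  = P"["
local close = P"]"
local comma = P","

-- Hypothetical helper: splits on top-level commas after removing one
-- optional pair of enclosing brackets. Returns a table of strings, or nil.
local function splitOuterGroups(input)
  local grammar = P{
    "List",
    -- Try "[ items ]" first; fall back to bare "items" if that does not
    -- consume the whole input (e.g. "[a],[b]").
    List  = ws * (open * V"Items" * close + V"Items") * ws * -1,
    -- Comma-separated items, each captured as a string.
    Items = ws * C(V"Item") * (ws * comma * ws * C(V"Item"))^0 * ws,
    -- An item is any mix of nested groups and plain text.
    Item  = (V"Group" + V"Plain")^1,
    -- A balanced bracketed group; commas inside it are not separators.
    Group = open * (V"Group" + (1 - open - close))^0 * close,
    -- Plain text: anything except brackets and commas (whitespace included).
    Plain = (1 - open - close - comma)^1,
  }
  return lpeg.match(Ct(grammar), input)
end

local parts = splitOuterGroups("[[a,b,[c,d]],[e,[f,g]]]")
print(table.concat(parts, " & "))  -- [a,b,[c,d]] & [e,[f,g]]
```

One design consequence to be aware of: because the bracketed alternative is tried first, an input such as `[a,b]` is read as an enclosed list and yields `a` and `b`, not the single group `[a,b]`. If that is not the desired reading, the `List` rule is the place to change it.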