I want to write a parser with Parslet in Ruby that understands a somewhat simple configuration syntax:
alpha = one
beta = two\
three
gamma = four
From the perspective of the parser, the backslash escapes the newline, so when parsed the value of beta is twothree. The backslash is in the configuration file though (i.e. the text above is a direct representation - it's not what you'd put inside Ruby string quote marks). In Ruby, it could be represented as "alpha = one\nbeta = two\\\nthree\ngamma = four".
My current attempt is fine with single-line settings, but can't handle the multi-line approach:
require "parslet"

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value)
  end
  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end
I wonder if the issue is related to how Parslet parses things. Does the first part of my value rule grab as many characters as possible without caring about the context of the later parts?
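Yep. Parslet rules consume eagerly and never backtrack, so the first match("[^\n]").repeat(1) in your value rule swallows the backslash as well and leaves nothing for the rest of the sequence to match. You need to try the escape case first, and only if that doesn't match consume a single non-escaped character.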
This generates the following...
Here I am marking the characters that are not escaped returns so I can transform them later, but you could just capture the whole string, escaped returns included, and search/replace them in post-processing instead.
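The search/replace route needs nothing from Parslet. A minimal sketch, assuming the captured value still contains the backslash-newline pairs (join_continuations is a made-up helper name):

```ruby
# Hypothetical helper: remove every backslash-newline pair from a captured value.
def join_continuations(raw_value)
  # A String pattern is matched literally: one backslash followed by one newline.
  raw_value.gsub("\\\n", "")
end

raw_value   = "two\\\nthree"  # what the parser would capture for beta
clean_value = join_continuations(raw_value)
puts clean_value  # prints "twothree"
```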
Anyway... You can now pull the data out of the tree with a transform.
You may need to add some end-of-line logic. Currently I am assuming your config ends in a "\n". You can detect EOF with 'any.absent?' (or just always add a '\n' to the end ;)