Capturing escaped multi-line syntax with Parslet and Ruby


I want to write a parser with Parslet in Ruby that understands a somewhat simple configuration syntax:

alpha = one
beta = two\
three
gamma = four

From the parser's perspective, the backslash escapes the newline, so the parsed value of beta is twothree. The backslash is literally present in the configuration file, though (i.e. the text above is the raw file contents, not what you'd put inside Ruby string quotes). As a Ruby string literal, it would be "alpha = one\nbeta = two\\\nthree\ngamma = four".
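To pin down the intended result independently of Parslet, here is a minimal plain-Ruby sketch that could serve as a test oracle for a grammar. The helper name reference_parse is hypothetical, and it assumes exactly one "key = value" pair per logical line: it deletes each backslash-newline pair, then splits every line on its first "=".

```ruby
# Hypothetical reference implementation (no Parslet) of the intended semantics.
# Assumes exactly one "key = value" pair per logical line.
def reference_parse(text)
  # A backslash immediately followed by a newline continues the line:
  # removing the pair turns "two\\\nthree" into "twothree".
  joined = text.gsub("\\\n", "")
  joined.each_line(chomp: true).reject { |line| line.strip.empty? }.to_h do |line|
    key, value = line.split("=", 2)
    [key.strip, value.strip]
  end
end

config = "alpha = one\nbeta = two\\\nthree\ngamma = four"
p reference_parse(config)["beta"] # => "twothree"
```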

My current attempt is fine with single-line settings, but can't handle the multi-line approach:

require "parslet"

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value)
  end

  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end

I wonder if the issue is related to how Parslet parses things. Does the first part of my value rule grab as many characters as possible without caring about the context of the later parts?
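The behaviour in question can be reproduced with Ruby's own Regexp as an analogy (this is not Parslet, only an illustration of the same semantics). The two patterns below mirror the repeated group in the value rule. An ordinary greedy quantifier backtracks, while a possessive quantifier (++) keeps everything it grabbed, which is how PEG repetition such as Parslet's repeat behaves.

```ruby
input = "two\\\nthree"

# Ordinary greedy quantifier: [^\n]+ first grabs "two\", then gives
# characters back until [^\\\n] and the escaped newline can match.
backtracking = /\A[^\n]+[^\\\n]\\\n/

# Possessive quantifier (++): once [^\n]++ has grabbed "two\", it never
# gives anything back, so [^\\\n] is left facing the newline and fails.
possessive = /\A[^\n]++[^\\\n]\\\n/

p backtracking.match(input)&.to_s # => "two\\\n"
p possessive.match(input)         # => nil
```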


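Yep. Parslet is a PEG parser, so repeat consumes as many characters as it can and never gives any back. In your value rule, match("[^\n]").repeat(1) swallows the backslash too, so the match("[^\\\n]") >> str("\\\n") that follows can never succeed. You need to try the escape case first, and consume a single non-escaped character only if that doesn't match.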

require "parslet"
require "pp"


class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  # Try the escaped newline first; otherwise keep one non-newline character.
  rule(:char) { str("\\\n") | match("[^\n]").as(:keep) }
  rule(:value) do
    char.repeat(1)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value) >> str("\n")
  end

  rule(:input) { setting.repeat.as(:settings) >> space.maybe }
  root(:input)
end

s = SettingParser.new

tree = s.parse("alpha = one\nbeta = two\\\nthree\ngamma = four\n")
pp tree

This generates the following...

{:settings=>
  [{:key=>"alpha"@0,
    :value=>[{:keep=>"o"@8}, {:keep=>"n"@9}, {:keep=>"e"@10}]},
   {:key=>"beta"@12,
    :value=>
     [{:keep=>"t"@19},
      {:keep=>"w"@20},
      {:keep=>"o"@21},
      {:keep=>"t"@24},
      {:keep=>"h"@25},
      {:keep=>"r"@26},
      {:keep=>"e"@27},
      {:keep=>"e"@28}]},
   {:key=>"gamma"@30,
    :value=>
     [{:keep=>"f"@38}, {:keep=>"o"@39}, {:keep=>"u"@40}, {:keep=>"r"@41}]}]}

Here I am tagging every character that is not part of an escaped newline with :keep, so I can reassemble the value in a transform later. Alternatively, you could capture the whole raw string, backslash-newline pairs included, and remove them with a search/replace in post-processing.
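For the post-processing alternative, a minimal sketch: assume the grammar captures each value verbatim, escaped newline included (raw_values below is hypothetical sample data standing in for that capture). Removing the escapes is then one gsub per value. String#delete would not work here, because it removes a set of characters rather than a substring.

```ruby
# Hypothetical raw captures: values exactly as they appear in the file,
# with the escaped newline still inside beta's value.
raw_values = { "alpha" => "one", "beta" => "two\\\nthree", "gamma" => "four" }

# Remove every backslash that is immediately followed by a newline, together
# with that newline. A String pattern is matched literally by gsub.
clean = raw_values.transform_values { |v| v.gsub("\\\n", "") }

p clean["beta"] # => "twothree"
```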

You can then pull the data out of the tree with a transform:

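class SettingTransform < Parslet::Transform
  # Unwrap each {:keep => ...} into its character slice.
  rule(:keep => simple(:c)) { c }
  # Join a setting's characters back into a single value string.
  rule({:key => simple(:k), :value => sequence(:v)}) { {k => v.join} }
  # Merge the list of one-entry hashes into a single hash.
  rule(:settings => subtree(:s)) { s.each_with_object({}) { |p, a| a[p.keys[0]] = p.values[0] } }
end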

pp SettingTransform.new.apply(tree)
# => {"alpha"@0=>"one", "beta"@12=>"twothree", "gamma"@30=>"four"}

You may need to add some end-of-line logic. Currently I am assuming your config ends with a "\n". You can detect the end of input with 'any.absent?' (or just always append a "\n" to the input ;)


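You need to start the setting rule with (optional) whitespace, so that the newline left over from the previous setting is consumed before the next key is matched.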

The following snippet worked for me. I have added pp and a space? shorthand rule for readability.

require "parslet"
require 'pp'

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) >> space? }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:space?)     { space.maybe }
  rule(:setting) do
    space? >> term.as(:key) >> space? >> str("=") >> space? >>
      value.as(:value)
  end

  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end

str = %{
alpha = one
beta = two\
three
gamma = four
}

begin
  pp SettingParser.new.parse(str, reporter: Parslet::ErrorReporter::Deepest.new)
rescue Parslet::ParseFailed => error
  puts error.parse_failure_cause.ascii_tree
end

The output is

[{:key=>"alpha "@1, :value=>"one"@9},
 {:key=>"beta "@13, :value=>"twothree"@20},
 {:key=>"gamma "@29, :value=>"four"@37}]