How to match opening and closing tags in Lark?


I'm trying to create a parser using Lark for WordPress shortcodes. Self-closing tags in the language are syntactically indistinguishable from standard opening tags, which causes a fair bit of ambiguity even in perfectly valid input. I've got it mostly working, but I'm struggling to match the opening and closing tags with each other.

The following example contains a self-closing tag [a] and a tag [b][/b] with plaintext content:

[a][b] content [/b]

The relevant part of my language definition looks like this:

shortcode: shortcode_template{shortcode_name, attribute_list} | "[" shortcode_name attribute_list "]"
shortcode_template{name, attrs}: "[" name attrs "]" value "[/" name "]"
shortcode_name: /[^\[\]\<\>\&\/\s]+/

I expected that using a template for the opening/closing tag variant would do something akin to regex group logic, e.g. "[" (name) attrs "]" value "[/" $1 "]". Instead, the template just seems to get unpacked into "[" shortcode_name attribute_list "]" value "[/" shortcode_name "]", where the two shortcode_name occurrences are matched independently, causing the above text to be parsed as:

value
  shortcode
    shortcode_template
      shortcode_name    a
      attribute_list
      value
        shortcode
          shortcode_name    b
          attribute_list
      shortcode_name    b

Is there an option to tell Lark that the name value in the opening tag should be the same as the name value in the closing tag?
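
For comparison, the regex-style group logic I had in mind can be sketched with Python's re module. This is plain regex, not Lark, and only illustrates the behavior I'm after: the name pattern is copied from shortcode_name above, while the attribute pattern is a simplified stand-in (an assumption, not my real attribute_list rule), and nested shortcodes with the same name are not handled.

```python
import re

# Name pattern copied from shortcode_name in the grammar above.
NAME = r"[^\[\]<>&/\s]+"

# (?P<name>...) captures the opening tag's name, and (?P=name) requires the
# closing tag to repeat exactly the same text (a backreference).
# The attrs group is a simplified assumption: either empty, or whitespace
# followed by anything up to the closing bracket.
PAIRED = re.compile(
    r"\[(?P<name>" + NAME + r")(?P<attrs>(?:\s[^\]]*)?)\]"
    r"(?P<body>.*?)"
    r"\[/(?P=name)\]",
    re.DOTALL,
)

text = "[a][b] content [/b]"
match = PAIRED.search(text)

# [a] has no matching [/a], so the search skips it and pairs [b] with [/b].
print(match.group("name"), repr(match.group("body")))
```

With this pattern, [a] is left over as a self-closing tag and [b] is paired with [/b], which is the tree shape I expected from the template.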
