I'm trying to create a parser using Lark for WordPress shortcodes. The self-closing tags in the language have no differentiating features from standard opening tags, which causes a fair bit of ambiguity even in perfectly valid syntax. I've got it mostly working, but I am struggling to match the opening and closing tags with each other.
The following example contains a self-closing tag [a] and a tag [b][/b] with plaintext content:
[a][b] content [/b]
The relevant part of my language definition looks like this:
shortcode: shortcode_template{shortcode_name, attribute_list} | "[" shortcode_name attribute_list "]"
shortcode_template{name, attrs}: "[" name attrs "]" value "[/" name "]"
shortcode_name: /[^\[\]\<\>\&\/\s]+/
I expected that using a template for the opening/closing tag variant would do something akin to regex group logic, e.g. "[" (name) attrs "]" value "[/" $1 "]", but it just seems to get unpacked into "[" shortcode_name attribute_list "]" value "[/" shortcode_name "]", causing the above text to be parsed as:
value
  shortcode
    shortcode_template
      shortcode_name a
      attribute_list
      value
        shortcode
          shortcode_name b
          attribute_list
      shortcode_name b
Is there an option to tell Lark that the name value in the opening tag should be the same as the name value in the closing tag?
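To make the intended result concrete, here is a minimal sketch in plain Python (not Lark) of the pairing I would like the parser to produce. It is only an illustration under simplifying assumptions: the helper name pair_shortcodes and the TAG pattern are made up for this example, attributes are skipped rather than parsed, and a closing tag is paired with the nearest preceding unclosed tag of the same name, so any tag that never gets a matching close is treated as self-closing.

```python
import re

# Hypothetical, simplified tag pattern: "[name ...]" or "[/name]".
# The name character class mirrors shortcode_name in the grammar above;
# attributes are matched loosely and ignored.
TAG = re.compile(r"\[(/?)([^\[\]<>&/\s]+)([^\[\]]*)\]")


def pair_shortcodes(text):
    """Return a list of plain-text strings and tag dicts.

    A tag dict has "children" set to None while it is considered
    self-closing, or to a list once a matching closing tag is found.
    """
    items = []  # flat list that is folded into a tree as closers appear
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            items.append(text[pos:m.start()])  # plain text between tags
        pos = m.end()
        closing, name = m.group(1), m.group(2)
        if not closing:
            items.append({"name": name, "children": None})
            continue
        # Walk backwards to the nearest unclosed tag with the same name.
        for i in range(len(items) - 1, -1, -1):
            item = items[i]
            if (isinstance(item, dict) and item["name"] == name
                    and item["children"] is None):
                # Everything after it becomes its content.
                item["children"] = items[i + 1:]
                del items[i + 1:]
                break
        else:
            raise ValueError(f"[/{name}] has no matching opening tag")
    if pos < len(text):
        items.append(text[pos:])
    return items


print(pair_shortcodes("[a][b] content [/b]"))
```

For the example input, [a] stays self-closing and [b] receives " content " as its only child, which is the structure I was hoping the template rule would enforce.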