Problem
The problem description is simple: I have a pile of text files, from which I wish to extract the frontmatter (described anon) alone, if it's there at all, and then stop processing the file any further.
Here's a valid example of a file with frontmatter; my annotations (assume they are invisible in the file) are written as C-style comments:
/*spaces & newlines are fine*/
--- /* i.e., /^---\s*$/ */
key: value
foo: bar, zip, grump
/*
Anything can go in here; once I have this section pulled out, the YAML schema
can do the rest. All that's important to note is that this section must be
terminated explicitly with a subsequent /^---\s*$/ in order to be deemed valid.
*/
---
Anything else can follow here, more accidental frontmatter blobs can exist,
but it should not matter since the other requirement is that the regex engine
will cease processing beyond the termination of the first match.
What I have so far, using ripgrep (rg), which doesn't address certain edge cases, is:
rg -g '!**/{node_modules,.*}/*' -g '*.md' -U '(?s)\s*^---$((?!---).*)^---$' -r '$1'
The problem with the above is that it matches far past the first terminating --- in certain cases, for example when there are two frontmatter blobs, one after another.
Bonus Problem
- I want to know how I can do this with the standard regex engine that rg defaults to, but also how to do this with PCRE2 (-P)
- I want to know how I can have all flags embedded in the regex itself, rather than have -U for multiline, using (?m) for example
To solve your main problem, I believe it is enough to make your matcher lazy.
Also, the negative lookahead is redundant here (and was used slightly incorrectly; more on this at the end).
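As a rough illustration of the lazy approach, here is a minimal Python sketch. Python's re module only stands in for rg's engines here, so treat it as a model of the idea rather than an rg invocation. The pattern, the FRONTMATTER name, and the extract_frontmatter helper are all hypothetical, and the sketch assumes the frontmatter must be the first non-blank thing in the file.

```python
import re

# Hypothetical pattern, no lookarounds:
#   (?ms)  inline flags: ^/$ match at line boundaries, . matches newlines
#   \A\s*  only leading whitespace may precede the opening delimiter
#   (.*?)  lazy, so it stops at the first closing --- line
FRONTMATTER = re.compile(r"(?ms)\A\s*^---[ \t]*\n(.*?)^---[ \t]*$")


def extract_frontmatter(text):
    """Return the body of the first frontmatter block, or None if absent."""
    match = FRONTMATTER.search(text)
    return match.group(1) if match else None


# Two blobs, one after another: only the first should be returned.
sample = (
    "\n---\nkey: value\nfoo: bar, zip, grump\n---\n"
    "body\n---\nstray: blob\n---\n"
)
print(extract_frontmatter(sample))
```

Because `.*?` is lazy, the capture ends at the first closing `---`, so the second blob in `sample` is never reached. An unterminated block, or a block that does not start the file, yields None.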
I believe the lazy regex should work with both PCRE2 and the default engine, since it doesn't use lookarounds. But I'm not entirely sure about the default engine and (?s).
As for -U, I believe it changes how the application reads the file, so it's quite unlikely that you could drop it.
Negative lookahead
It looks like you tried to disallow any appearance of --- in the matched block. If that is the case, it should be done with a construction like: ((?!---).)*
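To make the difference concrete, here is a small Python sketch comparing the two placements of the lookahead. Python's re module is used only as a stand-in for a lookaround-capable engine (in rg, lookarounds need -P, since the default engine does not support them). The variable names are hypothetical, and the repeated group is wrapped as ((?:(?!---).)*) so that the whole run is captured rather than just its last character.

```python
import re

text = "---\na: 1\n---\nbody\n---\nb: 2\n---\n"

# Lookahead tested once, before .* starts: it guards a single position only,
# so the greedy .* is free to run across later --- lines.
checked_once = re.compile(r"(?ms)^---$((?!---).*)^---$")

# Lookahead tested before every character: the capture can never contain ---.
tempered = re.compile(r"(?ms)^---$((?:(?!---).)*)^---$")

print(repr(checked_once.search(text).group(1)))
print(repr(tempered.search(text).group(1)))
```

On this input the first pattern captures everything up to the last `---`, including the body and the second blob, while the second pattern stops right before the first closing `---`.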