I am trying to use this RegEx in dar.lang (located in /usr/share/gtksourceview-3.0/language-specs):
^(\[\^)([^\]\s\p{C}]+)(\])\ (\P{C}*(?:\n(?!\n[\[\n])\P{C}*)*)
The goal is to capture footnotes with multiple paragraphs (which, among other things, are terminated by 2 empty lines, i.e. 3 \n).
When I open Gedit I see that this RegEx in dar.lang highlights only the first paragraph of the [^1]... footnote (ignoring the last two).
However, when I use the very same RegEx in Gedit's "Find" dialog (you need to click the magnifying glass icon and enable "Match as Regular Expression"), it matches perfectly.
I get the same results even if I replace both occurrences of \P{C}* with .* in the RegEx.
How should I change dar.lang so that it also captures footnotes with multiple paragraphs, the same way Gedit's Find dialog does? (regex101 also behaves correctly, just like Gedit's Find dialog.)
Update:
It looks like somebody had a similar problem. A (working) recommendation was to match an empty line with ^$ instead of \n, because GtkSourceView's syntax engine is line based (i.e. "regexes only have access to one line of text at a time").
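To make the "one line at a time" explanation concrete, here is a minimal Python sketch that only simulates the two modes; it is not GtkSourceView's actual code. It assumes the .* variant of the RegEx (Python's re module has no \p{C}, so that class is dropped from the bracket expression as well), and the sample text and function names are hypothetical.

```python
import re

# The question's pattern with \P{C}* swapped for .* and \p{C} dropped,
# because Python's re has no \p{...}. re.MULTILINE lets ^ match at the
# start of every line, not only at the start of the text.
FOOTNOTE = re.compile(
    r"^(\[\^)([^\]\s]+)(\])\ (.*(?:\n(?!\n[\[\n]).*)*)",
    re.MULTILINE,
)

# Hypothetical sample: a three-paragraph footnote, two empty lines,
# then another footnote.
SAMPLE = (
    "[^1] First paragraph.\n"
    "\n"
    "Second paragraph.\n"
    "\n"
    "Third paragraph.\n"
    "\n"
    "\n"
    "[^b] Last footnote."
)


def match_whole_buffer(text):
    """Match against the entire text, as a search dialog presumably does."""
    return [m.group(4) for m in FOOTNOTE.finditer(text)]


def match_line_by_line(text):
    """Feed the regex one line at a time, so it can never see a \\n."""
    bodies = []
    for line in text.split("\n"):
        m = FOOTNOTE.match(line)
        if m:
            bodies.append(m.group(4))
    return bodies


whole_buffer = match_whole_buffer(SAMPLE)
per_line = match_line_by_line(SAMPLE)
print(whole_buffer)
print(per_line)
```

In the whole-buffer mode the first footnote body spans all three paragraphs, while in the per-line mode it stops at the end of the first line, because the \n branch of the pattern never gets anything to match.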
However, I also saw people saying that the syntax engine uses PCRE, and when I ran the syntax.dar file through grep --perl-regexp:
wget https://gitlab.com/pninim.org/dar/dar-syntax/-/raw/master/syntax.dar
grep --perl-regexp '^(\[\^)([^\]\s\p{C}]+)(\])\ (\P{C}*(?:\n(?!\n[\[\n])\P{C}*)*)' syntax.dar
I got this:
[^a] Sentence of a long margin note's body that refers to the word "paragraph".
[^1] Sentence of the first paragraph of the footnote. Sentence of the first paragraph of the footnote. Sentence of the first paragraph of the footnote. Sentence of the first paragraph of the footnote.
[^b] Footnote at the end of the file. Footnote at the end of the file. Footnote at the end of the file.
So, on the other hand, this looks like standard PCRE behavior...