GtkSourceView - Gedit: regex matching discrepancy


I am trying to use the regex ^(\[\^)([^\]\s\p{C}]+)(\])\ (\P{C}*(?:\n(?!\n[\[\n])\P{C}*)*) in dar.lang (located in /usr/share/gtksourceview-3.0/language-specs) to capture footnotes that span multiple paragraphs (a footnote terminates, among other things, at 2 empty lines, i.e. 3 consecutive \n).

When I open Gedit, I see that this regex in dar.lang highlights only the first paragraph of the [^1]... footnote and ignores the other two.


However, when I use the very same regex in Gedit's "Find" dialog (click the magnifying glass icon and enable "Match as Regular Expression"), it matches all three paragraphs perfectly.


I get the same results even if I replace both occurrences of \P{C}* with .* in the regex.

How should I change dar.lang so that it also captures multi-paragraph footnotes the same way Gedit's Find dialog does? (regex101 behaves correctly, just like Gedit's Find dialog.)

Update:

It looks like somebody had a similar problem. A (working) recommendation was to match an empty line with ^$ instead of \n, because GtkSourceView's syntax engine is line-based (i.e. "regexes only have access to one line of text at a time").

However, I also saw people saying that the syntax engine uses PCRE, so I ran the syntax.dar file through grep --perl-regexp:

wget https://gitlab.com/pninim.org/dar/dar-syntax/-/raw/master/syntax.dar
grep --perl-regexp '^(\[\^)([^\]\s\p{C}]+)(\])\ (\P{C}*(?:\n(?!\n[\[\n])\P{C}*)*)' syntax.dar

I got this:

[^a] Sentence of a long margin note's body that refers to the word "paragraph".
[^1] Sentence of the first paragraph of the footnote. Sentence of the first paragraph of the footnote. Sentence of the first paragraph of the footnote. Sentence of the first paragraph of the footnote.
[^b] Footnote at the end of the file. Footnote at the end of the file. Footnote at the end of the file.

So, on the other hand, this looks like standard PCRE behavior...
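The two observations can be reconciled with a small experiment. The sketch below is only an illustration in Python, not GtkSourceView or Gedit code: it applies the same pattern once to a whole buffer and once to each line separately. Several things here are assumptions: Python's re module does not support \p{C}, so the sketch uses the .* variant mentioned above and drops \p{C} from the character class; SAMPLE is made-up text rather than the real syntax.dar; and the function names are hypothetical. Note also that grep itself reads its input one line at a time (unless told otherwise), so a \n in the pattern never has anything to match there, which may be why its output looks like the dar.lang result rather than the Find dialog result.

```python
import re

# The ".*" variant of the regex from the question (Python's re has no \p{C}).
PATTERN = re.compile(
    r"^(\[\^)([^\]\s]+)(\])\ (.*(?:\n(?!\n[\[\n]).*)*)",
    re.MULTILINE,
)

# Hypothetical sample: one footnote with three paragraphs,
# terminated by two empty lines (three consecutive "\n").
SAMPLE = (
    "[^1] First paragraph.\n"
    "\n"
    "Second paragraph.\n"
    "\n"
    "Third paragraph.\n"
    "\n"
    "\n"
    "Body text after the footnote.\n"
)


def match_whole_buffer(text):
    """Search the entire text at once, so "\\n" in the pattern can match."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


def match_line_by_line(text):
    """Apply the pattern to each line in isolation, as a line-based engine would."""
    results = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            results.append(m.group(0))
    return results


if __name__ == "__main__":
    print(repr(match_whole_buffer(SAMPLE)))   # all three paragraphs
    print(repr(match_line_by_line(SAMPLE)))   # only the first paragraph
```

In this sketch the whole-buffer search returns the footnote with all three paragraphs, while the per-line pass returns only "[^1] First paragraph.", because a single line never contains the \n that the continuation group needs. If GtkSourceView really feeds its regexes one line at a time, as the quoted recommendation says, that would explain why the same pattern behaves differently in dar.lang and in the Find dialog.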
