Suppose I have the following text:
# Should match
- [ ] Some task
- [ ] Some task | [[link]]
- [ ] Some task ^abcdef
- [ ] Some task | [[link]] ^abcdef
- [ ] ! Some task
- [ ] ! Some task | [[link]]
- [ ] ! Some task ^abcdef
- [ ] ! Some task | [[link]] ^abcdef
- [ ] Task one | [ ] ! Task two | [ ] Task three ^abcdef
| Tracker | Task | Backlog |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] Task item | [[linK]] |
| 00:00-00:00 | [ ] Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] [[task-item]] | [[link]] |
| 00:00-00:00 | [ ] ! Task item | [[linK]] |
| 00:00-00:00 | [ ] ! Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] ! [[task-item]] | [[link]] |
# Should not match
- [ ]
- [ ]
- [ ]
- [ ] !
- [ ] !
- [ ] !
| Tracker | Task | Backlog |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] | [[linK]] |
| 00:00-00:00 | [ ] ! | [[linK]] |
I am interested in several capture groups as follows:
- group `$1`:
  - match: `[` and `]`
- group `$2`:
  - match: any single character (e.g., `\s`) between `[` and `]`
- group `$3`:
  - match: `!`, `?`, or `*` that follows after `[ ]`
- group `$4`:
  - match: task text after `[ ]` without modifier present
- group `$5`:
  - match: task text after `[ ] !` with modifier present
I came up with the following regex (i.e., see demo here):
(?<= \s )
# Match opening bracket (i.e., `[`).
( \[ )
# Match any single character (e.g., `x`).
( . )
# Match closing bracket (i.e., `]`).
( \] )
(?= \s* [?!*]? \s* )
# Exclude entries without text (i.e., incl. in tables).
(?!
\s* [?!*]? \s* \|
|
\s* [?!*]? \s* $
)
# Match the text (i.e., capture based on modifier presence).
(?:
# Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
\s* ( [!?*] ) \s* ( .*? )
|
# Match the text that does not follow a modifier.
\s* (?! [!?*]) \s* ( .*? )
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)
This seems to work (see the picture below), with one exception: the `[` and `]` brackets are captured in separate groups (i.e., `[` in group `$1` and `]` in group `$3`). How can I capture `[` and `]` as part of the same group (i.e., `$1`)?
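To make the capture numbering concrete outside VS Code, here is a minimal sketch in Python. It assumes that Python's `re` module in verbose mode handles this particular pattern the same way Oniguruma does, and it matches one line at a time, as a TextMate grammar does. The helper name `groups` and the choice of sample lines are only for illustration.

```python
import re

# The pattern from above; only the comments are shortened.
TASK = re.compile(
    r"""
    (?<= \s )
    ( \[ )          # group 1: opening bracket
    ( . )           # group 2: character between the brackets
    ( \] )          # group 3: closing bracket
    (?= \s* [?!*]? \s* )
    (?!             # exclude entries without text (also in tables)
        \s* [?!*]? \s* \|
        |
        \s* [?!*]? \s* $
    )
    (?:
        \s* ( [!?*] ) \s* ( .*? )      # groups 4 and 5: modifier and its text
        |
        \s* (?! [!?*]) \s* ( .*? )     # group 6: text without a modifier
    )
    (?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)
    """,
    re.VERBOSE,
)


def groups(line):
    """Return the tuple of capture groups for one line, or None if no match."""
    match = TASK.search(line)
    return match.groups() if match else None


samples = [
    "- [ ] Some task",                       # ('[', ' ', ']', None, None, 'Some task')
    "- [ ] ! Some task | [[link]] ^abcdef",  # ('[', ' ', ']', '!', 'Some task', None)
    "- [ ] !",                               # None
    "| 00:00-00:00 | [ ] | [[linK]] |",      # None
]

for line in samples:
    print(repr(line), "->", groups(line))
```

In this sketch the brackets land in groups 1 and 3, and the modifier and task text in groups 4 to 6, which is the numbering I would like to collapse into the five groups listed earlier.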
I am using this regex for a TextMate grammar in VS Code and according to the documentation the expression needs to be a valid Oniguruma regular expression. Based on some attempts, I noticed that the following are not supported:
- branch resets (i.e., `\K`)
- capturing inside lookarounds
- named capture groups
Edit
The fourth bird indicated in the comments that with the /J flag enabled the regex below works (i.e., see demo):
(?<= \s )
# Match opening bracket (i.e., `[`).
(?<g1> \[)
# Match any single character (e.g., `x`).
(?<g2> .)
# Match closing bracket (i.e., `]`).
(?<g1> \])
(?= \s* [?!*]? \s* )
# Exclude entries without text (i.e., incl. in tables).
(?!
\s* [?!*]? \s* \|
|
\s* [?!*]? \s* $
)
# Match the text (i.e., capture based on modifier presence).
(?:
# Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
\s* (?<g3>[!?*]) \s* (?<g4>.*?)
|
# Match the text that does not follow a modifier.
\s* (?! [!?*]) \s* (?<g5>.*?)
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)
It does. However, as I just discovered, it seems that I cannot use named capture groups for TextMate grammars and, therefore, I need a different solution.
