I am trying to run a regex query in Python and I have the following problem:
In French, the subject of a sentence can appear either before or after the verb. For example, "she says" can be translated as "elle dit" or "dit-elle", where "elle" is "she" and "dit" is "says".
Is it possible to capture only sentences containing "elle" and "dit", whether the subject "elle" comes before or after the verb "dit"? I have started with the following:
(elle).{0,10}(dit).{0,10}(elle)
But now I would like to make one of the `(elle)` groups optional once the other has been found. The `*` and `+` operators do not help in this case.
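A quick check with Python's standard `re` module makes the problem concrete. The quantifier needs a comma, `{0,10}`; written as `{0;10}`, `re` reads the braces as literal characters, so the pattern never matches ordinary text. With the comma, the pattern requires `elle` on both sides of `dit`, so neither word order matches on its own. The sample sentences are illustrative.

```python
import re

# The question's pattern with a comma in the quantifier: elle is required
# both before and after dit.
both_sides = re.compile(r"(elle).{0,10}(dit).{0,10}(elle)")
# With a semicolon, "{0;10}" is not a quantifier; re matches it literally.
semicolon = re.compile(r"(elle).{0;10}(dit)")

for sentence in ["elle dit", "dit-elle", "elle dit-elle"]:
    print(f"{sentence!r}: {bool(both_sides.search(sentence))}")
# 'elle dit': False
# 'dit-elle': False
# 'elle dit-elle': True

print(semicolon.search("elle dit"))  # None
```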
You can use the PyPI `regex` module, which can be installed with `pip install regex` (or `pip3 install regex`). See the online Python demo.
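The third-party module is needed because the pattern in the details relies on two features the standard `re` module does not support: a variable-width lookbehind (`.{0,10}` inside `(?<=...)`) and the same group name (`subject`) used twice. The `regex` module accepts both. This minimal check uses illustrative patterns, not the answer's own, and shows `re` rejecting the first two while still accepting a fixed-width lookbehind.

```python
import re

# Illustrative patterns (not taken from the answer):
# 1. A lookbehind whose width varies because of .{0,10}
variable_lookbehind = r"(?<=\belle\b.{0,10})\bdit\b"
# 2. The same group name used in two places
duplicate_names = r"(?P<subject>elle) dit|dit-(?P<subject>elle)"
# 3. A fixed-width lookbehind ("elle " is always 5 chars), which re accepts
fixed_lookbehind = r"(?<=\belle )\bdit\b"


def compiles_with_re(pattern):
    """Return True if the standard-library re module can compile pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"re.error: {exc}")
        return False
    return True


for p in (variable_lookbehind, duplicate_names, fixed_lookbehind):
    print(p, "->", compiles_with_re(p))
```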
The pattern may be created dynamically from variables.
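One way to do that is sketched here, assuming the words come from plain Python lists. The helper name `build_pattern` and its `max_gap` parameter are hypothetical, not from the answer. It only assembles the string with `re.escape` and `str.join`; the result has to be passed to `regex.compile` (or `regex.finditer`), because `re` rejects it.

```python
import re


def build_pattern(subjects, predicates, max_gap=10):
    """Assemble the three-part pattern from word lists (hypothetical helper).

    The result needs the PyPI regex module; the standard re module
    cannot compile it (variable-width lookbehind, repeated group name).
    """
    # re.escape guards against words containing regex metacharacters.
    subj = "|".join(map(re.escape, subjects))
    pred = "|".join(map(re.escape, predicates))
    gap = f".{{0,{max_gap}}}"  # e.g. ".{0,10}"
    return (
        rf"(?<=\b(?P<subject>{subj})\b{gap})?"  # optional subject before
        rf"\b(?P<predicate>{pred})\b"           # the verb itself
        rf"(?={gap}\b(?P<subject>{subj})\b)?"   # optional subject after
    )


print(build_pattern(["il", "elle"], ["dit", "mange"]))
```

With these lists the printed string matches the pattern broken down in the details, so adding a subject or verb only means extending a list.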
Details
- `(?<=\b(?P<subject>il|elle)\b.{0,10})?` - an optional lookbehind that grabs a whole word `il` or `elle` within 0 to 10 chars before the predicate
- `\b(?P<predicate>dit|mange)\b` - a whole word `dit` or `mange`
- `(?=.{0,10}\b(?P<subject>il|elle)\b)?` - an optional lookahead that grabs a whole word `il` or `elle` within 0 to 10 chars after the predicate.
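If installing `regex` is not an option, a different technique works with the standard `re` module: an alternation with one branch for subject-before-verb and one for verb-before-subject. This is a swapped-in approach, not the answer's pattern; the group names (`subj_before`, `pred_after`, `pred_before`, `subj_after`), the helper `find_pairs`, the word lists and the case-insensitive flag are assumptions for illustration. Each branch needs its own group names because `re` does not allow a name to be reused.

```python
import re

# Hypothetical word lists; extend them as needed.
SUBJECTS = ["il", "elle"]
PREDICATES = ["dit", "mange"]

subj = "|".join(map(re.escape, SUBJECTS))
pred = "|".join(map(re.escape, PREDICATES))

# Branch 1: subject, up to 10 chars, verb ("elle dit").
# Branch 2: verb, up to 10 chars, subject ("dit-elle").
# Each branch has its own group names: re rejects a repeated name.
PAIR_RE = re.compile(
    rf"\b(?P<subj_before>{subj})\b.{{0,10}}?\b(?P<pred_after>{pred})\b"
    rf"|\b(?P<pred_before>{pred})\b.{{0,10}}?\b(?P<subj_after>{subj})\b",
    re.IGNORECASE,
)


def find_pairs(text):
    """Return a (subject, predicate) tuple for each non-overlapping match."""
    pairs = []
    for m in PAIR_RE.finditer(text):
        # Only one branch participates, so the other branch's groups are None.
        subject = m.group("subj_before") or m.group("subj_after")
        predicate = m.group("pred_after") or m.group("pred_before")
        pairs.append((subject, predicate))
    return pairs


for sentence in ["Elle dit bonjour.", "Bonjour, dit-elle.", "Il mange une pomme.", "Elle chante."]:
    print(sentence, find_pairs(sentence))
# Elle dit bonjour. [('Elle', 'dit')]
# Bonjour, dit-elle. [('elle', 'dit')]
# Il mange une pomme. [('Il', 'mange')]
# Elle chante. []
```

Unlike the lookaround version, this pattern consumes the subject as part of the match, so a subject shared by two verbs is paired only with the first one: `find_pairs("dit-elle dit")` returns a single pair.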