Is it possible to write a Python regex that matches any valid English sentence, where the sentence may contain alphanumeric and special characters?
I want to extract some specific elements from an XML file. These elements have the following form:
<p o=<Any Number>> <Any English sentence> </p>
For example:
<p o ="1"> The quick brown fox jumps over the lazy dog </p>
or
<p o ="2"> And This is a number 12.90! </p>
It is easy to write a regex for the opening
<p o=<Any Number>>
and closing </p>
tags. What I want is a regex capture group that extracts the sentence between these tags.
Can anyone suggest a regex for the problem above?
If you can suggest an alternative approach, that would be really helpful as well.
Use an XML parser like lxml; regex is not suitable for this task. Example:
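A minimal sketch of the parser approach, assuming the `<p>` elements sit somewhere inside a single root element. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra installs. The `<doc>` wrapper, the `<q>` element, and the `extract_sentences` helper are made up for illustration and are not part of the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: the <doc> root and the <q> element are
# assumptions added only to show what gets filtered out.
SAMPLE = """<doc>
  <p o ="1"> The quick brown fox jumps over the lazy dog </p>
  <p o ="2"> And This is a number 12.90! </p>
  <p> No o attribute here </p>
  <q o="3"> Not a p element </q>
</doc>"""


def extract_sentences(xml_text):
    """Return the text of every <p> element whose 'o' attribute is a number."""
    root = ET.fromstring(xml_text)
    sentences = []
    # './/p[@o]' selects all descendant <p> elements that have an 'o' attribute.
    for p in root.iterfind(".//p[@o]"):
        # Keep only elements whose attribute value consists of digits.
        if p.get("o", "").strip().isdigit():
            # itertext() also collects text inside nested child elements.
            sentences.append("".join(p.itertext()).strip())
    return sentences


if __name__ == "__main__":
    for sentence in extract_sentences(SAMPLE):
        print(sentence)
```

The same calls (`fromstring`, `iterfind`, `get`, `itertext`) are also available in `lxml.etree`, so switching the import to `from lxml import etree as ET` should work if lxml is installed. No pattern for "a valid English sentence" is needed, because the parser returns whatever text the element contains.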
You can read more about XPath at: XPath tutorial.