Using an if clause inside a regular expression


I am currently writing a .NET Windows app in VB.NET.

I am trying to pass a regular expression to Regex.Match to extract certain text from an article. How do I write an if condition within a regular expression? I read a regular expression cheat sheet which says a condition can be written using ?(), but it gives no example.

For example, I have following text:

"Mary have banana. Mary have apple. Mary have NO pear."

I can use the following expression to extract (1) banana, (2) apple, and (3) NO pear:

mary have (.+?\.)+?
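To see the problem concretely, here is a quick check written in Python only because it is easy to run; Python's re module treats this pattern the same way .NET's Regex does. The pattern starts with a lowercase "mary", so this sketch assumes a case-insensitive match (in VB.NET that would be RegexOptions.IgnoreCase).

```python
import re

text = "Mary have banana. Mary have apple. Mary have NO pear."

# The question's pattern is lowercase, so match case-insensitively
# (assumption; the .NET counterpart is RegexOptions.IgnoreCase).
pattern = re.compile(r"mary have (.+?\.)+?", re.IGNORECASE)

# findall returns what group 1 captured for each match
captured = pattern.findall(text)
print(captured)  # ['banana.', 'apple.', 'NO pear.']
```

All three sentences match, including "NO pear.", which is exactly what the question wants to filter out.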

But if I want to extract only the fruits that Mary has, namely (1) banana and (2) apple, I guess I would need to add a condition to the (.+?\.)+? part, right? How do I express the condition in a regular expression?

Please assist, thank you!

There are 4 answers below.

Accepted answer (score 3):

Try this:

Mary\shave\s(?!NO)(\S*)

You can try it online here: regexr.com?2thid

The (?!NO) part is a negative lookahead assertion: the regex will not match where "Mary have" is followed by "NO". Otherwise it puts the word after "Mary have" into the first capturing group.
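A minimal sketch of this pattern in action, written in Python because it is easy to run; negative lookahead behaves the same in Python's re and in .NET. In VB.NET the same captures would come from Groups(1).Value of each Match returned by Regex.Matches. The second pattern, using \w+ instead of \S*, is an illustrative variant and not part of the answer.

```python
import re

text = "Mary have banana. Mary have apple. Mary have NO pear."

# (?!NO) is checked right after "Mary have ": if the next two characters
# are "NO", this position is rejected and nothing is captured.
pattern = re.compile(r"Mary\shave\s(?!NO)(\S*)")

fruits = pattern.findall(text)
print(fruits)  # ['banana.', 'apple.']

# Illustrative variant: \w+ captures only the word, without the period.
letters_only = re.findall(r"Mary\shave\s(?!NO)(\w+)", text)
print(letters_only)  # ['banana', 'apple']
```

Note that (?!NO) also rejects any word that merely starts with "NO"; (?!NO\b) would restrict the check to the whole word.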

The conditional syntax is explained in perlretut (assuming it works the same in .NET), but I think my solution is simpler.

Answer (score 1):

Here is a solution that avoids regular expressions altogether, though I can only answer in C#:

    using System;

    string sentence = "Mary have banana Mary have apple Mary have NO pear";
    // Remove the first occurrence of "banana", if there is one
    if (sentence.Contains("banana"))
    {
        sentence = sentence.Remove(sentence.IndexOf("banana"), "banana".Length);
    }
    Console.WriteLine(sentence);

Don't laugh XD, it's just a quick fix. Rinse and repeat for the rest of the items.
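The "rinse and repeat" step can be sketched as a loop. This is written in Python for easy running; the list of item names is hypothetical, and the calls map roughly onto C#'s Contains, IndexOf and Remove.

```python
sentence = "Mary have banana Mary have apple Mary have NO pear"

# Hypothetical list of items to strip out, one after another
items = ["banana", "apple"]

for item in items:
    if item in sentence:  # like sentence.Contains(item)
        start = sentence.index(item)  # like sentence.IndexOf(item)
        # Cut out the first occurrence, like Remove(start, item.Length)
        sentence = sentence[:start] + sentence[start + len(item):]

print(sentence)
```

Keep in mind that this approach removes text rather than extracting it, and it only works when the item names are known in advance.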

Answer (score 1):

Then try using the .Split() method. The split will probably look something like this:

    using System;

    string sentence = "Mary have banana Mary have apple Mary have NO pear";
    // Split on the fruit names; the placeholders stand for the strings to remove
    string[] brokenUp = sentence.Split(
        new string[]
        {
            "first fruit as string variable",
            "second fruit as string variable",
            "third fruit as string variable"
        },
        StringSplitOptions.None
    );
    // Join the remaining pieces, which drops the separators from the sentence
    string newSentence = string.Concat(brokenUp);
    Console.WriteLine(newSentence);
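To show what the split produces before the pieces are joined, here is a small Python sketch. Python's str.split accepts only a single separator, so it assumes re.split over an alternation of escaped words as a stand-in for String.Split(string[], StringSplitOptions.None). The separator list is hypothetical and must not be empty.

```python
import re

sentence = "Mary have banana Mary have apple Mary have NO pear"

# Hypothetical separators standing in for the "... fruit as string variable" placeholders
separators = ["banana", "apple"]

# Build "banana|apple" with each word escaped, then split on any of them
pattern = "|".join(re.escape(s) for s in separators)
pieces = re.split(pattern, sentence)
print(pieces)  # ['Mary have ', ' Mary have ', ' Mary have NO pear']

# Concatenating the pieces drops the separators
new_sentence = "".join(pieces)
```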
Answer (score 1):

Others have provided solutions for your specific case, so I'll just focus on the "if clause" mentioned in the heading.

.NET supports conditionals with the following syntax:

(?(bob)[a-z]+|[0-9]+)

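The regex engine first tests the expression in the inner parentheses (bob) at the current position without consuming any characters. If it matches, the overall expression then tries to match the subexpression before the pipe ([a-z]+); otherwise it tries the subexpression after the pipe ([0-9]+). Because there is no capturing group named bob, .NET treats bob as an expression here; with a group name or number instead, as in (?(1)...), the condition tests whether that group has captured something.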
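Python's re module cannot run this pattern directly, because it only accepts a group name or number as the test, not an arbitrary expression. A minimal sketch, assuming the usual emulation with two guarded alternatives (a lookahead and its negation), shows the same if/else behaviour; the group-number form, which both Python and .NET accept, is included for comparison.

```python
import re

# Emulates the .NET conditional (?(bob)[a-z]+|[0-9]+):
#   "bob" follows at this position  -> try [a-z]+
#   otherwise                       -> try [0-9]+
emulated = re.compile(r"(?:(?=bob)[a-z]+|(?!bob)[0-9]+)")

def check(text):
    """Return True if the whole text matches the emulated conditional."""
    return emulated.fullmatch(text) is not None

print(check("bobcat"))  # True: starts with "bob", so letters are allowed
print(check("123"))     # True: no "bob", so digits are required
print(check("cat"))     # False: no "bob", and "cat" is not digits

# Group-test form: (?(1)\)) requires ")" only if group 1 captured "(".
paren_number = re.compile(r"(\()?\d+(?(1)\))")
print(paren_number.fullmatch("(42)") is not None)  # True
print(paren_number.fullmatch("42") is not None)    # True
print(paren_number.fullmatch("(42") is not None)   # False
```

A plain alternation such as [a-z]+|[0-9]+ would accept "cat"; the test is what rules it out.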

Having said all that, I think the negative lookahead suggested by stema would be a better fit for what you are trying to do.

Note: the "test" portion can also use any of the zero-width assertions, such as a negative lookbehind:

(?(?<!\s)[a-z]+|[0-9]+)

Of course, an explicit zero-width lookahead is redundant, because the "test" expression is always treated as zero-width.
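A minimal sketch of the lookbehind variant, emulated in Python because its re module has no expression conditionals. It assumes the pattern is used with a match-all call (findall here, Regex.Matches in .NET). The second branch uses (?<=\s), the logical opposite of (?<!\s).

```python
import re

# Emulates the .NET conditional (?(?<!\s)[a-z]+|[0-9]+):
#   not preceded by whitespace -> letters
#   preceded by whitespace     -> digits
pattern = re.compile(r"(?<!\s)[a-z]+|(?<=\s)[0-9]+")

tokens = pattern.findall("x 12 y")
print(tokens)  # ['x', '12']
```

"x" is at the start of the string, so the lookbehind succeeds and letters are accepted. "12" follows a space, so digits are required. "y" also follows a space but is not a digit, so it is skipped.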