What regular expression will match a string of the letters U and O in which at most one letter U is adjacent to a letter O?

We wish to match strings of text such as OOOOUUUU and UUUUOOO.

However, the Us and Os should not touch each other more than once, as they do in OUOUUUOOO.

STRING            IS_MATCH
(empty string)    no
U                 yes
O                 yes
UUUU              yes
OOOO              yes
OU                yes
UO                yes
UUUOOOO           yes
OOOOUUU           yes
OOOUUUUOOUOU      no
OUUUOOOUUUO       no
OUOUOUOUOUOUOU    no

An O and a U should sit next to each other at most once in the string.


Suppose that AS is the set of strings such that our ideal regex R matches every string A in AS.

AS = {"U", "O", "OU", "UO", "UOO", "OOU", "UUO", "OUU", ...}

Let A be a string in collection AS.

It follows that if UO is a substring of A, then OU is NOT a substring of A.

Conversely, if OU is a substring of A, then UO is NOT a substring of A.
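
The property stated above can be checked without any regex, which gives a reference to compare candidate patterns against. This is only a minimal sketch in Python; the function name is_valid is made up for illustration and is not part of the question.

```python
def is_valid(s: str) -> bool:
    """Plain-Python check of the property, with no regex involved.

    Hypothetical helper: the name and signature are assumptions
    made for this sketch.
    """
    # The string must be non-empty and use only the letters O and U.
    if not s or set(s) - {"O", "U"}:
        return False
    # "OU" and "UO" may not both occur as substrings.
    return not ("OU" in s and "UO" in s)


if __name__ == "__main__":
    for text in ["", "U", "OU", "UUUOOOO", "OUUUOOOUUUO"]:
        print(repr(text), is_valid(text))
```

A check like this is handy as an oracle when testing a regex against many generated strings.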



There are 2 best solutions below

0
Bohemian On

Use an alternation: one or more O's followed by any number of U's, or vice versa:

^(O+U*|U+O*)$

See live demo.
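
As a quick sanity check, the pattern can be run against the table from the question. The sketch below assumes Python's re module; the names ALTERNATION, CASES and failures are invented for this example.

```python
import re

# Pattern from the answer above, unchanged.
ALTERNATION = re.compile(r"^(O+U*|U+O*)$")

# Expected results copied from the table in the question.
CASES = {
    "": False,
    "U": True,
    "O": True,
    "UUUU": True,
    "OOOO": True,
    "OU": True,
    "UO": True,
    "UUUOOOO": True,
    "OOOOUUU": True,
    "OOOUUUUOOUOU": False,
    "OUUUOOOUUUO": False,
    "OUOUOUOUOUOUOU": False,
}


def failures(pattern):
    """Return the test strings whose result differs from the table."""
    # fullmatch is used so that a trailing newline cannot slip past "$".
    return [s for s, want in CASES.items()
            if (pattern.fullmatch(s) is not None) != want]


if __name__ == "__main__":
    print(failures(ALTERNATION))
```

An empty list means the pattern agrees with every row of the table.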

2
Cary Swoveland On

My understanding is that we wish to confirm that a given string is valid, namely it:

  • contains only "O"'s and "U"'s; and
  • may not contain both substrings "OU" and "UO".

One way to do that is to employ a negative lookahead (provided that feature is supported by the regex engine being used):

^(?!.*(?:OU+O|UO+U))[OU]+$

Demo

This expression can be broken down as follows.

^            match beginning of string
(?!          begin a negative lookahead
  .*         match >= 0 characters other than line terminators
  (?:        begin a non-capture group
    OU+O     match two 'O's separated by one or more 'U's
    |        or
    UO+U     match two 'U's separated by one or more 'O's
  )          end non-capture group
)            end negative lookahead
[OU]+        match >= 1 characters in character class
$            match end of string

One can also hover the cursor over each part of this expression at this link to obtain an explanation of its function.
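
To see whether the lookahead pattern accepts the same strings as the simpler alternation ^(O+U*|U+O*)$, one option is to compare the two on every string of O's and U's up to some length. The following is a rough sketch in Python's re module, not a proof; the names LOOKAHEAD, ALTERNATION and disagreements are assumptions made for the example.

```python
import re
from itertools import product

# Both patterns are copied verbatim from the answers in this thread.
LOOKAHEAD = re.compile(r"^(?!.*(?:OU+O|UO+U))[OU]+$")
ALTERNATION = re.compile(r"^(O+U*|U+O*)$")


def disagreements(max_len=10):
    """List every O/U string up to max_len on which the patterns differ."""
    bad = []
    for n in range(max_len + 1):
        # product("OU", repeat=0) yields one empty tuple, i.e. the empty string.
        for letters in product("OU", repeat=n):
            s = "".join(letters)
            a = LOOKAHEAD.fullmatch(s) is not None
            b = ALTERNATION.fullmatch(s) is not None
            if a != b:
                bad.append(s)
    return bad


if __name__ == "__main__":
    print(disagreements())
```

An empty result only shows agreement on the strings that were enumerated, so it supports the equivalence without establishing it for all lengths.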