Decode Regex expression - ^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@[\\]^_`{|}~]+$


Using Java

I don't work with regular expressions often. I came across the following regex while migrating springmodules-validation rules to a newer validation library.

^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@[\\]^_`{|}~]+$

What exactly is this doing? I need to understand it in order to write unit tests for this validation. By the way, I'm using it in a Java project.

One more interesting thing, I tried this expression in hibernate-validator as follows:

@Pattern(regexp = "^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@[\\]^_`{|}~]+$")

IntelliJ IDEA then shows an error at the end of the line saying "Unclosed character class". Is the regex properly formed?

Update

It seems the expression is malformed. I see the following exception when I try to test it:

java.util.regex.PatternSyntaxException: Unclosed character class near index 57
^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@[\]^_`{|}~]+$
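The failure can be reproduced outside the validator with a minimal sketch like the one below. The class name `UnclosedClassDemo` and the helper `compileError` are hypothetical, chosen only for illustration. The sketch assumes the engine receives the pattern with a single backslash before `]`, which is what the Java literal `[\\]` produces. In Java an unescaped `[` inside a character class opens a nested class, so the final `]` closes only the inner class and the outer one is left open.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class UnclosedClassDemo {
    // Java literal "[\\]" reaches the regex engine as [\] (one backslash).
    static final String RAW = "^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@[\\]^_`{|}~]+$";

    // Returns the engine's error description, or null if the pattern compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(compileError(RAW));
    }
}
```

The reported index depends on the exact string passed in, so the sketch only looks at the error description.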

Here is the original expression from one of the xml files which I'm trying to migrate:

<regexp apply-if="creativeType == 'Text'" expression="^[a-zA-Z0-9 &quot;&apos;&amp;!#$%()*+,-./:;?@[\\]^_`{|}~]+$"/>

Am I missing anything?

Working Solution

regexp = "^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@\\[\\]^_`{|}~]+$"

Assigned to a string this way, it works perfectly for me. Thank you all!
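For the unit tests mentioned in the question, a plain-Java sketch along these lines may be a starting point. The class name `AllowedCharsCheck` and the method `isValid` are hypothetical, and the sketch assumes the validator requires the whole input to match (it uses `Matcher.matches()`); that assumption should be checked against the validator actually in use.

```java
import java.util.regex.Pattern;

class AllowedCharsCheck {
    // The working pattern from above: both brackets are escaped inside the class.
    static final Pattern ALLOWED =
        Pattern.compile("^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@\\[\\]^_`{|}~]+$");

    // True only if every character of the input is in the allowed set.
    static boolean isValid(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Hello, World!", "a[1]-b_c", "a<b", "back\\slash", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Useful test cases include an empty string (rejected because `+` needs at least one character), input with a trailing newline, and non-ASCII text.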


There are 2 best solutions below

4
On BEST ANSWER

The translated expression would look something like

^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@\[\]^_`{|}~]+$

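and matches a line made up of letters, digits, and a set of other characters (including the various brackets, where ] has to be escaped so that it is not taken as the end of the character class; Java also needs [ escaped, since it would otherwise open a nested class).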
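One way to see exactly which characters the class covers is to run every printable ASCII character through it and list the ones that fail. The sketch below does that; the class name `RejectedAsciiDemo` is hypothetical, and the pattern is written as a Java string literal, so each regex backslash is doubled.

```java
import java.util.regex.Pattern;

class RejectedAsciiDemo {
    static final Pattern ALLOWED =
        Pattern.compile("^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@\\[\\]^_`{|}~]+$");

    // Collects every printable ASCII character (0x20..0x7E) the pattern rejects.
    static String rejectedPrintableAscii() {
        StringBuilder rejected = new StringBuilder();
        for (char c = 0x20; c <= 0x7E; c++) {
            if (!ALLOWED.matcher(String.valueOf(c)).matches()) {
                rejected.append(c);
            }
        }
        return rejected.toString();
    }

    public static void main(String[] args) {
        System.out.println(rejectedPrintableAscii()); // prints <=>\
    }
}
```

Only `<`, `=`, `>` and the backslash are left out. Note that `,-.` inside the class is a range from `,` to `.`, which happens to cover exactly the comma, the hyphen, and the period.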

3
On

You can use something like YAPE::Regex::Explain in Perl or RegexBuddy to get a detailed description of your regular expression. A messy one-liner can be found below:

perl -MYAPE::Regex::Explain -e \
'$e=<>; print YAPE::Regex::Explain->new($e)->explain';

After providing the regexp from stdin:

The regular expression:

^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@[\]^_`{|}~]+$

matches as follows:

NODE                       EXPLANATION
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  ^                        the beginning of the string
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [a-zA-Z0-9               any character of: 'a' to 'z', 'A' to 'Z',
  "'&!#$%()*+,-             '0' to '9', ' ', '"', ''', '&', '!', '#',
  ./:;?@[\]^_`{|}~]+       '$', '%', '(', ')', '*', '+', ',' to '.',
                           '/', ':', ';', '?', '@', '[', '\]', '^',
                           '_', '`', '{', '|', '}', '~' (1 or more
                           times (matching the most amount possible))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  $                        before an optional \n, and the end of the
                           string
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Using something like RegexBuddy will let you select the Java flavor for your regular expression. The pattern is mostly standard, although Java, unlike Perl, treats an unescaped [ inside a character class as the start of a nested class.

Are you sure this is Java, though? From all that escaping, it looks a lot more like part of an XSD / XPath / XML thing.