Regex filtering non-numeric chars


I have a Regex to remove non-numerical characters prior to parsing a decimal number.

I use the following code

Regex.Replace(myStr, "[^0-9.]", "");

This works for unsigned decimal numbers, but it also removes the sign character: both "A16.1" and "A-16.1" produce "16.1".

The following edited version seems to work:

Regex.Replace(myStr, "[^-0-9.]", "");

Since I am unfamiliar with Regex, can an experienced user confirm that this is the right expression?
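A quick way to check both expressions against the two sample inputs is sketched below. The class and method names (SignFilterDemo, StripUnsigned, StripSigned) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SignFilterDemo
{
    // Original expression: keeps only digits and the dot.
    public static string StripUnsigned(string input) =>
        Regex.Replace(input, "[^0-9.]", "");

    // Edited expression: the hyphen right after ^ is a literal '-', so it is kept too.
    public static string StripSigned(string input) =>
        Regex.Replace(input, "[^-0-9.]", "");

    public static void Main()
    {
        foreach (var s in new[] { "A16.1", "A-16.1" })
        {
            Console.WriteLine($"{s}: {StripUnsigned(s)} vs {StripSigned(s)}");
        }
        // A16.1: 16.1 vs 16.1
        // A-16.1: 16.1 vs -16.1
    }
}
```

Keep in mind that the edited class keeps every hyphen, wherever it appears: "A-16-1" becomes "-16-1", which will not parse as a number.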


There are 2 best solutions below


I suggest the pattern

 -?[0-9]+(\.[0-9]+)?

With Regex.Replace, this removes the matched numbers from the string and keeps everything else:

 string result = Regex.Replace(myStr, @"-?[0-9]+(\.[0-9]+)?", "");

Explanation:

 -?           optional minus sign "-"
 [0-9]+       at least one digit
 (\.[0-9]+)?  optional fractional part
              (decimal separator followed by at least one digit)

In case you want to obtain (not remove) numbers, use Matches:

 using System;
 using System.Linq;
 using System.Text.RegularExpressions;

 string myStr = "-1,2.3.de2.43.";

 string[] numbers = Regex
   .Matches(myStr, @"-?[0-9]+(\.[0-9]+)?")
   .OfType<Match>()
   .Select(match => match.Value)
   .ToArray(); 

 // Test
 Console.Write(string.Join(Environment.NewLine, numbers));

The output is:

 -1
 2.3
 2.43
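Since the goal in the question is to parse a decimal afterwards, the same pattern can feed the parser directly. The following is a minimal sketch, assuming only the first number in the string is wanted and that the decimal separator is always a dot; FirstNumberParser and ParseFirst are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FirstNumberParser
{
    private static readonly Regex NumberPattern =
        new Regex(@"-?[0-9]+(\.[0-9]+)?");

    // Returns the first signed decimal found in the input, or null if there is none.
    public static decimal? ParseFirst(string input)
    {
        Match match = NumberPattern.Match(input);
        if (!match.Success)
        {
            return null;
        }

        // InvariantCulture makes "." the decimal separator regardless of OS settings.
        return decimal.Parse(match.Value, NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(ParseFirst("A-16.1") == -16.1m);  // True
        Console.WriteLine(ParseFirst("A16.1") == 16.1m);    // True
        Console.WriteLine(ParseFirst("none") == null);      // True
    }
}
```

Matching the number instead of deleting everything around it also avoids stray characters: for "A-16-1" the match is "-16", whereas stripping with a character class would leave "-16-1".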

The expression [^-0-9.] contains two hyphens, and they play different roles. Inside square brackets a hyphen normally denotes a range: the one in 0-9 means anything between a literal 0 and a literal 9. The exception is a hyphen at the very beginning of the class (directly after the ^ of a negated class) or at the very end.

When the hyphen is first or last, it has nothing to go "from" (or "to"), so it cannot form a range and is parsed as a literal - character.

I have found that being slightly more verbose and escaping the hyphen lets you place it anywhere within the character class without worrying that it will accidentally be parsed as a range indicator: [^\-0-9.] or [^0-9\-.] or [^0-9.\-]
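As a rough illustration of the point about placement, the sketch below runs the three escaped variants on the sample input and then shows what an unescaped hyphen between two characters does. The helper names (HyphenPlacementDemo, Strip) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenPlacementDemo
{
    // Removes every character matched by the given character-class pattern.
    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, "");

    public static void Main()
    {
        // Escaped hyphen: its position inside the class does not matter.
        foreach (var pattern in new[] { @"[^\-0-9.]", @"[^0-9\-.]", @"[^0-9.\-]" })
        {
            Console.WriteLine(Strip("A-16.1", pattern));  // -16.1 each time
        }

        // Unescaped hyphen between two characters forms a range:
        // [.-0] covers '.', '/' and '0' (code points 0x2E to 0x30),
        // and does not include '-' itself (0x2D).
        Console.WriteLine(Regex.IsMatch("/", "[.-0]"));  // True
        Console.WriteLine(Regex.IsMatch("-", "[.-0]"));  // False
    }
}
```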

What you have above works correctly because the hyphen sits at the beginning of the class, right after the ^, where it does not need to be escaped. It may still be easier to read (and to extend in the future) if you use an escaped version, so that you (and other readers) can see that the hyphen is meant literally.