Regex - match '%20' in encoded tel URL


I have the following regex:

 (?:[\s:]|\d+(?:-|.)|^)(?(\d{3}))?[- .]?(\d{3})[- .]?(\d{4})(?=<|\s|$)

Some escape backslashes seem to have been lost when I copied the regex, so here is the correct version: (?:[\s:]|\d+(?:-|\.)|^)\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})(?=<|\s|$)

It is used in the Tel Linker extension for Chrome to format the telephone number links on a web page correctly.

However, it fails to format a tel link when the spaces in the URL are encoded as '%20', e.g.:

https://someresource.com/tel:(904)%20640-8301
https://someresource.com/tel:(904)%20640%208301
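
To make the failure concrete, here is a minimal sketch that runs the corrected regex from above against one URL with a plain space and one with '%20'. The variable names are only illustrative, and this is not how Tel Linker itself applies the pattern.

```javascript
// The corrected regex from the question, written as a JavaScript regex literal.
const originalPattern = /(?:[\s:]|\d+(?:-|\.)|^)\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})(?=<|\s|$)/;

// A literal space between the groups is covered by the [- \.] class.
const plainSpace = originalPattern.test("https://someresource.com/tel:(904) 640-8301");

// "%20" is three characters ('%', '2', '0'), none of which [- \.] accepts,
// so the pattern cannot get from the area code to the next digit group.
const encodedSpace = originalPattern.test("https://someresource.com/tel:(904)%20640-8301");

console.log(plainSpace, encodedSpace);
```

The first test reports a match and the second does not, which is the behaviour described above.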

Unfortunately, I have no experience working with regex, so any help or suggestions are highly appreciated.


There are 2 best solutions below

2
On BEST ANSWER

As @mplungjan pointed out, your regex is invalid.

Try this:

(?:[\s:]|\d+(?:-|\.)|^)\(?(\d{3})\)?(?:%20)*(?:[- .]+(?:%20)*)?(\d{3})(?:(?:%20)*[- .]*(?:%20)*)?(\d{4})(?=<|\s|$)

If you use this as a string, you'll need to escape every backslash by doubling it:

(?:[\\s:]|\\d+(?:-|\\.)|^)\\(?(\\d{3})\\)?(?:%20)*(?:[- .]+(?:%20)*)?(\\d{3})(?:(?:%20)*[- .]*(?:%20)*)?(\\d{4})(?=<|\\s|$)
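
As a rough illustration of the escaping rule, the sketch below builds the pattern once as a regex literal and once from a string passed to the RegExp constructor, then compares the two. The variable names are made up for the example, and the dot after \d+ is written escaped (\.), as in the corrected regex from the question.

```javascript
// Regex literal: each backslash is written once.
const telLiteral = /(?:[\s:]|\d+(?:-|\.)|^)\(?(\d{3})\)?(?:%20)*(?:[- .]+(?:%20)*)?(\d{3})(?:(?:%20)*[- .]*(?:%20)*)?(\d{4})(?=<|\s|$)/;

// String form: each backslash is doubled, because the string parser consumes
// one level of escaping before the RegExp constructor sees the pattern.
const telFromString = new RegExp(
  "(?:[\\s:]|\\d+(?:-|\\.)|^)\\(?(\\d{3})\\)?(?:%20)*(?:[- .]+(?:%20)*)?(\\d{3})(?:(?:%20)*[- .]*(?:%20)*)?(\\d{4})(?=<|\\s|$)"
);

// Both spellings should describe the same pattern text.
const samePattern = telLiteral.source === telFromString.source;
console.log(samePattern);
```

If the backslashes in the string were not doubled, "\s" and "\d" would reach the constructor as plain "s" and "d", and the pattern would silently mean something else.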

Tested against:

https://someresource.com/tel:(904)640-8301
https://someresource.com/tel:(904)%20640%208301
https://someresource.com/tel:(904)%20640-8301
https://someresource.com/tel:(904)%20640 - 8301
https://someresource.com/tel:(904)%20640%20-%208301
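
A short sketch of how that check could be reproduced in JavaScript follows. The names telPattern, samples and results are only illustrative, and the dot after \d+ is written escaped (\.), as in the corrected regex from the question.

```javascript
// The pattern from this answer as a regex literal.
const telPattern = /(?:[\s:]|\d+(?:-|\.)|^)\(?(\d{3})\)?(?:%20)*(?:[- .]+(?:%20)*)?(\d{3})(?:(?:%20)*[- .]*(?:%20)*)?(\d{4})(?=<|\s|$)/;

// The five URLs listed above.
const samples = [
  "https://someresource.com/tel:(904)640-8301",
  "https://someresource.com/tel:(904)%20640%208301",
  "https://someresource.com/tel:(904)%20640-8301",
  "https://someresource.com/tel:(904)%20640 - 8301",
  "https://someresource.com/tel:(904)%20640%20-%208301",
];

// For each URL, join the three captured digit groups, or record null when there is no match.
const results = samples.map((url) => {
  const match = telPattern.exec(url);
  return match ? match.slice(1, 4).join("-") : null;
});

console.log(results);
```

Every entry in results should come out as 904-640-8301, since the three capture groups hold the digits regardless of which separators sit between them.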

Thanks @Casimir et Hippolyte for the suggestions

1
On

The regex you provided is invalid according to regex101:

(?:[\s:]|\d+(?:-|.)|^)(?(\d{3}))?[- .]?(\d{3})[- .]?(\d{4})(?=<|\s|$)

Without the ? at the start of the capture group it is valid, and it seems to match the right things:

(?:[\s:]|\d+(?:-|.)|^)((\d{3}))?[- .]?(\d{3})[- .]?(\d{4})(?=<|\s|$)

The [- .] class is what matches the separators in the number, so replacing it with an alternation that also accepts %20, namely (?:[- .]|%20), would solve that:

(?:[\s:]|\d+(?:-|.)|^)((\d{3}))?(?:[- .]|%20)?(\d{3})(?:[- .]|%20)?(\d{4})(?=<|\s|$)
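
As a hedged sketch of this substitution, the snippet below applies (?:[- .]|%20) to the escaped regex from the question rather than to the pattern exactly as posted here; that choice, and the variable names, are assumptions made for the example.

```javascript
// The question's corrected regex with each [- \.] replaced by (?:[- .]|%20).
const singleSeparator = /(?:[\s:]|\d+(?:-|\.)|^)\(?(\d{3})\)?(?:[- .]|%20)?(\d{3})(?:[- .]|%20)?(\d{4})(?=<|\s|$)/;

// One separator per gap, whether it is '-' or '%20', is accepted.
const encodedOnce = singleSeparator.test("https://someresource.com/tel:(904)%20640%208301");

// Several separators in a row ("%20-%20") are not, because each gap allows at most one.
const encodedMixed = singleSeparator.test("https://someresource.com/tel:(904)%20640%20-%208301");

console.log(encodedOnce, encodedMixed);
```

Under these assumptions the first URL matches and the second does not, so this variant covers both URLs from the question but not combinations of several separators in a row.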