Regex to match specific URL fragment and not all other URL possibilities


Say I have a website, example.com, with an account page. The URL may carry GET parameters, which still count as part of the account page. It may also carry a URL fragment. If the fragment is home.html, it is still the account page; any other fragment identifies a different sub-page of the account page.

So I need a JavaScript regex that matches this case. This is what I have managed to build so far:

example.com\/account\/(|.*\#home\.html|(\?(?!.*#.*)))$

https://regex101.com/r/ihjCIg/1

The first four rows are the cases I need to match. As you can see, my regex does not match the second row.

What am I missing here?


There are 3 best solutions below

example\.com\/account\/((\??[^#\r\n]+)?(#?home\.html)?)?$

This matches your first four strings

example.com/account/
example.com/account/?brand=mine
example.com/account/#home.html
example.com/account/?brand=mine#home.html

and excludes your last two

example.com/account/#other.html
example.com/account/?brand=mine#other.html
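
As a quick check of the claim above, the pattern can be run against the six sample strings in JavaScript. This is only a sketch, and the names firstPattern, samples and firstResults are made up for illustration. Note that the pattern has no ^ anchor and its query part does not require a literal ?, so it also accepts strings such as example.com/account/settings, which may or may not be what you want.

```javascript
// The pattern from this answer, copied verbatim.
const firstPattern = /example\.com\/account\/((\??[^#\r\n]+)?(#?home\.html)?)?$/;

// The six sample strings listed above (variable names are illustrative).
const samples = [
  "example.com/account/",
  "example.com/account/?brand=mine",
  "example.com/account/#home.html",
  "example.com/account/?brand=mine#home.html",
  "example.com/account/#other.html",
  "example.com/account/?brand=mine#other.html"
];

// No "g" flag is used, so test() keeps no state between calls.
const firstResults = samples.map(url => firstPattern.test(url));
samples.forEach((url, i) => console.log(url + " --> " + firstResults[i]));

// Because "?" is optional, any suffix without "#" is accepted as well.
const acceptsPlainPath = firstPattern.test("example.com/account/settings");
console.log("example.com/account/settings --> " + acceptsPlainPath);
```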

You could use two optional groups: one that matches ? followed by any characters except #, and another that matches #home.html.

Remember to escape the dot so that it is matched literally.

^example\.com\/account\/(?:\?[^#\r\n]*)?(?:#home\.html)?$
  • ^ Start of string
  • example\.com\/account\/ Match the literal text example.com/account/
  • (?: Non capturing group
    • \?[^#\r\n]* Match ? and 0+ times any char except # or a newline
  • )? Close group and make it optional
  • (?: Non capturing group
    • #home\.html Match #home.html
  • )? Close group and make it optional
  • $ End of string


// The first four URLs should log true, the last two false.
const pattern = /^example\.com\/account\/(?:\?[^#\r\n]*)?(?:#home\.html)?$/;
[
  "example.com/account/",
  "example.com/account/?brand=mine",
  "example.com/account/#home.html",
  "example.com/account/?brand=mine#home.html",
  "example.com/account/#other.html",
  "example.com/account/?brand=mine#other.html"
].forEach(url => console.log(url + " --> " + pattern.test(url)));
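
One thing to keep in mind with the ^ anchor: the pattern only matches strings that begin with example.com, so an input such as https://example.com/account/ would be rejected. If your inputs may carry a scheme, one possible sketch is to allow an optional http:// or https:// prefix. This is an assumption on my part, since the question only shows scheme-less strings, and the names accountPage and isAccountPage are hypothetical.

```javascript
// Hypothetical variant: the same pattern, plus an optional scheme prefix.
// Assumption: inputs may start with "http://" or "https://".
const accountPage = /^(?:https?:\/\/)?example\.com\/account\/(?:\?[^#\r\n]*)?(?:#home\.html)?$/;

// Illustrative helper; returns true when the URL counts as the account page.
function isAccountPage(url) {
  return accountPage.test(url);
}

[
  "https://example.com/account/?brand=mine",
  "example.com/account/#home.html",
  "https://example.com/account/#other.html"
].forEach(url => console.log(url + " --> " + isAccountPage(url)));
```

The ^ anchor is still there, so a host such as notexample.com is not accepted by accident.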


The third alternative in your group has a negative lookahead that rejects any text containing a #, but nothing after it consumes the rest of the content up to the end of the line, so $ cannot match. Check this updated regex demo:

https://regex101.com/r/ihjCIg/3

Note that I have escaped the first dot, just before com, and added .* after the negative lookahead so that it matches your second sample.
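
Since the linked demo is not reproduced here, the following is a reconstruction of the fix from the description above (first dot escaped, .* added after the negative lookahead). It may differ in detail from the regex in the demo, and the variable names are illustrative.

```javascript
// Pattern from the question: after "?", the lookahead consumes nothing,
// so "$" is tested right away and fails when a query string follows.
const original = /example.com\/account\/(|.*\#home\.html|(\?(?!.*#.*)))$/;

// Reconstructed fix (an assumption, see above): escape the dot and add ".*"
// after the lookahead so the rest of the query string is consumed.
const fixed = /example\.com\/account\/(|.*\#home\.html|(\?(?!.*#.*).*))$/;

const urls = [
  "example.com/account/",
  "example.com/account/?brand=mine",
  "example.com/account/#home.html",
  "example.com/account/?brand=mine#home.html",
  "example.com/account/#other.html",
  "example.com/account/?brand=mine#other.html"
];

const originalResults = urls.map(url => original.test(url));
const fixedResults = urls.map(url => fixed.test(url));

urls.forEach((url, i) =>
  console.log(url + " --> original: " + originalResults[i] + ", fixed: " + fixedResults[i])
);
```

Only the second string should change from false to true between the two patterns.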