Same regular expression to get 1 or 2 parts of filename


Can anyone help me, please? This is for mod_rewrite in .htaccess.

From this URL filename:

car-audi-tt-2

I need to get the full string.


But if the string I get is this:

car-audi-tt-2-2019

I need to get (with the same regular expression) "car-audi-tt-2" in $1 and "2019" in $2.


Any idea?

Currently I have this pattern, which only works for the second case on https://www.regextester.com:

([-A-Za-z0-9\+\.]+)-(20[-0-9]{2})*

0
MrWhite On
([-A-Za-z0-9\+\.]+)-(20[-0-9]{2})*

You also need to make the - (hyphen) delimiter before the trailing "year" optional, but only together with the "year" itself, and the hyphen should not be included in the capturing subpattern.

You should also anchor the regex (although this may depend on your RewriteRule directive), because the + quantifier on the first subpattern now needs to be non-greedy so that it does not consume the now-optional "-year".

[-0-9] - As I queried in comments, the first hyphen in the 2nd character class looks out of place and does not match your example. I've removed it in my solution below and in doing so simplified the regex.

There is no need to backslash-escape the literal + and . characters inside the character class.

Try the following instead:

 ^([-A-Za-z0-9+.]+?)(?:-(20\d\d))?$

The ?: prefix on the parenthesized subpattern makes it non-capturing. This is so that the nested capturing subpattern becomes the 2nd capturing group (ie. $2).

The ? quantifier makes the "-year" optional, whereas * allows for an unlimited number of repetitions (including none), which could potentially result in part of the filename being "lost" (since the backreference only contains the last match).
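As a quick check outside Apache, here is a minimal sketch that runs the pattern above through Python's re module. Python's regex engine is not the PCRE library that mod_rewrite uses, so this assumes the two behave the same for these constructs (lazy quantifier, non-capturing group, anchors). The name split_filename is made up for illustration.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(r"^([-A-Za-z0-9+.]+?)(?:-(20\d\d))?$")

def split_filename(name):
    """Return (base, year); year is None when there is no trailing -20xx."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_filename("car-audi-tt-2"))       # ('car-audi-tt-2', None)
print(split_filename("car-audi-tt-2-2019"))  # ('car-audi-tt-2', '2019')
```

In both cases the first group holds "car-audi-tt-2"; the second group is only set when the "-year" suffix is present.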


Aside:

I need to get (with the same regular expression)

If using mod_rewrite then you don't necessarily need to do this in a single rule/regex.
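A rough sketch of that idea in Python, not mod_rewrite itself: two simpler patterns are tried in order, the way consecutive RewriteRule directives would be (assuming each rule stops processing once it matches, for example with the L flag). The rule labels and the function name first_matching_rule are hypothetical.

```python
import re

# Hypothetical rule list, tried top to bottom like consecutive rules.
RULES = [
    # More specific rule first: a name followed by "-20xx".
    ("with_year", re.compile(r"^([-A-Za-z0-9+.]+)-(20\d\d)$")),
    # Fallback rule: the whole name, no year.
    ("no_year", re.compile(r"^([-A-Za-z0-9+.]+)$")),
]

def first_matching_rule(name):
    """Return (rule label, base, year) for the first rule that matches."""
    for label, pattern in RULES:
        m = pattern.match(name)
        if m:
            groups = m.groups()
            year = groups[1] if len(groups) > 1 else None
            return label, groups[0], year
    return None

print(first_matching_rule("car-audi-tt-2-2019"))  # ('with_year', 'car-audi-tt-2', '2019')
print(first_matching_rule("car-audi-tt-2"))       # ('no_year', 'car-audi-tt-2', None)
```

With two rules, neither pattern needs the lazy quantifier or the optional group, at the cost of keeping the rule order correct.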

0
Artemio Ramirez On

Like this?

^(.*?)(?:-(\d{4}))?$

https://regex101.com/r/y8dB5H/1

0
Luis Colorado On

Use grouping. As in:

([-A-Za-z0-9\+\.]+)(-(20[-0-9]{2}))?

After matching, group 0 is the whole matched string, group 1 is the first part and group 3 is the year (without the separator dash). If group 2 did not match (and therefore neither did group 3), you can assume there is no year present. You have to be careful, as a trailing number will be mistaken for a year if it happens to have four digits and start with 20.

Don't use the closure operator (*), because it allows an unbounded number of repetitions (zero or more), while ? only accepts 0 or 1. If you use it,

car-audi-tt-2-2016201720182019202020212022

will be accepted and your group 3 will hold only the last number matched (in this case 2022). There can be other issues with surrounding text that is undesired but still selected by the regex.
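The effect of the closure operator can be seen with a small Python sketch. It uses the question's original pattern, where the year is group 2 rather than group 3, and assumes Python's re module behaves like Apache's PCRE for these constructs. The helper name capture is made up for illustration.

```python
import re

# The question's original pattern, with * on the year group.
QUESTION_PATTERN = re.compile(r"([-A-Za-z0-9\+\.]+)-(20[-0-9]{2})*")

def capture(name):
    """Return (whole match, group 1, group 2) for the question's pattern."""
    m = QUESTION_PATTERN.match(name)
    return m.group(0), m.group(1), m.group(2)

# Seven "years" in a row are all consumed; only the last one is kept.
print(capture("car-audi-tt-2-2016201720182019202020212022"))
# ('car-audi-tt-2-2016201720182019202020212022', 'car-audi-tt-2', '2022')

# Without a year, the mandatory hyphen splits off the trailing "2".
print(capture("car-audi-tt-2"))
# ('car-audi-tt-', 'car-audi-tt', None)
```

The second call also shows why the original pattern only worked for the case with a year: the hyphen before the year group is not optional, so the last segment of the name is dropped from the match.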

0
mdromed On

Now I have this to capture "car-audi-tt-2-2019" in $1 and $2

((?:[a-zA-Z\d]*)(?:-?[a-zA-Z\d]*)*)(?:-([0-9]{4})+)

and this to capture "car-audi-tt-2" in $1

((?:[a-zA-Z\d]*)(?:-?[a-zA-Z\d]*)*)(?:-([0-9]{4})+)?

This is the best I have achieved so far.

0
The fourth bird On

You could use anchors around the regex pattern with a case insensitive modifier:

(?i)^([a-z0-9]+(?:-[a-z0-9]+)*?)(?:-([0-9]{4})+)?$

The pattern matches:

  • (?i) Case insensitive modifier
  • ^ Start of string
  • ( Capture group 1
    • [a-z0-9]+ Match 1+ chars a-z or digits 0-9
    • (?:-[a-z0-9]+)*? Optionally repeat, as few times as possible, matching - and 1+ chars a-z or digits 0-9
  • ) Close group 1
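  • (?:-([0-9]{4})+)? Optionally match - and capture 4 digits in group 2; because of the + on the group, several 4-digit blocks in a row also match and group 2 keeps the last one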
  • $ End of string

See a regex demo.
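The breakdown above can also be checked with a short Python sketch. It assumes Python's re module treats this pattern the same way as the PCRE engine behind mod_rewrite; the helper name parts is hypothetical.

```python
import re

# The first pattern from this answer, unchanged, inline (?i) flag included.
PATTERN = re.compile(r"(?i)^([a-z0-9]+(?:-[a-z0-9]+)*?)(?:-([0-9]{4})+)?$")

def parts(name):
    """Return (group 1, group 2), or None if the name does not match."""
    m = PATTERN.match(name)
    return (m.group(1), m.group(2)) if m else None

print(parts("car-audi-tt-2"))           # ('car-audi-tt-2', None)
print(parts("Car-Audi-TT-2-2019"))      # ('Car-Audi-TT-2', '2019')
print(parts("car-audi-tt-2-20192020"))  # ('car-audi-tt-2', '2020')
print(parts("car--audi"))               # None
```

The third call shows the + after the capture group at work: two 4-digit blocks are consumed and only the last one ends up in group 2. The last call shows that this pattern, unlike a single character class, rejects doubled hyphens.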


If the year should only start with 20, you can add a negative lookbehind at the end of the pattern, asserting that the first 2 digits of the year are not any digits other than 20:

(?i)^([a-z0-9]+(?>-[a-z0-9]+)*?)(?:-([0-9]{4})+)?$(?<![0-9]{2}(?<!20)..)

Regex demo