Sorry to add another "Regex explanation" question to the internet, but I must know the reason for this. I have run this regex through RegexBuddy and Regex101.com with no help.
I came across the following regex ("%4d%[^\\n]") while debugging a time parsing function. Every now and then I would receive an 'invalid date' error, but only during the months of January and June. I mocked up some code to recreate exactly what was happening, but I can't figure out why removing the one slash fixes it.
<?php
$format = '%Y/%b/%d';
$random_date_strings = array(
    '2015/Jan/03',
    '1985/Feb/13',
    '2001/Mar/25',
    '1948/Apr/02',
    '1948/May/19',
    '2020/Jun/22',
    '1867/Jul/09',
    '1901/Aug/11',
    '1945/Sep/21',
    '2000/Oct/31',
    '2009/Nov/24',
    '2015/Dec/02'
);
$year = null;
$rest_of_string = null;

echo 'Bad Regex:';
echo '<br/><br/>';
foreach ($random_date_strings as $date_string) {
    sscanf($date_string, "%4d%[^\\n]", $year, $rest_of_string);
    print_data($date_string, $year, $rest_of_string);
}

echo 'Good Regex:';
echo '<br/><br/>';
foreach ($random_date_strings as $date_string) {
    sscanf($date_string, "%4d%[^\n]", $year, $rest_of_string);
    print_data($date_string, $year, $rest_of_string);
}

function print_data($d, $y, $r) {
    echo 'Date string: ' . $d;
    echo '<br/>';
    echo 'Year: ' . $y;
    echo '<br/>';
    echo 'Rest of string: ' . $r;
    echo '<br/>';
}
?>
Feel free to run this locally, but the only two outputs I'm concerned about are the months of June and January. "%4d%[^\\n]" will truncate $rest_of_string to /Ju and /Ja, while "%4d%[^\n]" displays the rest of the string as expected (/Jan/03 and /Jun/22).
Here's my interpretation of the faulty regex:
%4d% - Get four digits.
[^\\n] - Look for those digits in between the beginning of the string and a new line.
Can anyone please correct my explanation and/or tell me why removing the slash gives me the result I expect?
I don't care for the HOW...I need the WHY.
Like @LucasTrzesniewski pointed out, that's sscanf() syntax; it has nothing to do with regex. The format is explained on the sprintf() page.
In your pattern "%4d%[^\\n]", the two backslashes (\\) translate to a single backslash character, because PHP resolves escape sequences in a double-quoted string before sscanf() ever sees the format. So the correct interpretation of the "faulty" pattern is:
%4d - Read an integer of up to four digits.
%[^\\n] - Match all characters that are not a backslash or the letter "n".
That's why it matches everything up until the "n" in "Jan" and "Jun".
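To make this concrete, here is a small sketch that inspects what the faulty format string actually contains and what sscanf() returns for one of the January dates. The variable names ($faulty, $result) are mine, not from the original code, and the two-argument form of sscanf() is used so the parsed values come back as an array.

```php
<?php
// The "faulty" format as written in the question.
$faulty = "%4d%[^\\n]";

// PHP collapses "\\" into one backslash, so sscanf() receives
// nine characters: % 4 d % [ ^ \ n ]
var_dump(strlen($faulty));   // int(9)

// The scanset [^\n] here excludes two literal characters: "\" and "n".
// With only two arguments, sscanf() returns the parsed values as an array.
$result = sscanf('2015/Jan/03', $faulty);

// $result[0] is the year; $result[1] is everything up to the first "\" or "n".
echo $result[0], ' ', $result[1], PHP_EOL;   // 2015 /Ja
```

Months without an "n" in their abbreviation (Feb, Mar, Apr, and so on) never hit an excluded character, which is why only January and June were affected.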
The correct pattern is "%4d%[^\n]", where the \n translates to a new line character, and its interpretation is:
%4d - Read an integer of up to four digits.
%[^\n] - Match all characters that are not a new line.
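As a quick check of the corrected pattern, the following sketch runs it against the two dates that were failing. It also compares a single-quoted version of the format, on the assumption that someone might reach for single quotes instead; the names $fixed, $single, and $parsed are illustrative only.

```php
<?php
// Double quotes: "\n" becomes one real newline character (ASCII 10).
$fixed = "%4d%[^\n]";
var_dump(strlen($fixed));    // int(8)
var_dump(ord($fixed[6]));    // int(10)

// Single quotes do not interpret "\n", so this is the same string
// as the faulty double-quoted "%4d%[^\\n]".
$single = '%4d%[^\n]';
var_dump($single === "%4d%[^\\n]");   // bool(true)

// Parse the two dates that were previously truncated.
$parsed = [];
foreach (['2015/Jan/03', '2020/Jun/22'] as $date_string) {
    sscanf($date_string, $fixed, $year, $rest_of_string);
    $parsed[] = $year . ' ' . $rest_of_string;
}
echo implode(PHP_EOL, $parsed), PHP_EOL;
// 2015 /Jan/03
// 2020 /Jun/22
```

Since none of the date strings contain a newline, the scanset consumes the entire remainder of each string.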