The Python interpreter gives the following when generating an ISO-8601 formatted date/time string:
>>> import datetime
>>> datetime.datetime.now().isoformat(timespec='seconds')
'2023-10-12T22:35:02'
Note that the '-' character in the string is a hyphen-minus character (U+002D). To go the other way and produce the datetime object, we do the following:
>>> datetime.datetime.strptime('2023-10-12T22:35:02', '%Y-%m-%dT%H:%M:%S')
datetime.datetime(2023, 10, 12, 22, 35, 2)
This all checks out.
However, sometimes when the ISO-8601 formatted date/time string is provided from an external source, such as a parameter sent over in a GET/POST request, or in a .csv file, the hyphens are sent as the ‐ (U+2010) character, which causes the parsing to break:
>>> datetime.datetime.strptime('2023‐10‐12T22:35:02', '%Y-%m-%dT%H:%M:%S')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2023‐10‐12T22:35:02' does not match format '%Y-%m-%dT%H:%M:%S'
What is the correct standard? Is it hyphen-minus - (U+002D), as produced by Python's .isoformat(), or hyphen ‐ (U+2010)?
Would it be best practice to accept both?
I would recommend ASCII 0x2D (hyphen-minus) because ASCII is so widely used and is less likely to break. If you care about compatibility, use .replace("\u2010", "-") to normalize incoming strings to ASCII, or .replace("-", "\u2010") to convert to the U+2010 hyphen for ISO 8601. If you don't care, just leave it to your users to send the right character (I recommend ASCII).
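As a rough sketch of the "accept both" approach, the input can be normalized before it reaches strptime. The helper name parse_iso_seconds and the translation table below are made up for illustration, and the sketch only handles U+2010; other dash-like characters would need their own entries if your sources send them.

```python
from datetime import datetime

# Hypothetical table: map U+2010 (HYPHEN) to U+002D (HYPHEN-MINUS).
_HYPHEN_MAP = str.maketrans({"\u2010": "-"})


def parse_iso_seconds(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS', accepting either hyphen character."""
    normalized = text.translate(_HYPHEN_MAP)
    return datetime.strptime(normalized, "%Y-%m-%dT%H:%M:%S")


# Both spellings parse to the same datetime object.
print(parse_iso_seconds("2023-10-12T22:35:02"))
print(parse_iso_seconds("2023\u201010\u201012T22:35:02"))
```

Normalizing on the way in and always emitting hyphen-minus on the way out keeps the rest of the code dealing with a single form.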