I have the following URL string:
url = "https://foo.bar.com/path/to/aaa.bbb/ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=?&339286293"
When using Python:
from urllib.parse import urlparse
url_obj = urlparse(url)
url_obj.path  # '/path/to/aaa.bbb/ccc.ddd'
When using Ruby:
require "uri"
url_obj = URI.parse(url)
url_obj.path  # "/path/to/aaa.bbb/ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent="
I guess Python does not consider `;` to be part of the URL path. Which one is "correct"?
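Python's `urllib` is wrong. RFC 3986 *Uniform Resource Identifier (URI): Generic Syntax*, Section 3.3 Path, explicitly gives this exact syntax as an example of a valid path:

> For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment.

The correct interpretation of the example URI you posted is the following: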
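- Scheme: `https`
- Authority: `foo.bar.com`
- Host: `foo.bar.com`
- Port: `443` (not written in the URI; it is the default port for `https`)
- Path: `/path/to/aaa.bbb/ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=`, consisting of the following four path segments:
  - `path`
  - `to`
  - `aaa.bbb`
  - `ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=`
- Query: `&339286293`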
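If you need the RFC 3986 breakdown from Python itself, a quick sketch is below. As far as I know, `urlparse()` follows the older model in which only the last path segment can carry `;params`, and it moves them into a separate `params` attribute; `urlsplit()` does not split them off, so its `path` matches what Ruby returns. The `split_segment_params` helper is a hypothetical name for illustration only, and treating `;` as a parameter delimiter is the convention the RFC describes, not a rule of the generic syntax.

```python
from urllib.parse import urlparse, urlsplit

url = "https://foo.bar.com/path/to/aaa.bbb/ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=?&339286293"

# urlparse() splits ";params" off the *last* path segment into a separate field.
parsed = urlparse(url)
print(parsed.path)    # /path/to/aaa.bbb/ccc.ddd
print(parsed.params)  # dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=

# urlsplit() leaves the path intact, matching the RFC 3986 breakdown above.
parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.hostname)  # foo.bar.com
print(parts.port)      # None (no explicit port; 443 is only implied by https)
print(parts.path)      # /path/to/aaa.bbb/ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=
print(parts.query)     # &339286293

# The path is absolute, so drop the empty string before the first "/".
segments = parts.path.split("/")[1:]
print(segments)
# ['path', 'to', 'aaa.bbb', 'ccc.ddd;dc_trk_aid=486652617;tfua=;gdpr=;gdpr_consent=']


def split_segment_params(segment):
    """Hypothetical helper: split one path segment on ';' into (name, [params]).

    Using ';' this way is a convention mentioned in RFC 3986 section 3.3,
    not something the generic URI syntax itself defines.
    """
    name, *params = segment.split(";")
    return name, params


print([split_segment_params(s) for s in segments])
# [('path', []), ('to', []), ('aaa.bbb', []), ('ccc.ddd', ['dc_trk_aid=486652617', 'tfua=', 'gdpr=', 'gdpr_consent='])]
```

In short, if you want Python to agree with Ruby and RFC 3986, `urlsplit()` is the call to reach for; `urlparse()` is only useful if you specifically want the last segment's parameters separated out.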