I have the following Objective-C code:
[@"http://www.google.com" stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLPathAllowedCharacterSet]];
// http%3A//www.google.com
And yet, in Swift:
"http://www.google.com".addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
// http://www.google.com
To what can I attribute this discrepancy?
...and for extra credit, can I rely on this code to percent encode URL path reserved characters when passing a full URL like this?
The issue actually rests in the difference between the NSString method stringByAddingPercentEncodingWithAllowedCharacters and the String method addingPercentEncoding(withAllowedCharacters:). And this behavior has been changing from version to version. (It looks like the latest beta of iOS 11 now restores the behavior we used to see.)

I believe the root of the issue rests in the particulars of how paths are percent encoded. Section 3.3 of RFC 3986 says that colons are permitted in paths, except in the first segment of a relative-path reference.
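To make the RFC 3986 rule concrete, here is a minimal Swift sketch of a hypothetical encodePath(_:) function that applies it explicitly. It percent encodes each path segment byte by byte, keeping the RFC 3986 "pchar" characters literal, but leaves a ":" literal only outside the first segment of a relative path. This is an illustration of the rule, not Apple's implementation, and it deliberately avoids Foundation so its output does not depend on the OS version.

```swift
// Hypothetical sketch of the RFC 3986 section 3.3 colon rule; not Apple's code.

// Bytes allowed literally in a path segment (RFC 3986 "pchar" without ":"):
// unreserved characters, sub-delims, and "@". Everything else, including "%", is escaped.
let segmentAllowed = Set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$&'()*+,;=@".utf8
)
let colon = UInt8(ascii: ":")
let hexDigits = Array("0123456789ABCDEF")

// Percent encode one segment's UTF-8 bytes; ":" stays literal only if allowColon is true.
func percentEncode(_ segment: Substring, allowColon: Bool) -> String {
    var out = ""
    for byte in segment.utf8 {
        if segmentAllowed.contains(byte) || (allowColon && byte == colon) {
            out.append(Character(Unicode.Scalar(byte)))
        } else {
            out.append("%")
            out.append(hexDigits[Int(byte >> 4)])
            out.append(hexDigits[Int(byte & 0x0F)])
        }
    }
    return out
}

// Encode a path segment by segment. In a relative path (no leading "/"),
// the first segment must not contain a literal ":", so it is escaped there.
func encodePath(_ path: String) -> String {
    let isRelative = !path.hasPrefix("/")
    let segments = path.split(separator: "/", omittingEmptySubsequences: false)
    return segments.enumerated().map { pair in
        percentEncode(pair.element, allowColon: !(isRelative && pair.offset == 0))
    }.joined(separator: "/")
}

print(encodePath("foo:/bar:"))              // foo%3A/bar:
print(encodePath("/foo:/bar:"))             // /foo:/bar:
print(encodePath("http://www.google.com"))  // http%3A//www.google.com
```

Note that treating the question's http://www.google.com as a relative path reproduces exactly the http%3A//www.google.com result the Objective-C code produced: http: is its first segment, so that colon is the one that gets escaped.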
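The NSString method captures this notion. For example, imagine a path whose first directory was foo: (with a colon) and a subdirectory of bar: (also with a colon), i.e. foo:/bar:. Percent encoding that with the URL path allowed character set via the NSString method results in foo%3A/bar:.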
The : in the first segment of the path is percent encoded, but the : in subsequent segments is not. This captures the logic of how to handle colons in relative paths per RFC 3986.

The String method addingPercentEncoding(withAllowedCharacters:), however, does not do this: given the same foo:/bar: input, it leaves both colons untouched and yields foo:/bar:.
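To see the discrepancy for yourself, here is a small Swift sketch that sends the same foo:/bar: path through both implementations: the NSString method (reached through an explicit NSString instance, which is what the Objective-C code calls) and the String method. Because this behavior has changed between OS versions, no expected output is shown; run it on the versions you care about and compare the first segment.

```swift
import Foundation

let path = "foo:/bar:"

// NSString's implementation, the one the Objective-C code in the question calls.
let viaNSString = NSString(string: path)
    .addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)

// Swift String's implementation, the one the Swift code in the question calls.
let viaString = path.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)

// Results may differ by OS version; look at whether "foo:" was escaped.
print("NSString:", viaNSString ?? "nil")
print("String:  ", viaString ?? "nil")
```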
Clearly, the String method does not attempt that position-sensitive logic. This implementation is more in keeping with the name of the method: it considers solely which characters are "allowed", with no special logic that tries to guess, based upon where an allowed character appears, whether it's truly allowed or not.

I gather that you are saddled with the code supplied in the question, but we should note that this behavior of percent escaping colons in relative paths, while interesting for explaining what you experienced, is not really relevant to your immediate problem. The code you have been given is simply incorrect. It is attempting to percent encode a URL as if it were just a path. But it's not a path; it's a URL, which is a different thing with its own rules.
The deeper insight in percent encoding URLs is to acknowledge that different components of a URL allow different sets of characters, i.e. they require different percent encoding. That's why NSCharacterSet has so many different URL-related character sets (URLHostAllowedCharacterSet, URLPathAllowedCharacterSet, URLQueryAllowedCharacterSet, and so on).

You really should percent encode the individual components, encoding each with the character set allowed for that type of component. Only once the individual components are percent encoded should they be concatenated to form the whole URL.
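Here is a minimal Swift sketch of that component-by-component approach. The buildURLString function and the queryValueAllowed set are hypothetical names, not Foundation API. The host and path are encoded with Foundation's urlHostAllowed and urlPathAllowed sets. Each query name and value, however, is encoded with a deliberately strict set containing only the RFC 3986 unreserved characters, because urlQueryAllowed is meant for a whole query string and still allows &, = and +, which would corrupt individual name/value pairs. The scheme is assumed to need no encoding.

```swift
import Foundation

// Hypothetical set for a single query name or value: RFC 3986 "unreserved"
// characters only, so "&", "=", "+" and spaces are all percent encoded.
// Spelled out in ASCII because CharacterSet.alphanumerics includes non-ASCII letters.
let queryValueAllowed = CharacterSet(charactersIn:
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

// Hypothetical helper: encode each component with the set appropriate to it,
// then concatenate the already-encoded pieces.
func buildURLString(scheme: String, host: String, path: String,
                    query: [(String, String)]) -> String? {
    guard
        let encodedHost = host.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed),
        let encodedPath = path.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
    else { return nil }

    var pairs: [String] = []
    for (name, value) in query {
        guard
            let encodedName = name.addingPercentEncoding(withAllowedCharacters: queryValueAllowed),
            let encodedValue = value.addingPercentEncoding(withAllowedCharacters: queryValueAllowed)
        else { return nil }
        pairs.append("\(encodedName)=\(encodedValue)")
    }

    var result = "\(scheme)://\(encodedHost)\(encodedPath)"
    if !pairs.isEmpty {
        result += "?" + pairs.joined(separator: "&")
    }
    return result
}

let example = buildURLString(scheme: "https", host: "www.example.com",
                             path: "/search results/page 1",
                             query: [("q", "bar & baz"), ("sum", "1+1")])
print(example ?? "encoding failed")
// https://www.example.com/search%20results/page%201?q=bar%20%26%20baz&sum=1%2B1
```

Only the & that separates the two pairs stays literal; the & and + inside values are escaped.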
Alternatively, NSURLComponents is designed precisely for this purpose, getting you out of the weeds of percent encoding the individual components yourself. For example, if you build a URL whose query has a foo value containing an & and two spaces, followed by a qux item, NSURLComponents properly percent escapes the & and the two spaces within the foo value, but correctly leaves the & between foo and qux alone.

It's worth noting, though, that NSURLComponents has a small, yet fairly fundamental, flaw: if you have query values (NSURLQueryItem) that could contain + characters, most web services need those percent escaped, but NSURLComponents won't do it. If your URL has query components and those query values might include + characters, I'd advise against NSURLComponents and would instead percent encode the individual components of the URL yourself.
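The NSURLComponents approach described above can be sketched in Swift with its value-type counterpart, URLComponents. The host, path, and the hypothetical makeURL(foo:qux:) helper are illustrative assumptions, not taken from the question; the query mirrors the foo/qux example, with an & and two spaces inside the foo value.

```swift
import Foundation

// Hypothetical helper; scheme, host, and path are illustrative values.
func makeURL(foo: String, qux: String) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "www.example.com"
    components.path = "/search"
    // URLComponents percent encodes each query item for you, including an "&"
    // or spaces inside a value. Per the caveat above, a literal "+" in a value
    // is left as-is, so don't rely on this for values that may contain "+".
    components.queryItems = [
        URLQueryItem(name: "foo", value: foo),
        URLQueryItem(name: "qux", value: qux),
    ]
    return components.url
}

if let url = makeURL(foo: "bar & baz", qux: "quux") {
    print(url.absoluteString)
}
```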