Python: URL parsing issue while adding a trailing slash

1.8k Views Asked by hnvasa At 24 December 2014 at 18:09

I was developing a small experiment in Python to normalize a URL. My main goal is to add a slash (/) at the end of the URL if one is not already present. For example, http://www.example.com should be converted to http://www.example.com/

Here is a small snippet for the same:

if url[len(url)-1] != "/":
    url = url + "/"

But this also changes URLs that end in a file name. For example, http://www.example.com/image.png becomes http://www.example.com/image.png/, which is wrong. I only want to add a slash to directories, not to file names. How do I do this?

Thanks in advance!


There are 2 best solutions below

THK On 24 December 2014 at 18:43 BEST ANSWER

You could pattern match on the last substring to check for known domains vs file extensions. It's not too difficult to enumerate at least the basic top level domains like .com, .gov, .org, etc.

If you are familiar with regular expressions, you can match on a pattern like r'\.com$' (the dot is escaped so that it matches a literal period rather than any character).
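A minimal sketch of the regular-expression idea, assuming Python 3 and a short, deliberately incomplete list of top-level domains. The names TLD_PATTERN, ends_with_known_tld and add_slash_if_bare_domain are hypothetical and only serve the illustration:

```python
import re

# Hypothetical, deliberately incomplete list of top-level domains.
# The dot is escaped so it matches a literal "." and "$" anchors the
# match at the end of the string.
TLD_PATTERN = re.compile(r'\.(com|org|gov)$')


def ends_with_known_tld(url):
    """Return True if the URL ends directly in one of the listed TLDs."""
    return TLD_PATTERN.search(url) is not None


def add_slash_if_bare_domain(url):
    """Append "/" only when the URL ends in a known TLD."""
    return url + '/' if ends_with_known_tld(url) else url


for candidate in ['http://www.example.com', 'http://www.example.com/image.png']:
    print(add_slash_if_bare_domain(candidate))
```

Note that this only recognizes bare domains. A directory URL such as http://www.example.com/images is left untouched, so it covers just part of the original requirement.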

Otherwise, you can split by '.' and check the last substring you get:

In [32]: url_png = 'http://www.example.com/image.png'

In [33]: url_com = 'http://www.example.com'

In [34]: domains = ['com', 'org', 'gov']

In [35]: for url in [url_png, url_com]:
   ....:     suffix = url.split('.')[-1]
   ....:     if suffix in domains:
   ....:         print(url)
   ....:
http://www.example.com

As a side note, and as you can see in the example above, you don't need url[len(url)-1] to get the last element of a sequence (a list or, as in your case, a string); the Pythonic way is just url[-1].
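A different way to apply the same suffix check, sketched here as an assumption rather than as part of the answer above, is to look only at the path of the URL instead of enumerating domains. This uses Python 3's urllib.parse and posixpath from the standard library; the function name add_trailing_slash is hypothetical, and "has an extension" is only a heuristic for "is a file":

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url):
    """Append "/" to the path unless its last segment looks like a file name."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith('/'):
        return url  # already has a trailing slash
    # Heuristic: a last path segment with an extension is treated as a file.
    _, extension = posixpath.splitext(posixpath.basename(path))
    if extension:
        return url
    # Rebuild the URL so the slash goes before any query string or fragment.
    return urlunsplit(parts._replace(path=path + '/'))


print(add_trailing_slash('http://www.example.com'))
print(add_trailing_slash('http://www.example.com/image.png'))
print(add_trailing_slash('http://www.example.com/docs?page=2'))
```

Because the hostname is parsed separately from the path, dots in the domain never count as a file extension. A directory whose name contains a dot (for example /v1.2) would still be mistaken for a file.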

hyades On 24 December 2014 at 18:43

You need to make sure that, for a directory URL, any . appears only in the hostname. If a . appears anywhere else, treat it as part of a file name. So just call url.count('.') and check whether the result is greater than the number of dots in your hostname (e.g., here it is 2):

if url.count('.') > 2:
    url = url if url[-1] != '/' else url[:-1]
else:
    url = url if url[-1] == '/' else url + '/'
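The snippet above hardcodes 2 as the number of dots in the hostname. A possible generalization, assuming Python 3 and using urllib.parse.urlsplit to read the hostname from the URL itself, could look like this (the function name normalize_slash is hypothetical):

```python
from urllib.parse import urlsplit


def normalize_slash(url):
    """Add or strip a trailing slash based on where the dots in the URL are."""
    # Count the dots in the host part instead of hardcoding the number.
    host_dots = urlsplit(url).netloc.count('.')
    if url.count('.') > host_dots:
        # A dot outside the hostname: treat as a file, so no trailing slash.
        return url[:-1] if url.endswith('/') else url
    # All dots are in the hostname: treat as a directory.
    return url if url.endswith('/') else url + '/'


print(normalize_slash('http://www.example.com'))
print(normalize_slash('http://www.example.com/image.png'))
print(normalize_slash('http://example.org/docs'))
```

The same caveat as the original applies: a dot in a directory name or in a query string is counted as if it belonged to a file name.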
