Isolating the top-level domain of a URL

I need a string method to isolate the top-level domain of a URL in Python. For example, consider the URL

"https://stackoverflow.com/questions/ask"

I want to extract just the ".com" without anything after or before that.

I tried something naive:

domain = URL.split(".")

There is 1 best solution below

Use the standard library's URL parser (urllib.parse) to extract the host name, then split it on "." and take the last label as the TLD:

import urllib.parse

url = "https://stackoverflow.com/questions/ask"
hostname = urllib.parse.urlparse(url).hostname
*_, tld = hostname.split(".")

>>> print(hostname)
stackoverflow.com
>>> print(tld)
com
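
Since the question asks for ".com" with the leading dot, the same steps could be wrapped in a small helper. This is only a sketch: the function name top_level_domain is made up for illustration, and it assumes that returning None is acceptable when the URL has no scheme (in that case urlparse finds no host name).

```python
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the URL's host name with a leading dot, or None."""
    # hostname is None when the URL has no "//host" part, e.g. a missing scheme.
    hostname = urlparse(url).hostname
    if not hostname:
        return None
    # Drop a trailing dot (as in "example.com.") before taking the last label.
    last_label = hostname.rstrip(".").rpartition(".")[2]
    return "." + last_label


print(top_level_domain("https://stackoverflow.com/questions/ask"))  # .com
print(top_level_domain("stackoverflow.com/questions/ask"))          # None
```

Note that this only takes the final label, so a host under a multi-part suffix such as "co.uk" would yield ".uk".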