Optional protocol in entered link - py regex | requests


Problem: check whether an entered link is valid. The link may be entered either with the protocol (https://stackoverflow.com/) or without it (stackoverflow.com).

I tried to solve it as

input_url = str(input("Enter url: "))
result = re.findall(r'(http[s]?://)?\S+', input_url)

It returns an error: Invalid URL '': No schema supplied. Perhaps you meant http://?

I can't use urllib or anything else; it has to be regex only.

full code:

import re, requests
from collections import Counter
from prettytable import PrettyTable

url_input = str(input("Enter url: "))

url_checked = re.findall(r'(http[s]?://)?\S+', url_input)[0] # take the first element

response = requests.get(str(url_checked)) # request the entered link

result = re.findall( r"\"(?:http[s]?://)?([^:/\s\"]+)/?[^\"]*\"", response.text) # filter the links

result.sort() # sort alphabetically

# link - https://stackoverflow.com/

pt = PrettyTable(field_names = ["word", "counter"])
pt.add_rows(list(Counter(result).most_common()))
print(pt)

There is 1 best solution below

sophros On

Your regular expression seems far too simple to robustly validate a URL. I suggest you use the one from here.
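
As a side note on the error itself: when a pattern contains a single capturing group, re.findall returns the text of that group rather than the whole match, so for stackoverflow.com the first element is an empty string, which matches the '' in the error message. Below is a minimal regex-only sketch of one way around that. The pattern and the helper name normalize_url are made up for illustration and are not the expression the answer links to; the pattern is deliberately simple (it requires a dotted host name, so something like localhost is rejected) and is not a complete URL validator.

```python
import re

# Hypothetical pattern for illustration only: an optional scheme, a dotted
# host name, an optional port and an optional path. Not a full URL validator.
URL_RE = re.compile(
    r'(?P<scheme>https?://)?'                    # optional http:// or https://
    r'(?P<rest>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+'  # host with at least one dot
    r'(?::\d+)?(?:/\S*)?)'                        # optional port and path
)


def normalize_url(text, default_scheme="https://"):
    """Return the URL with a scheme prepended if needed, or None if invalid."""
    match = URL_RE.fullmatch(text.strip())
    if match is None:
        return None
    # Use the scheme the user typed, otherwise fall back to the default.
    scheme = match.group("scheme") or default_scheme
    return scheme + match.group("rest")


# Why the original attempt produces '': with one capturing group,
# findall returns the group's text, not the whole match.
print(re.findall(r'(http[s]?://)?\S+', "stackoverflow.com"))  # ['']

print(normalize_url("stackoverflow.com"))           # https://stackoverflow.com
print(normalize_url("https://stackoverflow.com/"))  # https://stackoverflow.com/
print(normalize_url("not a url"))                   # None
```

The result of normalize_url can then be passed to requests.get once it has been checked for None. Using re.fullmatch with named groups avoids the findall behaviour described above, and defaulting to https:// is an assumption; http:// could be used instead.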