Trying to detect expired short urls, trouble with status_code and response url


I'm checking short URLs that I find in my client's content. Until now, I have used a simple requests.get(url) and then processed the response URL and status code, which gave me enough information.

Now, I've encountered expired short urls in the content. When I open the short url manually in the browser, I get

https://short.ly/?ref=expired&url=https://short.ly/abcdef

Parsing the parameters in the response URL would make this very simple to code, but the result I get from the requests library doesn't look like that. Instead, it returns the same URL I requested and status code 200, just like any normal page.
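For reference, the parsing step itself could look like the sketch below. It uses only the standard library and assumes the expired-link page really carries the `ref` and `url` query parameters shown in the browser URL above. The helper names `is_expired_url` and `original_short_url` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def is_expired_url(final_url):
    """Return True if the URL's query string contains ref=expired."""
    query = parse_qs(urlsplit(final_url).query)
    return "expired" in query.get("ref", [])


def original_short_url(final_url):
    """Return the value of the 'url' query parameter, or None if absent."""
    values = parse_qs(urlsplit(final_url).query).get("url", [])
    return values[0] if values else None


if __name__ == "__main__":
    browser_url = "https://short.ly/?ref=expired&url=https://short.ly/abcdef"
    print(is_expired_url(browser_url))
    print(original_short_url(browser_url))
```

This only helps once the final URL is actually available, which is the part that requests does not deliver here.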

Is there a way to get the URL that I see in the browser using the requests library, or must I use a library like Selenium? In my overall process, using Selenium seems like overkill at this point.


There is 1 best solution below

KJG On

Jeyekomon pointed out that t.ly returns status code 302 if the short link exists and then redirects you to the long link, which returns its own status code.

If the link is not found, you get status code 200 and, via multiple redirects, arrive at the main site of t.ly. The problem is that requests handles redirects automatically, so

 import requests

 r = requests.get('https://t.ly/4WEYb')
 # By this point requests has already followed the redirect
 print(r.status_code)

will return the status code for the long link (https://www.google.com/search?q=foo).

But you can stop this by setting allow_redirects=False:

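 import requests

 def link_exists(url):
     # Don't follow redirects, so we see t.ly's own response
     r = requests.get(url, allow_redirects=False, timeout=10)
     if r.status_code == 302:
         # t.ly redirects to the long link: the short link exists
         return True
     elif r.status_code == 200:
         # No redirect: the short link was not found
         return False
     else:
         # Handle t.ly server errors
         return False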
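
A slightly more general variant of the check above might also read the Location header, so you can see where the short link points without following it. This is only a sketch: the names `classify_response` and `check_short_link` are hypothetical, and treating 301, 303, 307 and 308 like 302 is an assumption in case a shortener answers with one of those instead.

```python
from urllib.parse import urljoin

# Assumption: any of these codes plus a Location header means "link exists".
REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_response(status_code, location):
    """Classify the first response of a short link (redirects not followed)."""
    if status_code in REDIRECT_CODES and location:
        return "redirect"
    if status_code == 200:
        return "no_redirect"
    return "error"


def check_short_link(url, timeout=10):
    """Request the short link once and return (kind, target_url_or_None)."""
    # Imported here so classify_response can be used without requests installed.
    import requests

    r = requests.get(url, allow_redirects=False, timeout=timeout)
    location = r.headers.get("Location")
    kind = classify_response(r.status_code, location)
    # The Location header may be relative, so resolve it against the request URL.
    target = urljoin(url, location) if kind == "redirect" else None
    return kind, target
```

Calling `check_short_link('https://t.ly/4WEYb')` would then give you both the classification and the redirect target, which you can inspect further (for example for a `ref=expired` parameter) before deciding whether to follow it.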