How to log in via OAuth to a third-party app with Python

I am having trouble authenticating against a web service that uses Google as its OAuth provider. Basically, I want to log in to a web page with my Google account so I can do some scraping on it.

As the web service is not mine, I don't have the app's secret_key (client secret). I only have the clientID, redirect_URL and scope, which I recovered by inspecting the parameters of the requests sent while logging in.
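For context, the clientID, redirect_URL and scope are the public parameters of the authorization request the site sends your browser to. The sketch below rebuilds such a URL, assuming the site uses Google's standard authorization-code flow (`response_type=code` is an assumption, not something observed). In that flow the client secret is only used afterwards by the site's own server when it exchanges the returned code, so it never appears in the browser. Opening this URL still requires a real interactive Google sign-in.

```python
from typing import Optional
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint for web-server applications.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id: str, redirect_uri: str, scope: str,
                            state: Optional[str] = None) -> str:
    """Rebuild the "Sign in with Google" URL from the public parameters.

    No client secret is needed here; in the authorization-code flow the
    site's backend uses its secret later to exchange the returned code.
    """
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # Assumption: the site uses the authorization-code flow.
        "response_type": "code",
        "scope": scope,
    }
    if state is not None:
        # Anti-CSRF value; sites usually generate a fresh one per login.
        params["state"] = state
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```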

Once authenticated, the web page only requires a cookie named SID (Session ID, I would guess) to respond as it would to an authenticated user. There is no Bearer token, just the SID cookie.

Is it possible to automate this type of authentication? I've read many related topics, but they all need the secret_key, which I don't have because I'm not the owner of the app.

(Cannot comment due to rep)

Yes, what you're asking is possible. In theory, you could replay and match every request in the login flow to authenticate yourself, obtain the SID and then scrape. However, that is a very difficult task for some basic web scraping; it's like programming a full-blown scientific calculator to compute 5 + 5. You would run into all sorts of security measures: you'd be asked for phone, authenticator-app or email verification when trying to log in to your account with Python requests, and you'd then need to keep track of those security cookies and keep them up to date. It's a real mess and would be extremely difficult for anyone.

I think the better method is to authenticate yourself manually in a browser, copy the SID cookie, and hard-code it into your scraper in the Cookie HTTP header.
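A minimal sketch of the hard-coded-cookie approach, using only the standard library. `BASE_URL`, the SID value and the page path are hypothetical placeholders; copy the real SID from your browser's developer tools after logging in. Sending a browser-like User-Agent is an extra assumption, since some sites reject Python's default one.

```python
import urllib.request

# Hypothetical placeholders: use the real site and the SID value copied
# from your browser's developer tools after logging in manually.
BASE_URL = "https://example.com"
SID = "paste-your-sid-cookie-value-here"


def build_authenticated_request(path: str, sid: str = SID) -> urllib.request.Request:
    """Build a GET request that sends the session cookie in the Cookie header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Cookie": f"SID={sid}",
            # Some sites reject Python's default User-Agent.
            "User-Agent": "Mozilla/5.0",
        },
    )


def fetch(path: str, sid: str = SID) -> str:
    """Fetch a page as the logged-in user and return the decoded HTML."""
    request = build_authenticated_request(path, sid)
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Usage would be something like `html = fetch("/some/protected/page")`. With the `requests` library, the equivalent is `requests.get(url, cookies={"SID": sid})`.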

I understand this raises the question of what to do when the SID cookie expires. Since you haven't named the site, it's hard to be sure, but I'd find it hard to imagine a site that makes you re-authenticate with Google often rather than using its own internal SID/JWT refresh mechanism to keep you logged in.

My recommendations would be:

  • Check when the SID cookie expires. If it lasts long enough that manually copying and pasting it after authenticating yourself is viable, do that.
  • If SIDs expire quickly, check whether there's an API request somewhere that gives you a new SID without going through OAuth again: in your browser's Network panel, look for a Set-Cookie response header that sets a new SID. You might need to update and keep track of these values in your program, but it will be much easier than writing a program that logs in to Google.
  • If there's no way to refresh the SID, the cookies expire often, and you need long-term scraping where manually grabbing a new cookie every 30 minutes isn't practical, I'd recommend looking into doing this with Puppeteer/Chromium, as it will be much easier than doing it with Python HTTP requests.
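The first two recommendations can be sketched with the standard library: seed an `http.cookiejar.CookieJar` with the manually copied SID, send requests through an opener built with `HTTPCookieProcessor`, and any `Set-Cookie: SID=...` the server returns for the same domain and path replaces the stored value automatically. The domain, the expiry timestamp and `refresh_url` are hypothetical; you would take the real expiry from the cookie details in your browser and the refresh endpoint (if one exists) from the Network panel.

```python
import http.cookiejar
import time
import urllib.request
from typing import Optional, Tuple

# Hypothetical domain of the scraped site; replace with the real one.
DOMAIN = "example.com"


def make_sid_cookie(sid: str, domain: str = DOMAIN,
                    expires: Optional[int] = None) -> http.cookiejar.Cookie:
    """Wrap a manually copied SID (and its Unix-time expiry, if known) in a Cookie."""
    return http.cookiejar.Cookie(
        version=0, name="SID", value=sid,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def build_session(sid: str, expires: Optional[int] = None
                  ) -> Tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Return an opener whose cookie jar stores any new cookies the server sets."""
    jar = http.cookiejar.CookieJar()
    jar.set_cookie(make_sid_cookie(sid, expires=expires))
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def current_sid(jar: http.cookiejar.CookieJar) -> Optional[http.cookiejar.Cookie]:
    """Return the SID cookie currently stored in the jar, if any."""
    for cookie in jar:
        if cookie.name == "SID":
            return cookie
    return None


def seconds_until_expiry(jar: http.cookiejar.CookieJar,
                         now: Optional[float] = None) -> Optional[float]:
    """Seconds until the stored SID expires, or None for a session cookie."""
    cookie = current_sid(jar)
    if cookie is None or cookie.expires is None:
        return None
    now = time.time() if now is None else now
    return cookie.expires - now


def refresh(opener: urllib.request.OpenerDirector, refresh_url: str) -> None:
    """Call a (hypothetical) endpoint that re-issues SID; the jar keeps the new value."""
    with opener.open(refresh_url, timeout=30) as response:
        response.read()
```

A scraper could check `seconds_until_expiry(jar)` before each batch of requests and call `refresh(...)` when it drops below some margin. If the server sets the new SID with an explicit `Domain` attribute, the jar may hold it alongside the seeded cookie rather than replacing it, so inspect what the site actually sends.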