First of I'd like to mention I'm still confused about how exactly website requests work, I know the basics like GET and POST requests, but not really how everything interacts with each other. I am trying to write a program for my younger brother to get notifications when a class starts, or if a class is cancelled. To achieve this I am trying to get the required data from the school website using python. It's illegal for the public to use their API, which makes things a lot harder. Without really knowing what to do I started just trying things. The website requires the student to login and thus, I tried to use GET and POST requests with mechanicalsoup, but that didn't work because it doesn't support by website required javascript. Afterwards I tried to use selenium, but that didn't work either because it can't find the form to input the user data.
I have been able to scrape data from other websites, (like reddit, for example) but this one just doesn't want to work. For info, the website is this link. It's been 3 days just trying various things now so I'm lost xd.
I'd really appreciate it if someone could help me out on this one
EDIT: Selenium Script used
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get("https://ozhw.magister.net")
driver.find_element(By.ID, "username").send_keys("private")
As Tim Roberts pointed out, the below method does not work for this use case. However, this strategy is good to know for other login problems you may encounter.
Selenium is overkill for this. You just need to figure out the steps the browser is taking to get its future requests authorized.Discovery
We need to find the
Networkthat contains the username. This should provide the form, endpoint, and other data that may be relevant to the request.I see that the username submission contains a auth code, which may have its own process that you have to go through. With the site open refresh the page and you should see all of the requests sent from the browser to get that code (assuming that code is acquired separately).
Python
As you gain an understanding of what endpoints are doing the authentication, what data they require, and what they return, you can put each step into Python. A client/session will be important. I recommend
httpxorrequeststo do this.Here is a skeleton of what you could fill out (may be incomplete, adjust as you must).