I need to log in to a site at one URL (e.g. 'www.targetsite.com/login') and then navigate to another page to scrape data (e.g. 'www.targetsite.com/data'). This is because the site automatically redirects you to the home page after you log in, no matter which URL you used to access the site to begin with.
I'm using the mechanize Python library (old, I know, but it has some functions I'll need later on and is a good learning experience).
The problem I'm facing is that the cookiejar doesn't seem to be working the way I thought it would:
import mechanize
import Cookie
import cookielib
from bs4 import BeautifulSoup  # needed for the parsing step below
cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
# Browser emulation
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Login
login_url = "https://targetsite.org/login"
br.open(login_url)
br.select_form(action="https://targetsite.org/wp-login.php?wpe-login=true")
br.form['log'] = 'login'
br.form['pwd'] = 'password'
br.submit()
target_url = "https://targetsite.com/data"
br.open(target_url)
soup = BeautifulSoup(br.response().read(), 'html.parser')
body_tag = soup.body
all_paragraphs = soup.find_all('p')
print(body_tag.text)
Weirdly, the site doesn't seem to be registering my logged-in state and is redirecting my mechanize browser back to the login screen. Any idea what's going on?
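
One check that might narrow this down is to ask the cookiejar which cookies it would actually attach to a request for the data URL. The sketch below is only an assumption about where the problem could lie, not a confirmed diagnosis: `cookies_sent_to` is a hypothetical helper (not part of mechanize), and it relies on the standard-library jar's `add_cookie_header` method. Note that the snippet above logs in at `targetsite.org` but fetches data from `targetsite.com`; if the real code does the same, a cookie set by one domain would not be sent to the other.

```python
try:
    import cookielib                     # Python 2, as in the code above
    import urllib2 as request
except ImportError:
    import http.cookiejar as cookielib   # Python 3 name for the same module
    import urllib.request as request


def cookies_sent_to(cj, url):
    """Return the names of the cookies in `cj` that would be sent to `url`.

    Hypothetical helper: it builds a throwaway request object (nothing is
    sent over the network), lets the jar attach its Cookie header, and then
    parses the cookie names out of that header.
    """
    req = request.Request(url)
    cj.add_cookie_header(req)            # applies the jar's domain/path rules
    header = req.get_header("Cookie") or ""
    return [part.split("=", 1)[0].strip()
            for part in header.split(";") if part.strip()]
```

Calling `cookies_sent_to(cj, target_url)` right after `br.submit()` and getting an empty list would suggest the login cookies are either missing from the jar or scoped to a different domain or path than the data page. Printing `br.geturl()` after the submit could also show whether the login itself was accepted.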