Extract token from within <script> tags (BeautifulSoup4, Requests)

I'm trying to isolate the securityToken from an HTML response. The problem is that the securityToken sits inside <script> tags.

I've been able to isolate the tag with the code below:

import requests
from bs4 import BeautifulSoup
import re

url = 'https://obe.sandals.com/read-land-availability/'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
soup = BeautifulSoup(r.text, 'html.parser')
mytext = soup.find('script', string=re.compile('securityToken:'))

print(mytext)

Here is the output, but I can't figure out the last step: extracting the securityToken value itself.

<script> window._app.page = { jsView: './views/step1/Vacation', securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", step: 1 }; </script>
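The script body above is a JavaScript object literal rather than JSON (keys are unquoted and jsView uses single quotes), so json.loads will reject it. If more than the token is needed (subSessionId, for example), a minimal regex-only sketch that works on the string form of the tag (str(mytext)) might look like the following. PAIR_RE and parse_page_object are hypothetical names, and the sample string is copied from the output above so it runs without a network request.

```python
import re

# Assumption: the script holds a flat object literal like the one above, with
# unquoted keys and values that are "double-quoted", 'single-quoted', or integers.
# Nested objects, escaped quotes, and other value types are NOT handled.
PAIR_RE = re.compile(r"""(\w+)\s*:\s*(?:"([^"]*)"|'([^']*)'|(-?\d+))""")


def parse_page_object(script_text: str) -> dict:
    """Hypothetical helper: turn key: value pairs into a Python dict."""
    fields = {}
    # findall yields one (key, double, single, number) tuple per pair;
    # groups that did not take part in the match come back as "".
    for key, dq, sq, num in PAIR_RE.findall(script_text):
        fields[key] = int(num) if num else (dq or sq)
    return fields


sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

page = parse_page_object(sample)
print(page["securityToken"])
print(page["subSessionId"])
print(page["step"])
```

For anything more complex than this flat shape, a real JavaScript parser would be a safer choice than a regex.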


Answer by MendelG (best answer)

To extract the value of securityToken, try the following:

import re
import requests
from bs4 import BeautifulSoup


url = 'https://obe.sandals.com/read-land-availability/'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
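soup = BeautifulSoup(r.text, 'html.parser')
# Locate the <script> tag whose text contains "securityToken:"
mytext = soup.find('script', string=re.compile('securityToken:'))

# Capture the characters between the double quotes that follow "securityToken: "
print(re.search(r'securityToken: "(.*?)"', str(mytext)).group(1))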

Output:

5EFDCE1D62C5F1C1369EF3629F921B0F90301ACB51C5FD24321D7FB58D04DE50
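The one-liner assumes the match succeeds. If soup.find returns None (for instance because the page layout changes), str(mytext) becomes the string "None", re.search returns None, and .group(1) raises AttributeError. A minimal sketch with an explicit check is below; TOKEN_RE and extract_security_token are hypothetical names, and the sample is the script from the question's output. The \s* after the colon is an added assumption that tolerates extra whitespace.

```python
import re
from typing import Optional

# Hypothetical pattern: the double-quoted value that follows "securityToken:".
TOKEN_RE = re.compile(r'securityToken:\s*"([^"]*)"')


def extract_security_token(script_text: str) -> Optional[str]:
    """Return the token, or None when the key is not present."""
    match = TOKEN_RE.search(script_text)
    # re.search returns None on no match; check before calling .group(1).
    return match.group(1) if match else None


sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

print(extract_security_token(sample))
print(extract_security_token("<script>var x = 1;</script>"))
```

In the original script this would be called as extract_security_token(str(mytext)), and the caller decides what to do when it returns None.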
Answer by goalie1998

If you use 'html5lib' instead of 'html.parser', and the security token always appears in the same position (immediately before subSessionId), you can split the script text as a plain string:

str(mytext).split('securityToken: "')[1].split('", subSessionId:')[0]
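Indexing the result of split with [1] raises IndexError when the marker is missing, and the second split returns the wrong text if subSessionId ever stops following the token. The same string-only idea can be sketched with str.partition, which reports whether the separator was found. This assumes, as the answer does, that the value is wrapped in double quotes right after "securityToken: "; token_by_partition is a hypothetical name.

```python
from typing import Optional


def token_by_partition(script_text: str) -> Optional[str]:
    # partition returns (before, separator, after); separator is "" if absent,
    # so a missing key yields None instead of an IndexError.
    _, found, rest = script_text.partition('securityToken: "')
    if not found:
        return None
    # Stop at the next double quote rather than at '", subSessionId:',
    # so the result does not depend on which key comes next.
    token, closed, _ = rest.partition('"')
    return token if closed else None


sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

print(token_by_partition(sample))
print(token_by_partition("<script>var x = 1;</script>"))
```

Like the split version, this needs no regex and no particular parser; it only needs the tag converted to a string first (str(mytext)).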