Python BeautifulSoup: how to extract a variable's value from a JavaScript script element?


I'm new to Python and I've been trying to use BeautifulSoup to extract one particular value from a variable defined in a script element.

Code:

import requests
from bs4 import BeautifulSoup
import esprima

#----------------some comment'

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


#--------------some comment:
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


#---------------some comment:

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
print(chart)

OUTPUT:

var data = {
status: 'success',  
baseline: 29,       
communicate: null,  
company: 'Facebook',
max: 66,
series: [

                      { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },

                      { x: '2020-05-30T13:37:28.168484-04:00', y: 27  },

                      .....

                      { x: '2020-05-31T13:07:28.168484-04:00', y: 30  },

                  ]
                }

                $(function () {
                  chartThis(data, 'holder', 'line')
                });

                if (data.communicate && $('#dd-communicate').length) {
                  $('#dd-communicate').html('<div class="border text-left d-inline-block p-2"><i class="fa" aria-hidden="true" style="color: red; width:16px; height:12px; background:url(https://cdn2.downdetector.com/d328eb8cbe4e164/images/v2/message.svg) no-repeat"></i>'
                    +'<span class="d-inline-block px-1">'+ data.company+' &bull;  ' + moment.utc(data.communicate.created_at).fromNow()
                    + '</span><p class="font-weight-bold my-0">'+ data.communicate.message + '</p></div>')
                }

Do you know an easy way to extract the 'max' value from the data variable above?

I've tried using esprima, but without luck; I hit this error:

Traceback (most recent call last):
  File "c:/test.py", line 31, in <module>
    if token["type"] == "Identifier" and token["value"] == "max":
TypeError: 'BufferEntry' object is not subscriptable

My code with esprima looked like this:

import requests
from bs4 import BeautifulSoup
import esprima

#----------------some comment'

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


#--------------some comment:
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


#---------------some comment:

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()

tokens = esprima.tokenize(chart)

token_iterator = iter(tokens)

for token in token_iterator:
    if token["type"] == "Identifier" and token["value"] == "max":
        value_token = next(next(token_iterator))
        result = value_token["value"]

Any help would be greatly appreciated!
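
A note on the traceback above: "object is not subscriptable" means the tokens cannot be indexed with token["type"]. The sketch below rewrites the same loop with attribute access (token.type, token.value), on the assumption that this is how esprima's token objects expose their fields. It also replaces next(next(token_iterator)), which calls next() on a token rather than on the iterator, with two separate next() calls: one to skip the ":" and one to read the value. The helper name find_property_value and the SimpleNamespace stand-in tokens are made up for illustration, so the example runs without esprima or network access.

```python
from types import SimpleNamespace


def find_property_value(tokens, name):
    """Return the raw value of the token that follows `name :`, or None."""
    token_iterator = iter(tokens)
    for token in token_iterator:
        # Attribute access instead of token["type"] / token["value"].
        if token.type == "Identifier" and token.value == name:
            colon = next(token_iterator, None)        # the ":" punctuator
            value_token = next(token_iterator, None)  # the value itself
            if colon is not None and value_token is not None and colon.value == ":":
                return value_token.value
    return None


# Hypothetical stand-ins for the tokens of "company: 'Facebook', max: 66,".
demo_tokens = [
    SimpleNamespace(type=t, value=v)
    for t, v in [
        ("Identifier", "company"), ("Punctuator", ":"), ("String", "'Facebook'"),
        ("Punctuator", ","),
        ("Identifier", "max"), ("Punctuator", ":"), ("Numeric", "66"),
        ("Punctuator", ","),
    ]
]

print(find_property_value(demo_tokens, "max"))  # 66 (as a string)
```

If the assumption holds, the real call would be find_property_value(esprima.tokenize(chart), "max"); the returned value is a string, so wrap it in int() if a number is needed.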


A quick way to extract the max value is to call split on the chart string:

import requests
from bs4 import BeautifulSoup

URL = 'https://downdetector.com/status/facebook/'
browser = {'user-agent': 'my agent'}

page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
# Take the text after "max: " up to the next comma; the result is a string.
max_val = chart.split("max: ")[1].split(",")[0]

print(max_val)

OUT: 64
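
The split approach relies on the script containing exactly "max: " (with a single space) and on a comma following the value; if "max: " is missing, the [1] index raises an IndexError. As a slightly more tolerant sketch, a regular expression can allow flexible whitespace and return an int directly. The function name extract_number and the sample string are hypothetical and only mirror the output shown in the question; in the real script, chart would be passed in place of sample.

```python
import re


def extract_number(script_text, key):
    """Return the integer written as `key: <number>` in the text, or None."""
    pattern = r"\b" + re.escape(key) + r"\s*:\s*(-?\d+)"
    match = re.search(pattern, script_text)
    return int(match.group(1)) if match else None


# Shortened stand-in for the script text printed in the question.
sample = """
var data = {
status: 'success',
baseline: 29,
company: 'Facebook',
max: 66,
series: []
}
"""

print(extract_number(sample, "max"))       # 66
print(extract_number(sample, "baseline"))  # 29
print(extract_number(sample, "missing"))   # None
```

Returning None for a missing key lets the caller decide how to handle a page layout change instead of crashing on an index error.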