How to scrape an Incapsula-protected website?


https://www.genecards.org/cgi-bin/carddisp.pl?gene=ZSCAN22

On the above webpage, if I click "See all 33", Chrome DevTools shows that the following GET request is sent:

https://www.genecards.org/gene/api/data/Enhancers?geneSymbol=ZSCAN22

Accessing it directly is blocked.

I have tried Puppeteer. I can click "See all 33" with it, but then I have to parse the resulting HTML. It would be better to get the results directly from https://www.genecards.org/gene/api/data/Enhancers?geneSymbol=ZSCAN22. I am not sure how to capture that response after clicking "See all 33" with Puppeteer.

I am not sure whether Apify can help.

Can anybody let me know how to scrape it?

mmblack On

I used Selenium, and it works fine:

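from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through Service; adjust it for your machine.
browser = webdriver.Chrome(service=Service("C:/src/webdriver/chromedriver.exe"))
genesLocations = 'https://www.genecards.org/cgi-bin/carddisp.pl?gene={}'

# Extract genomic locations
gene = 'ZSCAN22'
browser.get(genesLocations.format(gene))
# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...) instead.
location = browser.find_element(By.XPATH, '//*[@id="genomic_location"]/div/div[3]/div/div')
print(location.text)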
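
To get the Enhancers data from the API URL in the question rather than parsing the rendered HTML, one option is to issue the request from inside the already loaded page, so that the browser sends its own cookies with it. The following is only a sketch under assumptions: `enhancers_url`, `fetch_enhancers`, and `FETCH_JS` are hypothetical names introduced here, and I have not verified that the site lets this request through. It relies only on Selenium's `execute_async_script`, which passes a completion callback as the last script argument.

```python
from urllib.parse import urlencode

# Endpoint taken from the question (seen in Chrome DevTools).
API_URL = "https://www.genecards.org/gene/api/data/Enhancers"

# JavaScript executed inside the loaded page. Selenium appends its
# completion callback as the last argument of an async script.
FETCH_JS = """
const url = arguments[0];
const done = arguments[arguments.length - 1];
fetch(url, {credentials: "include"})
  .then(resp => resp.text())
  .then(body => done(body))
  .catch(() => done(null));
"""


def enhancers_url(gene):
    """Build the API URL for a gene symbol (hypothetical helper)."""
    return API_URL + "?" + urlencode({"geneSymbol": gene})


def fetch_enhancers(driver, gene):
    """Return the raw response body as a string, or None if the fetch failed.

    `driver` must be a Selenium WebDriver that has already loaded the
    gene page, so the request runs with that page's cookies.
    """
    return driver.execute_async_script(FETCH_JS, enhancers_url(gene))
```

With the `browser` and `gene` variables from the snippet above, this would be called as `body = fetch_enhancers(browser, gene)` after `browser.get(...)`. If the body turns out to be JSON (an assumption), it can be parsed with `json.loads(body)`. Calling `browser.set_script_timeout(30)` beforehand gives the request time to finish.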