How to get data from the TreeView list


http://www.vliz.be/vmdcdata/mangroves/aphia.php?p=browser&id=235056&expand=true#ct (That's the information I am trying to scrape)

I want to scrape this detailed taxonomic tree so that I can manipulate it any way I like.

But there are a few problems in getting this tree data.

  1. I can't fully expand the taxonomic tree. When some nodes expand, others collapse, as the instructions on the page indicate, so saving the full page as an HTML file cannot solve my problem. I could repeat the process several times to get separate files and concatenate them, but that seems an ugly way to do it.

  2. I am tired of clicking: there are so many "plus" signs, and I have to wait after each click.

Is there a way to solve this using Python?

Best answer

Use Selenium. The script below expands the tree by clicking the "plus" signs, then grabs the entire DOM, with all the elements in it, once it's done:

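from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import time

# XPath matching the "plus" icons of nodes that can still be expanded
PLUS_ICONS = ('.//*[@src="http://www.marinespecies.org/images/aphia/pnode.gif" '
              'or @src="http://www.marinespecies.org/images/aphia/plastnode.gif"]')

browser = webdriver.Chrome()
browser.get('http://www.vliz.be/vmdcdata/mangroves/aphia.php?p=browser&id=235301&expand=true#ct')

while True:
    try:
        # find_elements_by_xpath was removed in newer Selenium 4 releases;
        # find_elements(By.XPATH, ...) is the current equivalent.
        # Index [1] (the second match) is kept from the original answer.
        elem = browser.find_elements(By.XPATH, PLUS_ICONS)[1]
        elem.click()
        time.sleep(2)  # give the expanded branch time to load
    except (IndexError, WebDriverException):
        # No icon left at index 1, or the click failed: stop expanding
        break

# Full DOM of the page after expansion
content = browser.page_source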
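
Once `content` holds the expanded page, it still has to be turned into a structure you can manipulate. The following is only a minimal sketch using the standard library's `html.parser`. It assumes the tree is rendered as nested `<ul>`/`<li>` elements, which has not been checked against the real page, and `NestedListParser`, `parse_tree` and the sample markup are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class NestedListParser(HTMLParser):
    """Collect nested <ul>/<li> markup into a list of {'name', 'children'} dicts.

    Assumption: each tree node is an <li>, and its children are <li> elements
    nested inside it. Adjust the tag checks if the real page differs.
    """

    def __init__(self):
        super().__init__()
        self.roots = []    # top-level nodes
        self._stack = []   # chain of currently open <li> nodes

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            node = {"name": "", "children": []}
            # Attach to the innermost open <li>, or to the root list
            siblings = self._stack[-1]["children"] if self._stack else self.roots
            siblings.append(node)
            self._stack.append(node)

    def handle_endtag(self, tag):
        if tag == "li" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            node = self._stack[-1]
            node["name"] = (node["name"] + " " + text).strip()


def parse_tree(html):
    """Return the nested list structure found in an HTML string."""
    parser = NestedListParser()
    parser.feed(html)
    parser.close()
    return parser.roots


# Hypothetical sample markup, not copied from the real page
sample = """
<ul>
  <li>Plantae
    <ul>
      <li>Tracheophyta
        <ul><li>Magnoliopsida</li></ul>
      </li>
    </ul>
  </li>
</ul>
"""
tree = parse_tree(sample)
```

With the Selenium script above, the call would be `parse_tree(content)`. If the page builds its tree from tables or `<div>` elements instead, the same stack-based approach should still apply, but the tag names in the handlers would need to change.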