Dynamic 'wait' arg in Scrapy-Splash


I am scraping multiple pages using Scrapy-Splash.

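import scrapy


class Spider(scrapy.Spider):
    name = "scrape"

    def start_requests(self):
        urls = get_urls()  # the asker's own helper (not shown in the question)
        for url in urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 8},  # fixed 8-second wait before the HTML is returned
                },
            })

    def parse(self, response):
        pass  # parsing logic not shown in the question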

The code works fine; I get the desired result from the pages.

The problem is that I have to set a long wait time (more than 4 seconds); otherwise Splash is sometimes terminated by the next request before it returns a result. This seems very unreliable.
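Since the `args` dict is built separately for every request, the `wait` value does not have to be one constant for all sites. Below is a minimal sketch, assuming you know roughly how slow each site is; the table `WAIT_BY_DOMAIN`, its domains and values, and the helper `splash_meta` are hypothetical names for illustration. It does not make the wait event-driven; it only lets slow sites get a longer wait without penalising fast ones.

```python
from urllib.parse import urlparse

# Hypothetical per-domain wait times in seconds; tune these for your own sites.
WAIT_BY_DOMAIN = {
    "example.com": 2.0,
    "slow-site.example": 6.0,
}
DEFAULT_WAIT = 4.0


def splash_meta(url):
    """Build a Scrapy-Splash meta dict whose 'wait' depends on the URL's domain."""
    domain = urlparse(url).netloc
    wait = WAIT_BY_DOMAIN.get(domain, DEFAULT_WAIT)
    return {
        "splash": {
            "endpoint": "render.html",
            "args": {"wait": wait},
        }
    }


print(splash_meta("https://example.com/a")["splash"]["args"]["wait"])  # 2.0
```

In the spider this would replace the inline dict: `yield scrapy.Request(url, self.parse, meta=splash_meta(url))`.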

Is there a way to make the wait time more dynamic? I found a partial solution using a Lua script in this question:

Adding a wait-for-element while performing a SplashRequest in python Scrapy

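function main(splash)
  assert(splash:go(splash.args.url))
  -- requires Splash 2.3
  while not splash:select('.my-element') do
    splash:wait(0.1)  -- poll every 100 ms until the element appears
  end
  return {html=splash:html()}
end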

But it appears to require a hard-coded element ('.my-element') to end the wait, and I am scraping many different websites, each with different elements to collect.
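The selector does not necessarily have to be baked into the script: Splash exposes the request arguments to Lua as `splash.args`, so one script can read the selector from each request. A minimal sketch, assuming Splash 2.3 or later and scrapy-splash's `meta['splash']` format with the `execute` endpoint; the argument names `selector` and `max_wait` and the helper `wait_for_selector_meta` are hypothetical. Unlike the script above, the loop gives up after `max_wait` seconds so a missing element cannot keep the render running indefinitely.

```python
# Lua script for Splash's "execute" endpoint. The selector and the time limit
# come from the request args (hypothetical names "selector" and "max_wait").
WAIT_FOR_SELECTOR_LUA = """
function main(splash)
  assert(splash:go(splash.args.url))
  local max_wait = splash.args.max_wait or 10
  local waited = 0
  -- requires Splash 2.3 (splash:select)
  while not splash:select(splash.args.selector) do
    if waited >= max_wait then
      -- give up and return whatever has rendered so far
      return {html = splash:html(), found = false}
    end
    splash:wait(0.1)
    waited = waited + 0.1
  end
  return {html = splash:html(), found = true}
end
"""


def wait_for_selector_meta(url, selector, max_wait=10.0):
    """Build a Scrapy-Splash meta dict that waits for `selector` on `url`."""
    return {
        "splash": {
            "endpoint": "execute",
            "args": {
                "lua_source": WAIT_FOR_SELECTOR_LUA,
                "url": url,
                "selector": selector,
                "max_wait": max_wait,
            },
        }
    }


meta = wait_for_selector_meta("https://example.com/page", ".my-element")
print(meta["splash"]["args"]["selector"])  # .my-element
```

Because the script returns a table, Splash answers with JSON; with the scrapy-splash middlewares enabled, the rendered page should then be available in the callback as `response.data['html']` and the flag as `response.data['found']`.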

How can I set the 'wait' arg dynamically, or customise the Lua script so that Splash stops as soon as the desired element is present? Surely this is a common problem?
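If the wait-for-element script reads its selector from `splash.args`, the per-site differences can live in the spider as plain data. A minimal sketch, assuming a hand-maintained mapping from domain to the CSS selector of the element that signals the page is ready; `SELECTOR_BY_DOMAIN`, its entries, and `splash_meta_for` are hypothetical, and `lua_source` is expected to be a script that reads `splash.args.selector` and `splash.args.max_wait`. Domains without an entry fall back to the question's original `render.html` call with a fixed `wait`.

```python
from urllib.parse import urlparse

# Hypothetical mapping from domain to the element that signals "page ready".
SELECTOR_BY_DOMAIN = {
    "shop.example": ".product-price",
    "news.example": "article h1",
}


def splash_meta_for(url, lua_source, fallback_wait=8.0, max_wait=10.0):
    """Wait for a per-site selector when the domain is known, else use a fixed wait.

    `lua_source` must be a script that reads splash.args.selector and
    splash.args.max_wait (an assumption of this sketch).
    """
    selector = SELECTOR_BY_DOMAIN.get(urlparse(url).netloc)
    if selector is None:
        # Unknown site: keep the original render.html + fixed wait behaviour.
        return {"splash": {"endpoint": "render.html",
                           "args": {"wait": fallback_wait}}}
    return {"splash": {"endpoint": "execute",
                       "args": {"lua_source": lua_source,
                                "url": url,
                                "selector": selector,
                                "max_wait": max_wait}}}
```

Adding a new site then means adding one dictionary entry rather than editing the Lua script.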

