How to pass a variable to lua script being executed inside scrapy from the command prompt?


I am trying to pass a user-defined argument to Scrapy and use it in the for loop of a Lua script. My code is as follows:

import scrapy
from scrapy_splash import SplashRequest
from scrapy.selector import Selector


class ProductsSpider(scrapy.Spider):
    name = 'allproducts'

    script = '''
        function main(splash, args)
           assert(splash:go(args.url))
           assert(splash:wait(0.5))
           result = {}
           local upto = tonumber(splash.number)
           for i=1,upto,1
           do
             #something
           end
           return output
        
        end
    '''

    def start_requests(self):
        url='https://medicalsupplies.co.uk'
        yield SplashRequest(url=url, callback=self.parse, endpoint='render.html', args={'wait':0.5})
        yield SplashRequest(url=url, callback=self.parse_other_pages, endpoint='execute',
            args={'wait':0.5, 'lua_source':self.script, 'number':int(self.number)}, dont_filter=True)

    def parse(self, response):
        for tr in response.xpath("//table[@id='date']/tbody/tr"):
            yield{
                    'output' : #something
            }

    def parse_other_pages(self,response):
        for page in response.data:
            sel=Selector(text=page)
            for tr in sel.xpath("//table[@id='date']/tbody/tr"):
                yield{
                     'output' : #something
                   }

So, the issue I am facing is this: when I run the for loop of the Lua script with an integer literal, i.e. for i=1,5,1, the script works just fine. But when I try to give the script an input from the command prompt using scrapy crawl allproducts -a number=5 -o test.json, with for i=1,{self.number},1 as the for loop inside the script, my code throws an error, and I am not even able to use f-strings on this string. Is there a way to pass a variable into a text string (here called script) without breaking the code? I know I am not using the right syntax, but I haven't found any resources on this. I appreciate any suggestions.

The actual warning from the scraper is as follows:

WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 7, 'error': "attempt to index global 'self' (a nil value)", 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:7: attempt to index global \'self\' (a nil value)'}}

Edit 1: following @Alexander's suggestions, I modified the Lua script and passed the variable as an integer argument to the SplashRequest. I also instantiated the variable in the Lua script using local (local upto = tonumber(splash.number)).

The warning right now is as follows:

WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 9, 'error': "'for' limit must be a number", 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:9: \'for\' limit must be a number'}}

Best answer

function main(splash, args) has no self argument, yet the for loop refers to it: for i=1,{self.number},1. The function is also not a method (a function-valued field of a Lua table) declared with :, in which case self would be that table.

Did you mean splash?
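
A quick check of the point above, as a sketch: the script is an ordinary Python string, so Python leaves {self.number} untouched and Splash receives it verbatim. Lua then reads {self.number} as a table constructor and looks up a global named self, which is nil; this is consistent with the error quoted in the question. The Spider class here is a hypothetical stand-in, not a real Scrapy spider.

```python
class Spider:
    """Hypothetical stand-in for the spider; not a real Scrapy class."""

    number = "5"
    # A plain (non-f) string: Python performs no interpolation here.
    script = "for i=1,{self.number},1 do end"


# The placeholder is still present, character for character,
# so this exact text is what would be sent to Splash as Lua source.
print(Spider.script)
```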

I think you should add 'number': self.number to args in your Python code (start_requests), and then refer to it as tonumber(args.number) in your Lua script.
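
A minimal sketch of that suggestion, under some assumptions: build_splash_args is a hypothetical helper introduced only for illustration, and the loop body is a placeholder because the question elides it. The script reads the value from args (the second parameter of main), not from splash. Since splash.number is never set, tonumber would yield nil there, which would be consistent with the "'for' limit must be a number" error from Edit 1.

```python
# Lua source for Splash's `execute` endpoint. Values placed in the
# SplashRequest `args` dict are exposed to Lua as fields of `args`.
LUA_SCRIPT = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    local result = {}
    local upto = tonumber(args.number)
    for i = 1, upto do
        -- placeholder work (an assumption; the question elides this part)
        result[i] = splash:html()
    end
    return result
end
"""


def build_splash_args(number, wait=0.5):
    """Hypothetical helper: build the `args` dict for a SplashRequest."""
    # `-a number=5` reaches the spider as the string "5", so convert it.
    return {"wait": wait, "lua_source": LUA_SCRIPT, "number": int(number)}


# Inside the spider's start_requests, this would be used roughly as:
#   yield SplashRequest(url=url, callback=self.parse_other_pages,
#                       endpoint="execute",
#                       args=build_splash_args(self.number),
#                       dont_filter=True)
splash_args = build_splash_args("5")
print(splash_args["number"])
```

Passing the value through args also avoids formatting the Lua source itself, where Lua's own braces (such as result = {}) would clash with f-string syntax.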