I am trying to pass a variable as a user-defined argument in Scrapy, to be used in the for loop of a Lua script. My code is as follows:
import scrapy
from scrapy_splash import SplashRequest
from scrapy.selector import Selector

class ProductsSpider(scrapy.Spider):
    name = 'allproducts'
    script = '''
    function main(splash, args)
        assert(splash:go(args.url))
        assert(splash:wait(0.5))
        result = {}
        local upto = tonumber(splash.number)
        for i=1,upto,1
        do
            #something
        end
        return output
    end
    '''

    def start_requests(self):
        url = 'https://medicalsupplies.co.uk'
        yield SplashRequest(url=url, callback=self.parse, endpoint='render.html', args={'wait': 0.5})
        yield SplashRequest(url=url, callback=self.parse_other_pages, endpoint='execute',
                            args={'wait': 0.5, 'lua_source': self.script, 'number': int(self.number)}, dont_filter=True)

    def parse(self, response):
        for tr in response.xpath("//table[@id='date']/tbody/tr"):
            yield {
                'output' : #something
            }

    def parse_other_pages(self, response):
        for page in response.data:
            sel = Selector(text=page)
            for tr in sel.xpath("//table[@id='date']/tbody/tr"):
                yield {
                    'output' : #something
                }
The issue I am facing: when I hard-code an integer in the for loop of the Lua script, i.e. for i=1,5,1, the script works just fine. But when I try to pass the value from the command prompt using scrapy crawl allproducts -a number=5 -o test.json and write the loop as for i=1,{self.number},1, my code throws an error. I am also unable to use an f-string on this script string. Is there a way to pass a variable into a text string (here called script) without breaking the code? I know I am not using the right syntax, but I haven't found any resources on this. I appreciate any suggestions.
The actual warning from the scraper is as follows:
WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 7, 'error': "attempt to index global 'self' (a nil value)", 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:7: attempt to index global \'self\' (a nil value)'}}
Edit 1: Following @Alexander's suggestions, I modified the Lua script and passed the variable as an integer argument to the SplashRequest. I also declared the variable in the Lua script using local (local upto = tonumber(splash.number)).
The warning right now is as follows:
WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 9, 'error': "'for' limit must be a number", 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:9: \'for\' limit must be a number'}}
function main(splash, args) has no self argument. Yet line 5 refers to it: for i=1,{self.number},1. And the function is not a method (a field of a Lua table of function type) declared with :, where self is that table.
Did you mean splash?
I think you should add 'number':self.number to args in your Python code (start_requests), and then refer to it as tonumber(args.number) from your Lua script.
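The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual spider: LUA_SCRIPT and build_splash_args are hypothetical names, and the loop body is left as a placeholder. It assumes the value arrives as a string, which is how Scrapy stores -a arguments on the spider.

```python
# Sketch only: LUA_SCRIPT and build_splash_args are hypothetical names
# introduced for illustration; they are not part of scrapy-splash.

LUA_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  local result = {}
  -- Extra keys from the request's args dict are readable on the args table.
  local upto = tonumber(args.number)
  for i = 1, upto, 1 do
    -- per-iteration work goes here
  end
  return result
end
"""


def build_splash_args(number, wait=0.5):
    """Build the dict intended for SplashRequest(..., endpoint='execute', args=...)."""
    return {
        "wait": wait,
        "lua_source": LUA_SCRIPT,
        # `scrapy crawl allproducts -a number=5` stores the string "5",
        # so convert it before sending it to Splash.
        "number": int(number),
    }


# Stand-in for self.number as set by `-a number=5`.
splash_args = build_splash_args("5")
print(splash_args["number"])
```

Inside start_requests, the dict would then be passed as args=build_splash_args(self.number) to the SplashRequest that uses endpoint='execute'. Passing the value through args also avoids formatting the Lua source with an f-string, where the Lua table constructor {} would otherwise have to be escaped as {{}}.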