I need to periodically log in and scrape some data from a particular site. I wrote a CasperJS script to run on Heroku to take care of it.
Here is what I want to be able to do:
app.get('/test', function(request, response) {
    scrapeStuff(function(data) {
        response.send(data);
    });
});
Then, at the final step of the spooky script:
spooky.then(function() {
    callback(this.getHTML());
});
Unfortunately, this doesn't seem to be possible: the function passed to scrapeStuff doesn't make it inside the .then() (can't find variable: callback). Instead I have to use this.emit() and monitor it with spooky.on; you can see an example of how this is done here.
The problem with using emit is that I want to receive the HTML of the scraped page upon request. So I want to access /scrape, wait 10 seconds while it's working, and receive the page, rather than call it, assume it succeeded, and request another URL to finally get the HTML.
Can this be done with SpookyJS? Maybe there is a better way using CasperJS directly.
There are three levels of context when using SpookyJS: node (spooky), casper, and the webpage itself.
You can pass data between the three contexts, but it will be serialized and deserialized, so you are limited to plain JSON-compatible values. A function such as callback cannot cross that boundary.
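To make the limitation concrete, here is a minimal sketch in plain Node that imitates a value crossing between contexts. It uses a JSON round trip as a stand-in for the real transport, which is an assumption made for illustration; the `crossContext` helper and the `payload` object are hypothetical and not part of SpookyJS.

```javascript
// Hypothetical helper: imitates a value crossing from one context to another
// by round-tripping it through JSON (a stand-in for the real transport).
function crossContext(value) {
    return JSON.parse(JSON.stringify(value));
}

var payload = {
    html: '<p>scraped</p>',
    attempts: 3,
    callback: function (data) { return data; } // not representable in JSON
};

var received = crossContext(payload);

console.log(received.html);            // the string survives the trip
console.log(typeof received.callback); // prints "undefined": the function is dropped
```

Strings, numbers, arrays and plain objects come out the other side intact, while the function silently disappears. That matches the "can't find variable: callback" symptom from the question.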
See https://github.com/SpookyJS/SpookyJS/wiki/Introduction for a detailed introduction on how it works.
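One way to still get the request/response shape from the question is to keep the callback on the Node side and let the emitted event trigger it. The sketch below is not SpookyJS code: it uses Node's `EventEmitter` as a stand-in for the spooky object so that it runs on its own, and `fakeScraper`, the `'html'` event name and the 10 ms delay are all assumptions. In the real script the emit would presumably be `this.emit('html', this.getHTML())` in the final `.then()`, with the listener registered through `spooky.on`.

```javascript
var EventEmitter = require('events').EventEmitter;

// Stand-in for a spooky instance: emits 'html' once the "scrape" finishes.
// In the real script the emit would happen inside the last spooky.then().
function fakeScraper() {
    var emitter = new EventEmitter();
    setTimeout(function () {
        // Only a plain string crosses over, never the callback itself.
        emitter.emit('html', '<html><body>scraped</body></html>');
    }, 10);
    return emitter;
}

// The callback never leaves Node: it is attached as an event listener.
function scrapeStuff(callback) {
    var scraper = fakeScraper(); // one instance per request
    scraper.once('html', function (html) {
        callback(html);
    });
}

// Usage, mirroring the route handler from the question:
scrapeStuff(function (data) {
    console.log(data); // an Express handler would call response.send(data) here
});
```

Creating one emitter per call means concurrent requests do not receive each other's results. With SpookyJS that would mean one spooky instance per request, which may be costly, so treat this as a starting point rather than a finished design.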