The code listed below produces partial output and then throws an error in self.htmlparser.parseChunk. With async.series instead of async.parallel, the same example works as expected.
The ping web service waits 2 seconds and then responds with "pong", to mock a slow web-service call.
app.coffee
async = require 'async'
start = (new Date()).getTime()

require('node.io').scrape () ->
  @ping = (callback, n) =>
    @getHtml 'http://localhost:8888/ping', (err, $, data) =>
      diff = (new Date()).getTime() - start
      console.log "#{n} : #{diff}"
      callback err, data

  async.parallel [
    (callback) =>
      @ping callback, 1
    (callback) =>
      @ping callback, 2
    (callback) =>
      @ping callback, 3
  ], (err, results) =>
    @exit err if err?
    console.log n for n in results
    @emit 'done'
Output with async.series
1 : 2079
2 : 4089
3 : 6093
1
2
3
done
OK: Job complete
Output with async.parallel
3 : 2079
/home/nodeuser/src/nodews/client/node_modules/node.io/lib/node.io/request.js:296
self.htmlparser.parseChunk(chunk);
TypeError: Cannot call method 'parseChunk' of null
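The TypeError suggests a shared parser being torn down by whichever request finishes first. A hypothetical reduction of that failure mode (illustrative only, not node.io's actual code):

```javascript
// Hypothetical reduction of the bug: concurrent consumers share one
// parser; the first to finish destroys it, so the next one hits null.
var parser = { parseChunk: function (chunk) { /* parse the chunk */ } };

function handleResponse(chunk, done) {
  parser.parseChunk(chunk); // throws if another request already finished
  done();
  parser = null;            // torn down when "parsing is done"
}

// The first request completes and nulls out the parser...
handleResponse('pong', function () {});
// ...so the second request's chunk blows up, as in the stack trace above.
try {
  handleResponse('pong', function () {});
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```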
Sys Info
nodeuser@ubuntu:~/src/nodews/client$ node -v && coffee -v && npm -v
v0.4.12
CoffeeScript version 1.1.3
1.0.106
nodeuser@ubuntu:~/src/nodews/client$ uname -a
Linux ubuntu 2.6.38-12-generic #51-Ubuntu SMP Wed Sep 28 14:27:32 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Looking at the node.io source, it seems that scrape creates a single Job instance, which creates a single htmlparser instance when needed and destroys it when parsing is done (i.e. when all data from a request has been fed in). So you can't parse multiple sources in parallel from a single scrape call. Instead, use node.io's lower-level API methods (i.e. new nodeio.Job); see this wiki page.