Issues with implementing async.parallel in node.io

The code listed below gives partial output and then throws an error from self.htmlparser.parseChunk. When async.series is used instead of async.parallel, the example works as expected.

The ping web service waits 2 seconds and then responds with "pong", to mock a slow web service call.
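
For reference, a minimal sketch of such a mock endpoint (the server code is not part of the original question; the file name and plain-text response are assumed):

ping.coffee

# answer "pong" after 2 seconds to simulate a slow web service
http = require 'http'

server = http.createServer (req, res) ->
    setTimeout (->
        res.writeHead 200, 'Content-Type': 'text/plain'
        res.end 'pong'
    ), 2000

server.listen 8888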

app.coffee

async = require 'async'
start = (new Date()).getTime()

require('node.io').scrape ->
    @ping = (callback, n) =>
        @getHtml 'http://localhost:8888/ping', (err, $, data) =>
            diff = (new Date()).getTime() - start
            console.log "#{n} : #{diff}"
            callback err, data

    async.parallel [
        (callback) =>
            @ping callback, 1
        ,
        (callback) =>
            @ping callback, 2
        ,
        (callback) =>
            @ping callback, 3
    ], (err, results) =>
        @exit err if err?
        console.log n for n in results
        @emit 'done'

Output with async.series

1 : 2079
2 : 4089
3 : 6093
1
2
3
done
OK: Job complete

Output with async.parallel

3 : 2079
/home/nodeuser/src/nodews/client/node_modules/node.io/lib/node.io/request.js:296
                    self.htmlparser.parseChunk(chunk);
TypeError: Cannot call method 'parseChunk' of null

Sys Info

nodeuser@ubuntu:~/src/nodews/client$ node -v && coffee -v && npm -v
v0.4.12
CoffeeScript version 1.1.3
1.0.106

nodeuser@ubuntu:~/src/nodews/client$ uname -a
Linux ubuntu 2.6.38-12-generic #51-Ubuntu SMP Wed Sep 28 14:27:32 UTC 2011 x86_64 x86_64    x86_64 GNU/Linux

Accepted answer

Looking at the node.io source, it seems that scrape creates a single Job instance, and that Job creates a single htmlparser instance when needed and destroys it once parsing is done (i.e. once all data from a request has been fed in). So you can't parse multiple sources in parallel from a single scrape: when the requests run concurrently, one request's completion presumably tears down the shared parser while another is still streaming data into it, which matches the "Cannot call method 'parseChunk' of null" error. Instead, use node.io's lower-level API (i.e. new nodeio.Job); see the node.io wiki.
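
As a rough sketch (not from the original answer), the three pings could be driven by a single lower-level job whose input list supplies one item per request; the file name is made up, and the Job constructor and nodeio.start are assumed to behave as documented on the node.io wiki.

job.coffee

nodeio = require 'node.io'
start = (new Date()).getTime()

# one input item per ping; node.io schedules the requests itself instead of
# async.parallel sharing a single scrape Job (and its single htmlparser)
methods =
    input: [1, 2, 3]
    run: (n) ->
        @getHtml 'http://localhost:8888/ping', (err, $, data) =>
            @exit err if err?
            console.log "#{n} : #{(new Date()).getTime() - start}"
            @emit data

job = new nodeio.Job {timeout: 10}, methods

nodeio.start job, (err) ->
    console.log 'done' unless err?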