I am a beginner doing an assignment to scrape the content of this page using node.io:
http://www.nycourts.gov/reporter/3dseries/2013/2013_06966.htm
I want to save the text content inside the <P> tags as a string in a variable.
My code is this:

    var nodeio = require('node.io');
    var methods = {
        input: false,
        run: function() {
            this.getHtml('http://www.nycourts.gov/reporter/3dseries/2013/2013_06966.htm', function(err, $) {
                // Handle any request / parsing errors
                if (err) this.exit(err);
                var content = $('P');
                this.emit(content);
            });
        }
    };
    exports.job = new nodeio.Job({timeout:10}, methods);

This fails with the error: No elements matching 'P'. Please help.
I got the same error,
Error: No elements matching 'P'
when I tried it myself. The root cause is that the page has no closing
</P>
tags, and node.io does not auto-correct such malformed HTML the way a modern web browser does. Querying <blockquote>, by contrast, works fine.
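To make the difference concrete, here is a minimal sketch in plain JavaScript with no dependencies. It is only an illustration and not how node.io's parser works internally. `sampleHtml`, `strictMatch` and `lenientParagraphs` are hypothetical names I made up, and the markup is invented to mimic a page whose <P> tags are never closed. The lenient version relies on the fact that a browser implicitly ends the previous paragraph when a new <P> starts; it ignores other cases where a paragraph ends (for example at a following block element), so treat it as a rough approximation.

```javascript
// Illustration only: plain string handling, not node.io's actual parser.
// sampleHtml is made-up markup that mimics unclosed <P> tags.
const sampleHtml =
  '<BLOCKQUOTE>Quoted holding.</BLOCKQUOTE>' +
  '<P>First paragraph.' +
  '<P>Second <B>bold</B> paragraph.' +
  '<P>Third paragraph.';

// Strict matching: only finds elements that have both an opening
// and a closing tag.
function strictMatch(html, tag) {
  const re = new RegExp('<' + tag + '\\b[^>]*>([\\s\\S]*?)</' + tag + '>', 'gi');
  const found = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    found.push(m[1].trim());
  }
  return found;
}

// Lenient extraction: treat every opening <P> as the start of a new
// paragraph, so a missing </P> no longer matters.
function lenientParagraphs(html) {
  return html
    .split(/<p\b[^>]*>/i)
    .slice(1) // drop everything before the first <P>
    .map(function (chunk) {
      return chunk
        .replace(/<\/p>[\s\S]*$/i, '') // stop at </P> if one happens to exist
        .replace(/<[^>]+>/g, '')       // strip nested tags
        .replace(/\s+/g, ' ')          // collapse whitespace
        .trim();
    })
    .filter(function (text) { return text.length > 0; });
}

// The paragraph text as one string in a variable.
const content = lenientParagraphs(sampleHtml).join('\n');

console.log(strictMatch(sampleHtml, 'p'));          // nothing found: no </P>
console.log(strictMatch(sampleHtml, 'blockquote')); // the closed element is found
console.log(content);
```

The strict matcher finds the closed <BLOCKQUOTE> but no <P> elements, which mirrors the behaviour described above, while the lenient splitter recovers the three paragraphs. For a real page, a parser that follows browser rules is a safer choice than regular expressions.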
You can still get the content by letting a real browser parse the HTML document, using Selenium.
Here is an example JavaScript script that runs with node and a Selenium grid on your host to get what you want. See also my other answer to the question "How do you get webdriverjs working?":