Removing script from HTML string using node-html-parser


I have an API which receives a string containing HTML code and stores it in a database. I'm using the node-html-parser package to perform some logic on the HTML.

Among other things, I want to remove any potentially malicious scripts. According to the documentation, the package should be able to do this when instructed via the options object (see the 'Global Methods' heading in the documentation).

My code:

const parser = require('node-html-parser');
const html = `<p>My text</p><script></script>`
const options = {
    blockTextElements: {
        script: false
    }
}
const root = parser.parse(html, options)
return ({ html: root.innerHTML})

I tried modifying the options object with script: true, noscript: false, and noscript: true as well, but none of them removed the script tags from the HTML.

Am I doing something wrong?

There is 1 best solution below

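It seems that script: false in node-html-parser does not remove the tags. As far as I can tell from the README, blockTextElements only controls whether the text content of those elements is kept during parsing, not whether the elements themselves are dropped. We can still use the library to work with the DOM, though. My solution is to use querySelectorAll to find all the <script> tags and remove them, so the final solution might look like: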

const parser = require('node-html-parser');
const html = '<html>asdasd<script></script></html>';
// Convert the plain HTML string to a DOM tree
const dom = parser.parse(html);
// Select all the script tags in the DOM and remove them
dom.querySelectorAll('script').forEach(x => x.remove());

// The DOM now contains everything except the script tags.
// To turn the DOM back into plain HTML, call toString()
console.log(dom.toString());