How to remove all text from a website with JavaScript


I want a JavaScript function that removes all text from a web page. The background: to compare how the rendered DOM looks in different browsers, I need to eliminate obvious differences first. Font rendering is a known difference, so I want to remove all text. The solutions I found all looked like this:

if(start.nodeType === Node.TEXT_NODE) 
{
    start.parentNode.removeChild(start);
}

But this only removes pure text nodes. I also want to find constructs like:

<div>
    <p>
        <em>28.11.2014</em>
        <img></img>
        Testtext
        <span>
            <i>Testtext</i>
            Testtext
        </span>
    </p>
</div>

Here the element that contains text also contains child elements (for example, the <p> above holds an <img> and a <span> next to its text), so the element itself is not recognized as a text node.

So I basically want to turn the above DOM into this:

<div>
    <p>
        <em></em>
        <img></img>
        <span>
            <i></i>
        </span>
    </p>
</div>

There are 3 best solutions below

0
On BEST ANSWER

You can try something like this, which walks the DOM recursively and empties every text node.

HTML:

<div id="startFrom">
    <p>
        <em>28.11.2014</em>
        <img></img>
        Testtext
        <span>
            <i>Testtext</i>
            Testtext
        </span>
    </p>
</div>

JavaScript:

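var startFrom = document.getElementById("startFrom");

// Recursively visit every descendant of `node` and empty each text node.
// The text nodes stay in the tree; only their content is cleared.
function traverseDom(node) {
    node = node.firstChild;
    while (node) {
        if (node.nodeType === Node.TEXT_NODE) { // nodeType 3
            node.data = "";
        }
        traverseDom(node); // descend into this node's children
        node = node.nextSibling;
    }
}

traverseDom(startFrom);
console.log(startFrom);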
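
One caveat if you start from document.documentElement instead of a wrapper element: the contents of <style> and <script> elements are text nodes as well, so emptying them would change how the page looks. Below is a minimal sketch, assuming you want to leave those two element types alone. The names blankVisibleText and SKIPPED are hypothetical and not part of the answer above. The function only relies on childNodes, nodeType, nodeName and data.

```javascript
// Hypothetical helper (an assumption, not from the answer above).
var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
var SKIPPED = { SCRIPT: true, STYLE: true }; // elements whose text is kept

// Empties every text node below `root`, except inside SKIPPED elements.
// Returns the number of text nodes that were emptied.
function blankVisibleText(root) {
    var count = 0;
    var stack = [root];
    while (stack.length > 0) {
        var node = stack.pop();
        if (node.nodeType === TEXT_NODE) {
            if (node.data !== "") {
                node.data = "";
                count++;
            }
            continue;
        }
        // nodeName is upper case for HTML elements; normalise to be safe.
        if (SKIPPED[String(node.nodeName).toUpperCase()]) {
            continue;
        }
        var children = node.childNodes || [];
        for (var i = 0; i < children.length; i++) {
            stack.push(children[i]);
        }
    }
    return count;
}
```

In a browser you would call it as blankVisibleText(document.documentElement). It uses an explicit stack instead of recursion, so very deep documents do not hit the call stack limit.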
1
On

With jQuery:

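// 'selector' is a placeholder for your own root selector.
// addBack() includes the root element itself, so its direct text children are removed too.
$('selector').find("*").addBack().contents().filter(function() {
    return this.nodeType === 3; // 3 === Node.TEXT_NODE
}).remove();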
3
On

The code below is only roughly tested, but you can put it in an external .js file and run it from your document on load.

function cleantxt()
{
    var htmlsrc = document.documentElement.outerHTML;
    var htmlnew = '';
    var istag = false;
    // Copy only the characters that belong to tags; drop everything between '>' and '<'.
    // Note: this also drops the contents of <script> and <style> elements.
    for(var i=0; i<htmlsrc.length; i++) {
        if(htmlsrc.charAt(i)=='<') {
            istag = true;
            htmlnew = htmlnew + htmlsrc.charAt(i);
        }
        else if(htmlsrc.charAt(i)=='>') {
            istag = false;
            htmlnew = htmlnew + htmlsrc.charAt(i);
        }
        else if(istag) {
            htmlnew = htmlnew + htmlsrc.charAt(i);
        }
    }
    document.getElementsByTagName("html")[0].innerHTML = htmlnew + 'Cleaned'; // just a signature to see it works
}
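
The loop above treats every '>' as the end of a tag, so a '>' inside a quoted attribute value or inside an HTML comment would confuse it. Here is a sketch of the same idea as a pure string function that tracks quotes and copies comments through unchanged. The name stripTextOutsideTags is hypothetical. It assumes the input is serialized markup (such as outerHTML), and like the original it still drops the contents of <script> and <style> elements.

```javascript
// Hypothetical helper: returns `html` with everything outside of tags removed.
function stripTextOutsideTags(html) {
    var out = "";
    var inTag = false;
    var quote = ""; // quote character currently open inside a tag, if any
    for (var i = 0; i < html.length; i++) {
        var ch = html.charAt(i);
        if (!inTag) {
            if (ch !== "<") {
                continue; // text between tags: drop it
            }
            if (html.slice(i, i + 4) === "<!--") {
                // Copy the whole comment through unchanged.
                var end = html.indexOf("-->", i + 4);
                end = (end === -1) ? html.length : end + 3;
                out += html.slice(i, end);
                i = end - 1; // the loop's i++ moves past the comment
                continue;
            }
            inTag = true;
            out += ch;
            continue;
        }
        out += ch;
        if (quote) {
            if (ch === quote) { quote = ""; } // closing quote of the attribute value
        } else if (ch === '"' || ch === "'") {
            quote = ch; // opening quote of an attribute value
        } else if (ch === ">") {
            inTag = false; // a '>' outside quotes ends the tag
        }
    }
    return out;
}

console.log(stripTextOutsideTags('<p title="a>b">Hi <i>x</i> there</p>'));
// prints: <p title="a>b"><i></i></p>
```

Because it takes and returns a string, it can be tried out in a console or in Node before the result is assigned back to the page.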