JavaScript: split a body of HTML with a regex


I'm looking to split a body of HTML into an array.

Here is an example of what the code looks like:

<p><h2 class="title">Title 1</h2></p>
<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>
<p><h2 class="title">Title 2</h2></p>
<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
<p><h2 class="title">Title 3</h2></p>
<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>

Basically, I'd like to split the sections up using a positive lookbehind with the following pattern: <p><h2 class="title">*</h2></p>, or any other type of regex pattern.

Essentially, I'm looking to end up with an array that contains something like this:

<p><h2 class="title">Title 1</h2></p>
<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>

<p><h2 class="title">Title 2</h2></p>
<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>

<p><h2 class="title">Title 3</h2></p>
<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
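Since each chunk should start with its title, the split point is the empty position just before <p><h2 class="title">, which calls for a positive lookahead rather than a lookbehind (a lookbehind on </h2></p> would cut after each title and leave it attached to the end of the previous section). A minimal sketch, assuming the HTML arrives as a single string with one element per line; splitSections and inputHTML are illustrative names, not part of the original code:

```javascript
// Illustrative helper (not from the original question): split an HTML string
// into sections that each begin with the constant title markup.
// (?=...) is a positive lookahead: it matches the empty position *before*
// each title, so the title stays at the start of its own chunk.
function splitSections(html) {
  return html
    .split(/(?=<p><h2 class="title">)/)
    .map(function (chunk) { return chunk.trim(); })          // drop the newline left at each cut
    .filter(function (chunk) { return chunk.length > 0; }); // drop chunks that are blank
}

// Sample input from the question, one element per line
var inputHTML = [
  '<p><h2 class="title">Title 1</h2></p>',
  '<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>',
  '<p><h2 class="title">Title 2</h2></p>',
  '<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p><h2 class="title">Title 3</h2></p>',
  '<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>'
].join("\n");

var sections = splitSections(inputHTML);
console.log(sections.length); // 3
```

The empty lookahead match at position 0 does not create a leading empty element; any text before the first title would come back as its own first chunk.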

The part that will always be constant is <p><h2 class="title">*</h2></p>. The content will always be encapsulated within <p> tags.
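Reading the * in that constant pattern as "any title text", the pattern can also be matched directly instead of split on, capturing each title together with the body that follows it. A sketch under the same single-string assumption; extractSections and the { title, body } result shape are illustrative choices, not anything from the question:

```javascript
// Illustrative helper: match each title plus the body that follows it.
//   (.*?)      lazily captures the title text (the "*" in the pattern)
//   ([\s\S]*?) lazily captures everything after it, newlines included,
//              up to the lookahead (?=next title|$); without the m flag,
//              $ only matches at the very end of the input
function extractSections(html) {
  var re = /<p><h2 class="title">(.*?)<\/h2><\/p>([\s\S]*?)(?=<p><h2 class="title">|$)/g;
  var result = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    result.push({ title: match[1], body: match[2].trim() });
  }
  return result;
}

// Sample input from the question, one element per line
var inputHTML = [
  '<p><h2 class="title">Title 1</h2></p>',
  '<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>',
  '<p><h2 class="title">Title 2</h2></p>',
  '<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p><h2 class="title">Title 3</h2></p>',
  '<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>'
].join("\n");

var found = extractSections(inputHTML);
console.log(found.map(function (s) { return s.title; }).join(", ")); // Title 1, Title 2, Title 3
```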

Here is an example of the script I'm parsing the data through:

$(contentArr).each(function(ele, idx){

        var content      = ele, contentTrun;
        var contentRegex = /(<p>.*<\/p>)/im;
        var matchContent = contentRegex.exec(content);

        //parse block to get it ready for styling and effect
        var contentRegex    = /((?!<p><h2 class="title".*?\n)<p>.*<\/p>)/igm;
        var parsedContent   = content.replace(contentRegex, "$1");

        //insert parsed content into html block
        $("pressBlocks").insert("<div class=\"blockContentOutter\">\
                                    <span class=\"accordionText\">... <a class=\"readMore\">Read More</a></span>\
                                        <div class=\"blockContent\">"+parsedContent+"</div>\
                                </div>");
    });
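In the script above, content.replace(contentRegex, "$1") replaces every match with its only capture group, which spans the whole match, so parsedContent ends up identical to content. If the intent of the (?!...) negative lookahead was to pick out the non-title paragraphs, match is the call that returns them. A minimal sketch, assuming one paragraph per line; bodyParagraphs is a hypothetical name:

```javascript
// Illustrative helper: return the non-title paragraphs of a block.
// With the m flag, ^ and $ match at the start and end of every line, so each
// line is tested separately; the negative lookahead (?!...) rejects lines
// that begin with the title markup. Assumes one <p>...</p> per line.
function bodyParagraphs(content) {
  return content.match(/^(?!<p><h2 class="title">)<p>.*<\/p>$/gm) || [];
}

// Sample input from the question, one element per line
var inputHTML = [
  '<p><h2 class="title">Title 1</h2></p>',
  '<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>',
  '<p><h2 class="title">Title 2</h2></p>',
  '<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p><h2 class="title">Title 3</h2></p>',
  '<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>'
].join("\n");

// The original regex with "$1" as the replacement leaves the string unchanged
var originalRegex = /((?!<p><h2 class="title".*?\n)<p>.*<\/p>)/igm;
console.log(inputHTML.replace(originalRegex, "$1") === inputHTML); // true

var paragraphs = bodyParagraphs(inputHTML);
console.log(paragraphs.length); // 4
```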

There are two answers below.


If you really need a splitter, and you know the input format will not change, just split on the constant prefix with something like this:

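var splitter = "<p><h2 class=\"title\">";
var output = inputHTML.split(splitter);
// output[0] holds whatever came before the first title (an empty string here);
// put the splitter back at the start of every following chunk
for (var i = 1; i < output.length; i++) {
    output[i] = splitter + output[i];
}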

But really, there are nicer ways to do it :)

For example, with jQuery:

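var output = [];
// wrap the HTML in a temporary div so its top-level elements become children
var $input = $('<div/>').append(inputHTML);
$input.children().each( function(){
    var $this = $(this);
    // start a new group at each title (or at the very first element)...
    if($this.find('h2.title').length || output.length==0){
        output.push( $('<div/>').append($this) );
    } else {
        // ...otherwise move the element into the current group
        output[output.length - 1].append($this);
    }
});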

This gives you your paragraphs split into divs in the output array, ready to do whatever you need with them.

I've just noticed that @MT0 is absolutely right: it is not valid to wrap an h2 element inside a paragraph. My code will work, but only if your inputHTML is nested correctly, using divs, sections, or other block elements instead of paragraphs:

<div><h2 class="title">Title 1</h2></div>
<div>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</div>
<div>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</div>
<div><h2 class="title">Title 2</h2></div>
<div>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</div>
<div><h2 class="title">Title 3</h2></div>
<div>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</div>
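For comparison, the same grouping rule as the jQuery loop (start a new group at each title, otherwise append to the current group) can be sketched without jQuery or a DOM by working on the raw string line by line. Because nothing is parsed, it does not depend on how a browser re-nests <p><h2>, so it also accepts the question's original markup. This assumes one element per line; groupByTitle is a hypothetical name:

```javascript
// Illustrative, DOM-free version of the same grouping rule: start a new group
// at every title line (or at the very first line), otherwise append the line
// to the most recent group. Assumes one element per line.
function groupByTitle(html) {
  var groups = [];
  html.split("\n").forEach(function (line) {
    if (line.trim() === "") return; // skip blank lines
    if (line.indexOf('<h2 class="title">') !== -1 || groups.length === 0) {
      groups.push([line]);
    } else {
      groups[groups.length - 1].push(line);
    }
  });
  return groups;
}

// Sample input from the question, one element per line
var inputHTML = [
  '<p><h2 class="title">Title 1</h2></p>',
  '<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>',
  '<p><h2 class="title">Title 2</h2></p>',
  '<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>',
  '<p><h2 class="title">Title 3</h2></p>',
  '<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>'
].join("\n");

var groups = groupByTitle(inputHTML);
console.log(groups.map(function (g) { return g.length; }).join(",")); // 3,2,2
```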

As I noted in the comments, this is invalid syntax for HTML:

<p><h2>...</h2></p>

The h2 start tag implicitly closes the open p element, so they will not be nested, and the parser turns each stray </p> into an empty paragraph of its own (so you will end up with empty paragraphs before and after each heading).

You can solve your problem without regular expressions (although you will need to fix the HTML you are inputting):

contentArr = [
    "<h2 class=\"title\">Title 1</h2>\
<p>Section 1: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>\
<p>velit saepe ducimus aspernatur, quam quaerat autem. Consectetur, vitae.</p>\
<h2 class=\"title\">Title 2</h2>\
<p>Section 2: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>\
<h2 class=\"title\">Title 3</h2>\
<p>Section 3: Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>"
];

$(contentArr).each( function( index, element ){
    $( element ).each( function( i, e ){
        if ( !$( e ).is( "h2" ) )
            return;
        $( '<div class="blockContentOuter" />' )
            .append( '<span class="accordionText">... <a class="readMore">Read More</a></span>' )
            .append( $( '<div class="blockContent" />')
                .append( $(e).nextUntil( "h2" ) ) )
            .appendTo( '#pressBlocks' );
    });
});
.blockContentOuter {
    background-color: lightgrey;
    border: 1px solid darkgrey;
    margin-top: 0.5em;
}

.blockContent {
    background-color: white;
    border: 1px solid darkgrey;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="pressBlocks"></div>