PHP Regex to remove last paragraph and contents


I have the following stored in a MySQL table:

<p>First paragraph</p><p>Second paragraph</p><p>Third paragraph</p><div class="item"><p>Some paragraph here</p><p><strong><u>Specs</u>:</strong><br /><br /><strong>Weight:</strong> 10kg<br /><br /><strong>LxWxH:</strong> 5mx1mx40cm</p><p>This is the paragraph I am trying to remove with regex.</p></div>

I'm trying to remove the last paragraph tags and content on every row in the table. I can loop through the table with PHP easily enough, but the regex has me stumped.

Every preg_match example I've found on Stack Overflow either gives me a "preg_match(): Unknown modifier" error, or the var_dump shows an empty array. Even if it did work, I believe preg_match would only match the content, so I think I need preg_replace?

The rows aren't identical in length, but it is always going to be the last paragraph that I want to completely remove.

Would appreciate if someone could show me how. Thanks


There are 2 best solutions below

Best answer

This would remove the last <p>anything</p>.

<?php
$html = '<p>First paragraph</p><p>Second paragraph</p><p>Third paragraph</p><div class="item"><p>Some paragraph here</p><p><strong><u>Specs</u>:</strong><br /><br /><strong>Weight:</strong> 10kg<br /><br /><strong>LxWxH:</strong> 5mx1mx40cm</p><p>This is the paragraph I am trying to remove with regex.</p></div>';
$html = preg_replace('~(.*)<p>.*?</p>~', '$1', $html);
echo $html;

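The (.*) is greedy: it grabs everything up to the last <p> tag and stores it in the first capture group. The .*? then matches whatever sits between the paragraph tags; the ? makes the quantifier lazy, so it stops at the first closing </p> it reaches. That part isn't captured because we don't care what is inside. The $1 in the replacement is the captured content before the last <p>, so the whole match is replaced by everything except that final paragraph. The ~ characters are delimiters marking where the regex begins and ends. I suspect delimiters are what is causing your regexes to fail currently: with / as the delimiter, the / in </p> ends the pattern early and PHP reports the character after it as an unknown modifier. http://php.net/manual/en/regexp.reference.delimiters.php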

Output:

<p>First paragraph</p><p>Second paragraph</p><p>Third paragraph</p><div class="item"><p>Some paragraph here</p><p><strong><u>Specs</u>:</strong><br /><br /><strong>Weight:</strong> 10kg<br /><br /><strong>LxWxH:</strong> 5mx1mx40cm</p></div>
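
The delimiter and quantifier points in the explanation above can be sketched as a small reusable function. This is only a sketch: the name removeLastParagraph is hypothetical, and the s modifier and the [^>]* part are additions that are not in the answer's pattern. They assume that some rows contain line breaks or <p> tags with attributes.

```php
<?php
// Hypothetical helper, not part of the answer above.
function removeLastParagraph(string $html): string
{
    // ~ delimiters: the / in </p> cannot end the pattern early. With / as the
    //   delimiter, PHP would report the "p" after it as an unknown modifier.
    // (.*)       greedy, captures everything before the last <p ...>.
    // <p\b[^>]*> a <p> tag, with or without attributes (assumption), but not <pre>.
    // .*?</p>    lazy, stops at the first closing </p> after that tag.
    // s modifier: lets . match newlines too (assumption: rows may span lines).
    $result = preg_replace('~(.*)<p\b[^>]*>.*?</p>~s', '$1', $html, 1);

    // preg_replace() returns null on error (for example a backtrack limit),
    // in which case the input is returned unchanged.
    return $result ?? $html;
}

echo removeLastParagraph("<p>One</p>\n<div><p class=\"x\">Two</p></div>");
```

A string with no paragraph at all is returned as-is, because preg_replace() leaves the subject untouched when nothing matches.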

Note: There are XML/HTML parsers, and you should consider using one, because regexes on HTML/XML can get very messy quickly.

http://php.net/manual/en/refs.xml.php
How do you parse and process HTML/XML in PHP?

Demo: http://sandbox.onlinephpfunctions.com/code/0ddf46c328323e8b6357313a5464733ff797bc3f
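
As a rough illustration of the parser route mentioned in the note above, the following sketch uses PHP's built-in DOMDocument. The function name removeLastParagraphDom is hypothetical. It assumes the stored HTML is plain ASCII or already entity-encoded (other encodings need extra handling), and the parser may normalise markup on the way out, for example writing <br /> as <br>.

```php
<?php
// Hypothetical helper; a sketch of the parser route, not code from the answer above.
function removeLastParagraphDom(string $html): string
{
    if (trim($html) === '') {
        return $html; // nothing to parse
    }

    // Collect parser warnings about fragments internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html); // the parser wraps the fragment in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = $doc->getElementsByTagName('p');
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($paragraphs->length === 0 || $body === null) {
        return $html; // no paragraph to remove
    }

    // The node list is in document order, so the last item is the last <p>.
    $last = $paragraphs->item($paragraphs->length - 1);
    $last->parentNode->removeChild($last);

    // Serialise only the children of <body> to get a fragment back.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeLastParagraphDom('<p>Keep me</p><div class="item"><p>Keep me too</p><p>Remove me</p></div>');
```

Because the markup can come back slightly reformatted, it is worth comparing a few rows before and after by eye before writing the result back to the table.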


A solution without regex would be:

$string = '<p>First paragraph</p><p>Second paragraph</p><p>Third paragraph</p><div class="item"><p>Some paragraph here</p><p><strong><u>Specs</u>:</strong><br /><br /><strong>Weight:</strong> 10kg<br /><br /><strong>LxWxH:</strong> 5mx1mx40cm</p><p>This is the paragraph I am trying to remove with regex.</p></div>';

// Positions of the last closing and the last opening paragraph tag.
$lastOccurrenceOfEnd = strrpos($string, "</p>");
$lastOccurrenceOfStart = strrpos($string, "<p>");

// Cut from the last <p> through the end of the last </p>; the +4 is the length of "</p>".
$removedParagraph = substr_replace($string, '', $lastOccurrenceOfStart, $lastOccurrenceOfEnd - $lastOccurrenceOfStart + 4);

echo $removedParagraph;
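
Since the rows vary, some may contain no paragraph at all, in which case strrpos() returns false. A guarded variant of the same position-based idea might look like this. It is a sketch: the function name removeLastParagraphByPosition is hypothetical, and like the code above it only recognises a bare <p> with no attributes.

```php
<?php
// Hypothetical helper wrapping the strrpos() approach with guards.
function removeLastParagraphByPosition(string $html): string
{
    $start = strrpos($html, '<p>');
    $end = strrpos($html, '</p>');

    // Leave the row alone if either tag is missing or they are out of order.
    if ($start === false || $end === false || $end < $start) {
        return $html;
    }

    // Remove from the last <p> through the end of the last </p>.
    return substr_replace($html, '', $start, $end - $start + strlen('</p>'));
}

echo removeLastParagraphByPosition('<p>a</p><div><p>b</p></div>');
```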