PHP strip_tags: how do I remove tags that hold a certain attribute?


I would like to strip the tags out of my WordPress feed, but keep the paragraphs.

strip_tags($content, '<p>' );

This works fine, but I do not want to keep the paragraphs that hold image captions. They look like this:

<p class="wp-caption-text">blah blah blah</p>

So, how do I strip tags that hold, let's say, class attributes?

All help much appreciated.


There are 2 best solutions below

4
On

Edit: This turned out not to be what the OP wanted, but it does answer the question as it was asked.


Unfortunately, you can't do this with strip_tags directly: its second argument only lists allowed tag names and cannot filter on attributes.
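
To illustrate the limitation, here is a quick check on a made-up sample string (the input is hypothetical, not taken from the OP's feed). An allowed tag survives together with all of its attributes, so the caption paragraph is kept:

```php
<?php
// Hypothetical sample input: one normal paragraph and one caption paragraph.
$content = '<div><p>Intro <b>text</b></p><p class="wp-caption-text">A caption</p></div>';

// Only <p> is allowed, so <div> and <b> are dropped...
$stripped = strip_tags($content, '<p>');

// ...but the caption paragraph is kept, class attribute and all.
echo $stripped, PHP_EOL;
// prints: <p>Intro text</p><p class="wp-caption-text">A caption</p>
```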

You could use DOMDocument, though, then strip_tags after:

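$DOM = new DOMDocument();
$DOM->loadHTML($content);
foreach ($DOM->getElementsByTagName("p") as $p)
{
    // $p->attributes is a live list, so removing inside a foreach over it
    // can skip entries; keep removing the first attribute until none are left.
    while ($p->hasAttributes())
        $p->removeAttributeNode($p->attributes->item(0));
}
$content = $DOM->saveHTML();

// Uncommenting this then strips every tag except <p>, including the
// <html>/<body> wrapper that saveHTML() adds.
//$content = strip_tags($content, '<p>');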
1
On

The simplest way to do this would be to use a DOM parsing library. DOMDocument is built into PHP and works great for DOM manipulation. DOMXPath is good for querying.

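$dom = new DOMDocument;
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);
// XPath has no ".class" syntax; match <p> elements whose class list contains "wp-caption-text".
$query = "//p[contains(concat(' ', normalize-space(@class), ' '), ' wp-caption-text ')]";
foreach ($xpath->query($query) as $node) {
    $node->parentNode->removeChild($node);
}
$content = strip_tags($dom->saveHTML(), '<p>');

Note that the DOM step only removes the caption paragraphs. strip_tags is still needed afterwards to drop the remaining tags, and the surviving <p> tags keep their attributes.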
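
A minimal end-to-end sketch of the DOM approach, wrapped in a helper and run on a made-up sample. The function name, its signature, and the sample input are illustrative assumptions, not part of the original answers. It also assumes the class name contains no quote characters, since it is interpolated into the XPath expression.

```php
<?php
// Hypothetical helper (name and signature are illustrative):
// removes <p> elements carrying $class, then strips every tag except <p>.
function strip_tags_keep_plain_paragraphs(string $html, string $class = 'wp-caption-text'): string
{
    // Collect libxml parse warnings instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match the class as a whole word inside a space-separated class list.
    // Assumes $class contains no quote characters.
    $query = "//p[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // saveHTML() adds a doctype and <html>/<body> wrappers; strip_tags removes them too.
    return trim(strip_tags($dom->saveHTML(), '<p>'));
}

// Hypothetical sample input.
$content = '<div><p>Intro <b>text</b></p><p class="wp-caption-text">A caption</p></div>';
$result = strip_tags_keep_plain_paragraphs($content);
echo $result, PHP_EOL; // the caption paragraph is gone, the plain paragraph remains
```

Any <p> that survives keeps its own attributes. If those should go as well, the attribute-removal loop from the other answer can be run on the same DOMDocument before calling saveHTML().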