Adding a class to all English text in HTML?

The requirement is to wrap all English words on a page in a span with an englishText class. The problem is similar to this, but the JavaScript solutions won't work for me; I need a PHP example. For example, if you have this:

<p>Hello, 你好</p>
<div>It is me, 你好</div>
<strong>你好, how are you</strong>

Afterwards I need to end with:

<p><span class="englishText">Hello</span>, 你好</p>
<div><span class="englishText">It is me</span>, 你好</div>
<strong>你好, <span class="englishText">how are you</span></strong>

There are more complicated cases, such as:

<strong>你好, TEXT?</strong>
<div>It is me, 你好</div>

This should become:

<strong>你好, <span class="englishText">TEXT?</span></strong>
<div><span class="englishText">It is me</span>, 你好</div>

But I think I can sort out these edge cases once I know how to actually iterate over the document correctly.

I can't use JavaScript to solve this because:

  1. This needs to work on browsers that don't support JavaScript.
  2. I would prefer to have the classes in place on page load so there isn't any delay in rendering the text in the correct font.

I figured the best way to iterate over the document would be using PHP Simple HTML DOM Parser.

But suppose I try this:

foreach ($html->find('div') as $element)
{
    // make changes here
}

My concern is that a nested case like the following will cause chaos:

<div>
       Hello , 你好
       <div>Hello, 你好</div>
</div>

As you can see, find('div') returns both the outer div and the nested one, so if I process the outer node's contents I will also be processing the inner div, which then gets processed a second time on its own.

Any ideas how to get around this and only select the nodes for processing once?

UPDATE

I realise now that what I effectively need is a recursive way to iterate over HTML elements with the ability to change them as I iterate over them.
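
A minimal sketch of such a recursive walk, assuming PHP's built-in DOM extension (DOMDocument) instead of Simple HTML DOM. The function names, the ENGLISH_RUN pattern (runs of ASCII words, optionally ending in ? ! or .) and the meta prefix used to make libxml read the input as UTF-8 are all my own assumptions, not something taken from either library's documentation for this task:

```php
<?php
// Assumption: "English" means ASCII words separated by spaces, optionally ending in ? ! or .
const ENGLISH_RUN = "/([A-Za-z][A-Za-z0-9']*(?: +[A-Za-z][A-Za-z0-9']*)*[?!.]?)/";

// Recursively visit the children of $node and wrap English runs found in text nodes.
function wrapEnglishNodes(DOMNode $node): void
{
    $doc = $node->ownerDocument;
    // Snapshot the children first: the live childNodes list changes while we edit.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            // With DELIM_CAPTURE, odd-indexed pieces are the English matches.
            $pieces = preg_split(ENGLISH_RUN, $child->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
            if (count($pieces) < 2) {
                continue; // no English in this text node
            }
            foreach ($pieces as $i => $piece) {
                if ($piece === '') {
                    continue;
                }
                $new = $doc->createTextNode($piece);
                if ($i % 2 === 1) {
                    $span = $doc->createElement('span');
                    $span->setAttribute('class', 'englishText');
                    $span->appendChild($new);
                    $new = $span;
                }
                $node->insertBefore($new, $child);
            }
            $node->removeChild($child); // the original text node has been replaced
        } elseif ($child instanceof DOMElement
            && !in_array($child->nodeName, ['script', 'style'], true)) {
            wrapEnglishNodes($child); // descend into nested elements
        }
    }
}

// Hypothetical entry point: takes an HTML fragment, returns the rewritten fragment.
function wrapEnglishText(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    // The meta prefix is only there so that the parser treats the input as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    wrapEnglishNodes($body);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because only text nodes are rewritten, the nested div case is not a problem: the outer div's own text and the inner div's text are separate text nodes, and each is reached exactly once. The serializer may add or change some whitespace between tags, so the output is not guaranteed to be byte-identical to the input outside the new spans.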

There is 1 solution below

You should walk through the siblings of each element; that way you won't get into trouble with cases like that.

Something like this:

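<?php

foreach ($html->find('div') as $element)
{
   // next_sibling() returns a single node (or null), so follow the chain
   for ($sibling = $element->next_sibling(); $sibling; $sibling = $sibling->next_sibling()) {
      // plaintext is a property in Simple HTML DOM, not a method
      echo $sibling->plaintext."\n";
   }
}

?>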

Or, in my opinion, a much easier way:

Just...

  1. Change every <*> to "\n"."<*>" with preg_replace();
  2. Make an array of lines like $lines = explode("\n",$html_string);
  3. Loop over the lines and strip the tags from each one:

   foreach($lines as $line){
      $text = strip_tags($line);
      echo $text;
   }