Regular expression for wrapping all tr elements that contain th tags in a thead


I have a problem with a regex: I need to wrap all the tr elements that contain th cells in a single thead. I have a variable $html that contains an HTML table like this:

$html ="
<table>
<tr>
  <th>header1</th> 
  <th>header2</th>
  <th>header3</th>
</tr>
<tr>
  <th>header21</th> 
  <th>header22</th>
  <th>header23</th>
</tr>

<tr>
  <td>body1</td> 
  <td>body2</td>
  <td>body3</td>
</tr>
<tr>
  <td>body21</td> 
  <td>body22</td>
  <td>body23</td>
</tr>
</table>";

The regex I wrote is this:

$html = preg_replace_callback(
    '#(<tr.*?<th>.*?<th>.*?<\/tr>)#s',
    function ($match) {
        return '<thead>' . $match[0] . '</thead>';
    },
    $html
);

But the result I get is different from what I want: each header tr ends up wrapped in its own thead, instead of all of them sharing a single thead.


There are 2 best solutions below

Best answer

It's not a good idea to try to parse HTML with regular expressions.
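As a rough illustration of the parser-based alternative, here is a minimal sketch using PHP's DOM extension. The function name wrapHeaderRows is hypothetical (it does not come from the question), and the sketch assumes the input is an HTML fragment without nested tables.

```php
<?php
// Hypothetical helper, not from the original post: moves every <tr> that
// contains a <th> into one <thead> at the top of its table.
// Assumes an HTML fragment and no nested tables.
function wrapHeaderRows(string $html): string
{
    libxml_use_internal_errors(true);   // keep parser warnings quiet on imperfect markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);              // libxml wraps the fragment in <html><body>

    foreach ($doc->getElementsByTagName('table') as $table) {
        // Collect first, modify afterwards: the node list is live.
        $headerRows = [];
        foreach ($table->getElementsByTagName('tr') as $row) {
            if ($row->getElementsByTagName('th')->length > 0) {
                $headerRows[] = $row;
            }
        }
        if ($headerRows === []) {
            continue;                   // nothing to wrap in this table
        }
        $thead = $doc->createElement('thead');
        $table->insertBefore($thead, $table->firstChild);
        foreach ($headerRows as $row) {
            $thead->appendChild($row);  // appendChild moves the existing node
        }
    }

    // Return only the fragment, without the <html><body> wrapper.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling wrapHeaderRows($html) on the table from the question should give one thead holding both header rows. The exact whitespace of the output depends on how libxml serializes the tree.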

That said, you need to get rid of one question mark. The form .*? matches any number of characters but as few as possible (lazy). For the span between the first and the last <th> you want the match to be as long as possible (greedy), so use a plain .* there. This will do the trick:

              #this is supposed to be as greedy as possible
              #
~(<tr.*?<th>.*<th>.*?</tr>)~s

See https://regex101.com/r/fR1xB5/1
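To see the greedy pattern in action, here is a small sketch that applies it with preg_replace. The $html value is a shortened stand-in for the table in the question, and using preg_replace with a $1 backreference instead of the original callback is just one possible way to write it.

```php
<?php
// Shortened stand-in for the $html table from the question.
$html = "<table>
<tr><th>header1</th><th>header2</th></tr>
<tr><th>header21</th><th>header22</th></tr>
<tr><td>body1</td><td>body2</td></tr>
</table>";

// The greedy .* between the first and the last <th> makes a single match
// span every header row, so only one <thead> is produced.
$result = preg_replace('~(<tr.*?<th>.*<th>.*?</tr>)~s', '<thead>$1</thead>', $html);

echo $result;
```

Note that the greedy .* runs up to the last <th> in the whole string, so on a page with more than one table the match would stretch across tables.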

Second answer

If you have two tables on the page, it is better to try the pattern below.

   (<tr>\s*(<th>((?!<tr>).)*</th>)+\s*</tr>)

Example: https://regex101.com/r/fR1xB5/2
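For a page with several tables, one possible variation (an assumption of this sketch, not the exact pattern above) is to match a single header row that never crosses a </tr>, and then wrap each run of consecutive header rows. The function name wrapHeaderRuns is hypothetical.

```php
<?php
// Hypothetical helper: wraps each run of consecutive header rows in one <thead>.
function wrapHeaderRuns(string $html): string
{
    // One <tr>...</tr> that contains a <th> and never crosses a </tr>.
    $row = '<tr\b[^>]*>(?:(?!</tr>).)*?<th\b(?:(?!</tr>).)*</tr>';

    // A run is one header row followed by any number of further header rows,
    // separated only by whitespace.
    $pattern = "~{$row}(?:\\s*{$row})*~s";

    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '<thead>$0</thead>', $html) ?? $html;
}

$twoTables = '<table><tr><th>a</th></tr><tr><th>b</th></tr><tr><td>c</td></tr></table>'
           . '<table><tr><th>d</th></tr><tr><td>e</td></tr></table>';

echo wrapHeaderRuns($twoTables);
```

Because a run cannot extend past a row without a <th>, each table gets its own thead. The usual caveats about regexes and HTML still apply, for example with nested tables.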