I am fetching HTML from a Smarty template and need to clean it up (I simply want to remove extra whitespace and indent the HTML nicely). I'm using tidy to do something like:
$html = $smarty->fetch('foo.tmpl');
$tidy = new tidy;
$tidy->parseString($html, array(
    'hide-comments' => TRUE,
    'output-xhtml' => TRUE,
    'indent' => TRUE,
    'wrap' => 0
));
$tidy->cleanRepair();
return $tidy;
While this works fine for English, it breaks for other languages. For example, the Arabic characters in $html are fine before the call, but after tidy I get back garbled output like this:
هل أنت متأكد أنك تريد
Is there a setting in tidy that will format the HTML but leave the content itself alone? I looked at this post: PHP "pretty print" HTML (not Tidy), but it seems like that won't work since I'm grabbing my HTML from Smarty.
Any suggestions appreciated.
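Try setting the encoding in parseString. Note that the config array is the second argument and the encoding is the third, so for UTF-8 input you would pass 'utf8' after your config array: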
http://www.php.net/manual/en/tidy.parsestring.php
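A minimal sketch of that suggestion, assuming the tidy extension is loaded and the template output is UTF-8. The literal $html string is a hypothetical stand-in for $smarty->fetch('foo.tmpl'), and $clean is just an illustrative variable name. Tidy spells the encoding 'utf8', not 'UTF-8'.

```php
<?php
// Hypothetical stand-in for $smarty->fetch('foo.tmpl'); assumed to be UTF-8.
$html = '<html><head><title>t</title></head><body><p>هل أنت متأكد أنك تريد</p></body></html>';

// Same options as in the question.
$config = array(
    'hide-comments' => true,
    'output-xhtml'  => true,
    'indent'        => true,
    'wrap'          => 0,
);

$tidy = new tidy;
// The third argument tells tidy how to decode the input and encode the output.
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

// Fetch the cleaned markup as a string.
$clean = tidy_get_output($tidy);
echo $clean;
```

If the template is in some other encoding, the third argument would need to change to match; 'utf8' here is only an assumption based on the Arabic text in the question.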