I have a PHP website on an Apache web server. The site has been working for years.
A few weeks ago I had to reinstall that machine, so I checked all the backups, reinstalled the OS (Gentoo) and, on the new machine, with the same versions of Apache and PHP, restored the website.
I did a quick check that the page loaded and not much more; everything seemed fine.
Today I had to start working on that site again, and when I checked the output code in a browser I found this:
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii" />
The thing is that Apache is set to send everything as utf-8, PHP has utf-8 as its default charset, the files are all saved as utf-8, and the PHP code generates a Content-Type with utf-8. Where is the us-ascii value coming from?
In the PHP that generates the code, everything is right:
ob_start();
// debug only: dump the generated document and stop here
// (remove this line to send the page normally)
var_dump( $tmp ); exit( __FILE__.' '.__LINE__ );
// send the clean HTML document or the raw XML if something went wrong
if ( $tmp['final_document'] !== false ) {
    echo( $tmp['final_document'] );
} else {
    echo( $tmp['xml_content'] );
}
ob_end_flush();
The var_dump above outputs the XHTML source with the right Content-Type:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
But if I send the page normally, the encoding is us-ascii.
The only change I have made, as far as I remember, is setting up mod_security, nothing else.
What should I check? Which files? What configuration?
Note:
I'm including only the Apache tag because I think the problem is related to it; the var_dump shows that PHP is generating the code correctly. If the problem turns out to be something different, I'll adjust the tags accordingly.
I updated my tags to reflect the situation and resolution.
Update - solution:
As suggested, I removed the solution from this part and added it as an answer.
The problem was caused by a single line in the website's configuration section. I had this:
But I haven't installed tidy this time, because some time ago I decided to stop using it and do my own beautification. Before the backup/restore process I had already stopped using tidy, but it was still installed on my system and I obviously forgot to delete that configuration line, so the site was still using it, albeit with all the default values, since my personal tidy configuration, which lived in a separate file, had been removed.
This error tells me that when you use tidy, at least with the defaults, the final output goes from the PHP parser to tidy, from there to Apache, and from there to the user. I used to use tidy in a very specific section of my internal process, so I didn't know (or remember) how and when PHP sent information to it by default. It makes sense that the output is sent at the end, though, since I'm using output buffering.
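For anyone chasing a similar problem, a quick way to see whether something sits between the PHP code and Apache is to dump the settings involved. This is only a sketch, not the code from my site: describe_output_pipeline is a name made up for illustration, and it only reads values, it changes nothing.

```php
<?php
// Hypothetical helper (not part of the original site): collects the
// settings that decide whether tidy post-processes the buffered output.
function describe_output_pipeline(): array
{
    return [
        // true when the tidy extension is available in this PHP build
        'tidy_loaded'         => extension_loaded('tidy'),
        // ini switch that makes tidy repair all output;
        // ini_get() returns false when the directive is unknown
        'tidy_clean_output'   => ini_get('tidy.clean_output'),
        // optional file holding tidy's own options
        'tidy_default_config' => ini_get('tidy.default_config'),
        // charset PHP itself announces by default
        'default_charset'     => ini_get('default_charset'),
        // callbacks attached to the currently active output buffers
        'output_handlers'     => ob_list_handlers(),
    ];
}

var_dump(describe_output_pipeline());
```

If tidy shows up as loaded and enabled here while the page is no longer supposed to use it, that is a good hint that the markup is being rewritten after the PHP code has finished.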
The HTML Tidy Configuration Options show that the default character encoding is ascii, which is where the us-ascii value came from.
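I removed tidy from the process, but if someone wants to keep it, the encoding can be passed explicitly instead of relying on the default. A minimal sketch, assuming the tidy extension is available: tidy_as_utf8 is a hypothetical wrapper name and the output-xhtml option is just an example setting, not my old configuration.

```php
<?php
// Hypothetical wrapper (illustration only): runs tidy with an explicit
// UTF-8 encoding so the ascii default is never applied.
function tidy_as_utf8(string $html): string
{
    // Without the tidy extension, return the markup untouched.
    if (!extension_loaded('tidy')) {
        return $html;
    }

    // The third argument sets the input/output encoding for tidy.
    $repaired = tidy_repair_string($html, ['output-xhtml' => true], 'utf8');

    // tidy_repair_string() returns false on failure; fall back to the input.
    return $repaired === false ? $html : $repaired;
}
```

It would be called on the finished document just before it is echoed, for example echo tidy_as_utf8( $tmp['final_document'] );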