I have an HTML file with a few non-ASCII characters, say encoded in UTF-8 or UTF-16. To save the file in ASCII, I would like to replace them with their (SGML/HTML/XML) entity codes. So for example, every ë should become `&euml;` and every ◊ should become `&loz;`. How do I do that?
I use Emacs as an editor. I'm sure it has a function to do the replace, but I cannot find it. What am I missing? Or how do I implement it myself?
Emacs regexps have a character class, `[[:ascii:]]`, which includes exactly the ASCII character set. You can use a regexp that matches its complement (`[[:nonascii:]]` or `[^[:ascii:]]`) to find occurrences of non-ASCII characters, and then replace them with their codes using elisp:
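A minimal sketch of such a replacement, assuming the `[[:nonascii:]]` class and the `\,` (evaluate Lisp) feature of `replace-regexp`. The keystrokes are shown as comments; the last line is the same Lisp expression evaluated on a sample string instead of `\&`, so it can be tried with `M-:`.

```elisp
;; Typed interactively (RET = Enter):
;;   M-x replace-regexp RET
;;   [[:nonascii:]] RET
;;   \,(concat "&#" (number-to-string (string-to-char \&)) ";") RET
;;
;; Inside \,(...) the token \& stands for the matched text as a string.
;; The same expression, with the sample match "á" substituted for \&:
(concat "&#" (number-to-string (string-to-char "á")) ";")
;; → "&#225;"
```

Note that this produces numeric character references (`&#225;`) rather than named entities (`&aacute;`); both are valid in HTML.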
So when, for example, á is matched: `\&` is `"á"`, `string-to-char` converts it to `?á` (= the number 225), and `number-to-string` converts that to `"225"`. Then, `concat` concatenates `"&#"`, `"225"` and `";"` to get `"&#225;"`, which replaces the original match.

Surround these commands with `C-x (` and `C-x )`, and apply `C-x C-k n` and `M-x insert-kbd-macro` as usual to make a function out of them.

To see the elisp equivalent of calling this function interactively, run the command and then press `C-x M-:` (Repeat complex command).

A simpler version, which doesn't take into account the active region, could be:
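A minimal sketch of such a function, assuming a `re-search-forward` / `replace-match` loop over the whole buffer. The name `my-nonascii-to-entities` is hypothetical, chosen only for illustration.

```elisp
(defun my-nonascii-to-entities ()
  "Replace each non-ASCII character in the buffer with a numeric reference.
For example, á (code point 225) becomes the text &#225;.
The function name is illustrative, not a built-in Emacs command."
  (interactive)
  (save-excursion
    ;; Start from the top so the whole buffer is processed.
    (goto-char (point-min))
    ;; Visit every non-ASCII character in turn.
    (while (re-search-forward "[[:nonascii:]]" nil t)
      ;; Build "&#NNN;" from the matched character's code point.
      ;; FIXEDCASE and LITERAL are both t: insert the text exactly as given.
      (replace-match (format "&#%d;" (string-to-char (match-string 0)))
                     t t))))
```

After evaluating the definition, it can be run with `M-x my-nonascii-to-entities`.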
(This uses the recommended way to do search + replace programmatically.)