I'm trying to figure out the best way of encoding text (either 8-bit ubyte[] or string) to its HTML counterpart.
My proposal so far is to use a lookup table that maps the 8-bit characters that have special meaning in HTML to their entities
string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;";
lutLatin1ToHTML[0x26] = "&amp;";
...
using the function
pure string toHTML(in string src,
ref in string[256] lut) {
return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b) ;
}
It almost works, except that I don't know how to create a string from a `ubyte` (the no-translation case).
I tried
writeln(new string('a'));
but it prints garbage and I don't know why.
For more details on HTML encoding see https://en.wikipedia.org/wiki/Character_entity_reference
You can make a string from a ubyte most easily by doing "" ~ b.
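A minimal sketch of that idea, under a few assumptions that are not in the posts above: the byte is cast to char explicitly so the conversion is visible, the helper makeLut is a hypothetical name, the &lt; and &gt; entries are added for illustration, and a plain loop stands in for the map/reduce chain. It also shows why new string('a') prints garbage: that expression allocates a string of length 97 (the code of 'a') whose elements are all char.init, which is 0xFF.

```d
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper: fills in only the characters that need escaping.
// Entries that are never set stay null, which marks "no translation".
string[256] makeLut()
{
    string[256] lut;
    lut[0x22] = "&quot;";
    lut[0x26] = "&amp;";
    lut[0x3C] = "&lt;";
    lut[0x3E] = "&gt;";
    return lut;
}

// Encodes Latin-1 bytes as an HTML-escaped UTF-8 string.
string toHTML(const(ubyte)[] src, ref const string[256] lut)
{
    string result;
    foreach (b; src)
    {
        string entity = lut[b];
        if (entity.length)
            result ~= entity;        // translated character
        else
            result ~= cast(dchar) b; // Latin-1 byte == code point; ~= encodes it as UTF-8
    }
    return result;
}

void main()
{
    ubyte b = 0x61; // 'a'

    // Concatenating one char onto an empty string yields a new one-char string.
    string s = "" ~ cast(char) b;
    assert(s == "a");

    // new string(n) treats its argument as a length, not as content.
    auto garbage = new string('a');
    assert(garbage.length == 97);

    auto lut = makeLut();
    auto input = "a < b & \"c\"".representation; // immutable(ubyte)[]
    assert(toHTML(input, lut) == "a &lt; b &amp; &quot;c&quot;");
    writeln(toHTML(input, lut));
}
```

Going through dchar in the no-translation branch matters only for bytes at 0x80 and above; for plain ASCII input, appending cast(char) b would give the same result.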
BTW, if you want to do a lot of html stuff, my dom.d and characterencodings.d might be useful: https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff
It has an HTML parser, DOM manipulation functions similar to JavaScript (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings, including Latin-1, and it outputs ASCII-safe HTML with all special and Unicode characters properly encoded, stuff like that.