I am currently trying to create a Perl script that uses LibXML to process data in an SVG font.
In an SVG font, each character is defined as a glyph element with a unicode attribute that defines its Unicode address in the form of a Unicode entity, like so:
<glyph unicode=" " />
Part of what I want to do is take the value of each glyph element's unicode attribute and process it like a string. However, when I use Element->getAttribute('unicode'); against a glyph node, it returns a "wide character" that displays as a placeholder rectangle, leading me to believe that it expands the Unicode entity into a Unicode character and returns that.
When I create my parser, I set expand_entities to 0, so I am not sure what else I could do to prevent this. I am rather new to XML processing, so I'm not sure I actually understand what's going on or if this is even supposed to be preventable.
Here is a code sample:
use utf8;
use open ':std', ':encoding(UTF-8)';
use strict;
use warnings;
use XML::LibXML;
$XML::LibXML::skipXMLDeclaration = 1;
my $xmlFile = $ARGV[0];
my $parser = XML::LibXML->new();
$parser->load_ext_dtd(0);
$parser->validation(0);
$parser->no_network(1);
$parser->recover(1);
$parser->expand_entities(0);
my $xmlDom = $parser->load_xml(location => $xmlFile);
my $xmlDomSvg = XML::LibXML::XPathContext->new();
$xmlDomSvg->registerNs('svg', 'http://www.w3.org/2000/svg');
foreach my $myGlyph ($xmlDomSvg->findnodes('/svg:svg/svg:defs/svg:font/svg:glyph', $xmlDom))
{
    my $myGlyphCode = $myGlyph->getAttribute('unicode');
    print $myGlyphCode . "\n";
}
Note: If I run print $myGlyph->toString();, the Unicode entity in the output is not expanded, which is why I concluded that the expansion is happening in the getAttribute method.
This might not be the answer you are looking for, but IMHO getAttribute gives you enough information, i.e. a Perl string, to solve your issue in another way. You are trying to write that Perl string to a non-UTF8 file, that's why you get the "wide character" warning.

A stripped-down example of how to get the U+xxxx value you are looking for:
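As a minimal sketch of that idea (not the original example; the helper name codepoint_labels and the sample code point U+E001 are assumptions made for illustration), each character of the string returned by getAttribute can be passed through ord and formatted with sprintf:

```perl
use strict;
use warnings;

# Hypothetical helper: turn an already-decoded Perl string into a list of
# "U+XXXX" labels, one label per character.
sub codepoint_labels {
    my ($string) = @_;
    return map { sprintf 'U+%04X', ord($_) } split //, $string;
}

# "\x{E001}" stands in for the value getAttribute('unicode') would return
# for a glyph declared with unicode="&#xE001;" (an arbitrary sample).
my $value = "\x{E001}";

# The labels are plain ASCII, so printing them cannot trigger a
# "Wide character" warning, whatever encoding the output handle uses.
print join(' ', codepoint_labels($value)), "\n";    # prints "U+E001"
```

Because the result is an ordinary ASCII string, it can be compared, matched with regular expressions, or written to any file without further encoding concerns.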
UPDATE: The documentation for expand_entities is IMHO misleading. It talks about "entities", but it obviously means ENTITY definitions, i.e. new entities introduced in the document. The libxml2 documentation is unfortunately not much clearer. But this old message seems to indicate that the behavior you describe is expected, i.e. an XML parser should always replace pre-defined entities.
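Numeric character references such as &#xE001; are resolved by the parser in the same way, whatever expand_entities is set to. A small sketch, assuming XML::LibXML is installed and using a made-up one-element document with an arbitrary code point, shows getAttribute returning a single character even with expansion switched off:

```perl
use strict;
use warnings;
use XML::LibXML;

# Parser configured like the one in the question, with entity expansion off.
my $parser = XML::LibXML->new(expand_entities => 0, no_network => 1);

# Illustrative input (an assumption, not the asker's font file): one glyph
# whose unicode attribute is written as a numeric character reference.
my $doc = $parser->load_xml(
    string => '<glyph xmlns="http://www.w3.org/2000/svg" unicode="&#xE001;"/>',
);

my $value = $doc->documentElement->getAttribute('unicode');

# The reference has already been resolved to one character by the parser.
printf "length=%d code point=U+%04X\n", length($value), ord($value);
# prints "length=1 code point=U+E001"
```

So the value seen by the script is the character itself, and any U+xxxx or &#x...; form has to be rebuilt from its code point.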