Reading embedded HTML in an XML document with QXmlStreamReader

I would like to read XML files with QXmlStreamReader that contain rich text formatted with HTML tags. For instance, in the file below, it would be nice to use <em> and other HTML tags to format text. (Qt can display HTML in many places, such as a QLabel.)

<?xml version="1.0" encoding="UTF-8"?>
<course name="Introductory Course">
    <course-description>Welcome to the <em>basic course</em>.</course-description>
</course>

If I call QXmlStreamReader::readElementText(QXmlStreamReader::IncludeChildElements) while the reader is positioned on the start element of <course-description>, I get the text inside <course-description> with the tags stripped: "Welcome to the basic course."

Of course, I would like to do this without having to account for every single HTML tag in my code.

What I ended up doing was writing a method that I call wherever I would otherwise call QXmlStreamReader::readElementText. In the XML file, I mark the element that contains HTML with the XHTML namespace:

<?xml version="1.0" encoding="UTF-8"?>
<course name="Introductory Course">
    <course-description xmlns="http://www.w3.org/1999/xhtml">Welcome to the <em>basic course</em>.</course-description>
</course>

Then, whenever I reach a start element with QXmlStreamReader, I call readHtml. If the element is not in the XHTML namespace, it falls back to readElementText. If it is, it copies all the nested markup and text into a string until it reaches the closing tag of that element. (This means the HTML content cannot contain an element with the same name as the enclosing element, <course-description> above, because the loop stops at the first token with that name.)

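#include <QString>
#include <QXmlStreamReader>
#include <QXmlStreamWriter>

// Call while xml is positioned on a StartElement. Returns the element's content
// as HTML if the element is in the XHTML namespace, otherwise as plain text.
QString MyClass::readHtml(QXmlStreamReader &xml)
{
    // Elements outside the XHTML namespace are read as plain text, as before.
    if( xml.namespaceUri().toString() != "http://www.w3.org/1999/xhtml" )
    {
        return xml.readElementText(QXmlStreamReader::IncludeChildElements);
    }

    // Name of the wrapping element, e.g. "course-description"; reading stops
    // at the first token with this name.
    QString terminatingElement = xml.name().toString();
    QString html;
    QXmlStreamWriter writer(&html); // re-serializes the nested tokens into html
    do
    {
        xml.readNext();
        switch( xml.tokenType() )
        {
        case QXmlStreamReader::StartElement:
            // Copy the tag and its attributes (e.g. <em>, <a href="...">).
            writer.writeStartElement(xml.name().toString());
            writer.writeAttributes(xml.attributes());
            break;
        case QXmlStreamReader::EndElement:
            // The wrapping element was never written, so don't close it.
            if( xml.name() != terminatingElement )
                writer.writeEndElement();
            break;
        case QXmlStreamReader::Characters:
            // writeCharacters escapes special characters such as & and < again.
            writer.writeCharacters(xml.text().toString());
            break;
        // a more thorough approach would handle these; enumerating them removes a compiler warning
        case QXmlStreamReader::NoToken:
        case QXmlStreamReader::Invalid:
        case QXmlStreamReader::StartDocument:
        case QXmlStreamReader::EndDocument:
        case QXmlStreamReader::Comment:
        case QXmlStreamReader::DTD:
        case QXmlStreamReader::EntityReference:
        case QXmlStreamReader::ProcessingInstruction:
            break;
        }
    }
    // atEnd() is also true after a parse error; the caller should check xml.hasError().
    while( !xml.atEnd() && xml.name() != terminatingElement );
    return html;
}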