parse ACDSee categories

How could one parse an XML-like string and convert it to a separated list?

I am trying to convert the following string:

<Categories>
  <Category Assigned="0">
    6 Level
    <Category Assigned="1">
      6.2 Level
      <Category Assigned="0">
        6.3 Level
        <Category Assigned="0">
          6.4 Level
          <Category Assigned="1">
            6.5 Level
          </Category>
        </Category>
      </Category>
    </Category>
  </Category>
</Categories>

To a separated list like:

6 Level/6.2 Level/6.3 Level/6.4 Level/6.5 Level, 6 Level/6.2 Level

Robin Mills of exiv2 provided a Perl script: http://dev.exiv2.org/boards/3/topics/1912?r=1923#message-1923

A solution would also need to handle Assigned="1", since in the expected output every path ends at an assigned category. How can this be done in C++ for use in digikam, inside dmetadata.cpp, with a structure like:

    QStringList ntp = tagsPath.replaceInStrings("<Category Assigned=\"0\">", "/");

I don't have enough programming background to figure this out, and haven't found any code samples online that do something similar. I'd also like to include the code in exiv2 itself, so that other applications can benefit.

Working code will be included in digikam: https://bugs.kde.org/show_bug.cgi?id=345220

The code you have linked makes use of Perl's XML::Parser::Expat module, which is a glue layer on top of James Clark's Expat XML parser.

If you want to follow the same route, you should write C++ that uses the same library. It can be clumsy to use, because its API works through callbacks that you register to be called when certain events occur in the incoming XML stream. You can see them in the Perl code, with comments such as "process a start-of-element event".

Once you have linked to the library, it should be simple to write C code that is equivalent to the Perl in the callbacks, since they are only a single line each. Please open a new question if you have trouble understanding the Perl.

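Note also that Expat is a non-validating parser: it reports XML that is not well-formed as an error, but it does not check the document against a DTD, so well-formed data with an unexpected structure gets through without comment.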

Given that the biggest task is parsing the XML data in the first place, you may prefer a different solution that builds an in-memory document structure from the XML data and lets you interrogate it using the Document Object Model (DOM). The libxml2 library allows you to do that, and has its own Perl glue layer in the XML::LibXML module.

Maik Qualmann has provided a working patch for digikam!

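// Raw ACDSee category XML from the XMP packet.
QString xmlACDSee = getXmpTagString("Xmp.acdsee.categories", false);
if (!xmlACDSee.isEmpty())
{
    // Strip the outer wrapper. Replacing "/" with "|" turns "</Category>" into
    // "<|Category>" and a "/" inside a category name (digikam's path separator) into "|".
    xmlACDSee.remove("</Categories>");
    xmlACDSee.remove("<Categories>");
    xmlACDSee.replace("/", "|");

    // Every non-empty part now begins with ="0"> or ="1">, then the category name,
    // then any closing tags. The offsets below assume no whitespace between tags.
    QStringList tagsXml = xmlACDSee.split("<Category Assigned");
    int category        = 0;    // current nesting depth
    int length;
    int count;

    foreach(const QString& tags, tagsXml)
    {
        if (!tags.isEmpty())
        {
            // Name length = part length minus the 5-char ="N"> prefix
            // and one 11-char "<|Category>" per closing tag.
            count  = tags.count("<|Category>");
            length = tags.length() - (11 * count) - 5;

            if (category == 0)
            {
                // Top level: start a new path.
                tagsPath << tags.mid(5, length);
            }
            else
            {
                // Nested: append this level to the current path.
                tagsPath.last().append(QString("/") + tags.mid(5, length));
            }

            category = category - count + 1;

            // Assigned category while a <Category> is still open: push a copy of an
            // earlier path (the current one when nothing closed here) so deeper
            // levels extend the copy and the assigned path is kept.
            if (tags.left(5) == QString("=\"1\">") && category > 0)
            {
                tagsPath << tagsPath.value(tagsPath.size() - count - 1);
            }
        }
    }

    // tagsPath is the QStringList provided by the surrounding function (not shown).
    if (!tagsPath.isEmpty())
    {
        return true;
    }
}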