Parse Wiktionary


Is there a .NET library that parses pages I've retrieved through the MediaWiki API? A standard MediaWiki parser that just gives me the titles and the content as plain data would be fine, but I would prefer one suited specifically to Wiktionary, one that could give me each word's part of speech and all of its definitions.

I would prefer not to write my own parser for this. Any suggestions?


There are 2 best solutions below

1

If you get the output as JSON, you have many options, both built into .NET (for example System.Text.Json) and external to the framework (for example Json.NET).

If you get the output as XML, the framework again ships powerful XML manipulation classes (for example System.Xml.Linq), and third-party libraries exist as well.

You're going to have to be more specific: say which format you're requesting and provide some example output.
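As a rough illustration of the JSON route, here is a minimal sketch in Python (chosen for brevity; the same steps map onto any .NET JSON library). It assumes the reply has the legacy shape returned by action=query with prop=revisions, rvprop=content and format=json, where the wikitext sits under the "*" key. The sample reply, its wikitext, and the function name extract_definitions are made up for the example. Real Wiktionary entries contain templates that this sketch does not expand.

```python
import json
import re

# Hypothetical sample reply; the shape is an assumption (legacy JSON format),
# and the wikitext is invented for illustration, not copied from Wiktionary.
SAMPLE_RESPONSE = json.dumps({
    "query": {"pages": {"1": {
        "title": "cat",
        "revisions": [{"*": (
            "==English==\n"
            "===Noun===\n"
            "# A small domesticated feline.\n"
            "# A spiteful person.\n"
            "===Verb===\n"
            "# To hoist an anchor.\n"
        )}],
    }}}
})

# Matches wikitext headings such as ==English== or ===Noun===.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")


def extract_definitions(response_text):
    """Return {page title: [(nearest heading, definition text), ...]}."""
    data = json.loads(response_text)
    result = {}
    for page in data["query"]["pages"].values():
        wikitext = page["revisions"][0]["*"]
        heading = None
        entries = []
        for line in wikitext.splitlines():
            match = HEADING.match(line)
            if match:
                # Remember the most recent heading (language or part of speech).
                heading = match.group(2)
            elif line.startswith("# "):
                # Top-level numbered items are treated as definitions;
                # "#:" and "#*" lines (examples, quotations) are skipped.
                entries.append((heading, line[2:].strip()))
        result[page["title"]] = entries
    return result
```

Calling extract_definitions(SAMPLE_RESPONSE) pairs each definition with the heading it appeared under, which is usually the part of speech. Anything beyond this simple layout (nested senses, templates, several languages on one page) needs a proper wikitext parser.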

0

The dbnary project provides parsed information from Wiktionary in RDF form.

If you want something processed even further, I provide SQLite and TEI files generated from the dbnary data as part of my WikDict project at download.wikdict.com.

This does not really answer the question about .NET libraries, but I'm sure you'll easily find libraries to read XML (TEI), SQLite, or RDF.
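The table layout of the SQLite files is not described here, so a sensible first step is to inspect a downloaded file rather than guess at its schema. The sketch below, in Python for brevity, lists every table and its columns using only standard SQLite facilities (sqlite_master and PRAGMA table_info). The function name describe_tables is hypothetical, and nothing in it reflects the actual WikDict schema.

```python
import sqlite3


def describe_tables(conn):
    """Return {table name: [column name, ...]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Quote the identifier so unusual table names do not break the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [column[1] for column in columns]
    return schema
```

To use it, open the downloaded file with sqlite3.connect and pass the connection to describe_tables. The equivalent queries work unchanged from any .NET SQLite binding.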