My application consumes XML data from different vendors. Each vendor has a different XML format/schema, and custom queries are required to retrieve various data from those XML files.
I initially started out with an RDBMS approach: after retrieving a specific XML file from a vendor, I would parse/query the XML (using the Woodstox StAX parser) and write the data to some tables. However, due to the very nature of an RDBMS (fixed schema), I am not able to support all the XML formats from the different vendors, and even where I can, I have to "normalize" the hierarchical XML into fixed-schema relational data.
The XML data from the vendors is updated frequently every day, and file sizes vary from a few KB up to 50 MB.
I am evaluating various NXDs (native XML databases), namely eXist-db, Sedna, BaseX and MonetDB, as a next step to see whether one of them would suit my purposes.
Can someone please provide some practical advice on how to work this out, or share experience from building a similar system that handles a lot of XML data in different formats/schemas?
Here are the core XML requirements I am trying to address:
- Handle multiple XML data files from multiple sources. The XML differs from vendor to vendor.
- Update XML, both whole documents and individual fields of XML already in the DB.
- Identify which vendor a document comes from and fire the queries accordingly.
- Query the XML using XPath/XQuery to read the data and present it to the users in a common view.
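To make the last two requirements concrete, here is a minimal sketch of detecting the vendor from the root element and applying vendor-specific paths to produce one common view. It uses only Python's standard-library `xml.etree.ElementTree` (which supports a limited XPath subset) rather than any of the databases listed above. The vendor names, element names, and the `VENDOR_QUERIES` and `to_common_view` names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: root element name -> paths for that vendor's format.
# "item" selects the repeating records; the other paths are relative to a record.
VENDOR_QUERIES = {
    "acmeFeed": {"item": "./product", "name": "./title", "price": "./cost"},
    "globexData": {"item": "./items/entry", "name": "./label", "price": "./pricing/amount"},
}


def to_common_view(xml_text):
    """Identify the vendor by root tag, then run that vendor's queries."""
    root = ET.fromstring(xml_text)
    # Assumption: each vendor uses a distinct, un-namespaced root element.
    # With namespaces, root.tag would look like "{uri}name" instead.
    queries = VENDOR_QUERIES.get(root.tag)
    if queries is None:
        raise ValueError(f"unknown vendor format: {root.tag!r}")
    rows = []
    for item in root.findall(queries["item"]):
        rows.append({
            "vendor": root.tag,
            "name": item.findtext(queries["name"]),
            "price": float(item.findtext(queries["price"])),
        })
    return rows


# Two invented documents with different shapes but the same logical content.
ACME_XML = "<acmeFeed><product><title>Widget</title><cost>9.50</cost></product></acmeFeed>"
GLOBEX_XML = (
    "<globexData><items><entry><label>Gadget</label>"
    "<pricing><amount>12</amount></pricing></entry></items></globexData>"
)

if __name__ == "__main__":
    for doc in (ACME_XML, GLOBEX_XML):
        print(to_common_view(doc))
```

The same idea should carry over to a native XML database: keep one set of XQuery/XPath expressions per vendor, keyed by whatever identifies the vendor (root element, namespace, or the collection the document was stored in), and have every set return the same common shape.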
Please advise.
Thanks, Subhro.
I think you're right to go for a native XML database rather than a hybrid database such as SQL Server. The hybrid databases, because they do so much, often take longer to provide core XML functionality such as conformant implementations of XQuery and XQuery Update; and the fact that they offer so many different ways of storing data can make it difficult to keep your developers under control.
As to which product, there's a real problem here. Evaluating database products is an expensive process, and doing it thoroughly enough to get a meaningful answer will probably cost you more than making the wrong choice. When a database project fails, it's usually not because the database software was deficient, but because the project lacked the skills and experience to use the chosen database software to its best advantage. So I would say, choose a product based on the skills available to you.