How to get line number from error iterator from XMLschema?

I'm fairly new to Python and coding in general, so sorry if this is a very simple question. I'm working with the Python package xmlschema to validate some very large XML files. When I use the following code to get the error messages, I only get the paths of the errors. This is okay when there are only 5-6 different "Knude" elements, but I have files with 200+ "Knude" elements, which makes this information nearly useless. I would therefore like to get the line number so I can go to the XML file and correct it.

Code:

    import xmlschema

    def get_validation_errors(xml_file, xsd_file):
        schema = xmlschema.XMLSchema(xsd_file)
        # iter_errors yields one validation error object per problem found
        validation_error_iterator = schema.iter_errors(xml_file)
        errors = []
        for idx, validation_error in enumerate(validation_error_iterator, start=1):
            err = str(validation_error)
            errors.append(err)
            print(f'[{idx}] path: {validation_error.path} | reason: {validation_error.reason} | message: {validation_error.message}')
        return errors

Results:

[1] path: /KnudeGroup/Knude[5]/StatusKode | reason: value must be one of [1, 2, 3, 4, 8] | message: failed validating 0 with XsdEnumerationFacets([1, 2, 3, 4, 8])

I have already read the documentation and searched Google and Stack Overflow for an answer, but could not find one.

Best answer:

Load the XML instance document with lxml. That way each validation error has a sourceline property (https://github.com/sissaschool/xmlschema/blob/v2.2.3/xmlschema/validators/exceptions.py#L90). A minimal example:

    import lxml.etree as ET
    from xmlschema import XMLSchema

    xml_doc = ET.parse("sample1.xml")
    schema = XMLSchema("sample1.xsd")

    for error in schema.iter_errors(xml_doc):
        print(f'sourceline: {error.sourceline}; path: {error.path} | reason: {error.reason} | message: {error.message}')

This outputs the line number as sourceline, e.g.

sourceline: 2; path: /root/item[1] | reason: invalid value 'a' for xs:decimal | message: failed validating 'a' with XsdAtomicBuiltin(name='xs:decimal')
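
Applied to the function from the question, the same idea might look roughly like the sketch below. It assumes both lxml and xmlschema are installed. The helper name describe_error and the "line unknown" fallback are my own choices for illustration and are not part of either library. The fallback is there because sourceline is None when the document was not parsed with lxml.

```python
def describe_error(error):
    """Format one validation error, including the line number when known."""
    # sourceline is None (or missing) if the document was not parsed with lxml
    line = getattr(error, "sourceline", None)
    where = f"line {line}" if line is not None else "line unknown"
    return f"{where} | path: {error.path} | reason: {error.reason}"


def get_validation_errors(xml_file, xsd_file):
    """Validate xml_file against xsd_file and return one message per error."""
    # Imported inside the function so describe_error works without either package
    import lxml.etree as ET
    from xmlschema import XMLSchema

    xml_doc = ET.parse(xml_file)  # lxml records a source line on each element
    schema = XMLSchema(xsd_file)
    return [describe_error(err) for err in schema.iter_errors(xml_doc)]
```

Calling get_validation_errors("sample1.xml", "sample1.xsd") would then return a list of strings that each start with the line number, so you can jump straight to that line in the XML file.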