python lxml in docker: "Document is empty" while parsing

Why does this code run without issues on my Mac with any version of Python, requests and lxml, but fail in every Docker container? I have tried everything.

It fails at line 34533 of the file (found by printing el.sourceline).

from requests import get
from lxml import etree

r = get('https://printbar.ru/synsfiles/yandex/market/idrr_full.xml')
with open('test.xml', 'wb') as f:
    f.write(r.content)

tree = etree.iterparse(source='test.xml', events=('end',))
for (ev, el) in tree:
    continue

print('ok')

The file at https://printbar.ru/synsfiles/yandex/market/idrr_full.xml appears to be completely valid and parses locally on every one of my Macs.

I tried Ubuntu, Alpine, and several Python images, even ones with prebuilt lxml; nothing helped. I expected the file to parse cleanly, but instead this error is raised partway through parsing:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "src/lxml/iterparse.pxi", line 210, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 195, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 230, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1376, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 606, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "test.xml", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

xmllint reports an encoding error, yet the same file parses locally on my Mac. How is that possible? I need this to run in Docker.
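Since xmllint points at an encoding problem, one way to narrow it down is to look for the first byte that does not decode in the declared encoding. The following is only a minimal sketch: it assumes the file declares UTF-8 (not verified here), the helper name `first_invalid_byte` is made up for illustration, and the sample bytes are a small stand-in for test.xml rather than the real feed.

```python
def first_invalid_byte(data: bytes, encoding: str = "utf-8"):
    """Return (offset, line_number, context) for the first undecodable
    byte in `data`, or None if everything decodes cleanly.

    `encoding` defaults to UTF-8, which is an assumption about the feed.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        offset = exc.start
        # Lines are 1-based: count the newlines that precede the bad byte.
        line_number = data.count(b"\n", 0, offset) + 1
        # A few surrounding bytes make the problem easier to spot.
        context = data[max(0, offset - 20):offset + 20]
        return offset, line_number, context
    return None


# Small in-memory stand-in for test.xml: line 2 contains a lone 0xFF byte,
# which is never valid in UTF-8.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<a>\xff</a>\n'
result = first_invalid_byte(sample)
print(result)
```

For the real file, the same helper could be called as `first_invalid_byte(open('test.xml', 'rb').read())` both on the Mac and inside the container. If it reports a line near 34533, the bytes themselves are the likely culprit, and the difference between environments would then be in how strictly each one handles them.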
