Controlling the number of bytes requested per read() call with Expat


I'm parsing some XML using Python's Expat (by calling parser = xml.parsers.expat.ParserCreate() and then setting the relevant callbacks to my methods).

It seems that when Expat calls read(nbytes) to fetch new data, nbytes is always 2,048. I have quite a lot of XML to process, and I suspect that these small read()s are slowing down the overall process. As a point of reference, I'm seeing throughput of around 9 MB/s on a 2.67 GHz Intel Xeon X5550 running Windows 7.

I've tried setting parser.buffer_text = True and parser.buffer_size = 65536, but Expat is still calling the read() method with an argument of just 2,048.

Is it possible to increase this?

Best answer

You're talking about the xmlparser.ParseFile method, right?

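Unfortunately, no: that value is hardcoded as BUF_SIZE = 2048 in pyexpat.c. The buffer_text and buffer_size attributes only control how character data is buffered before it is passed to your CharacterDataHandler; they have no effect on the size of the read() calls made by ParseFile.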
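One possible workaround is to skip ParseFile and feed the parser yourself with the Parse(data, isfinal) method, reading from the file in whatever chunk size you like. The following is a minimal sketch, not a tested fix for your setup: the helper name parse_in_chunks and the 64 KiB CHUNK_SIZE are my own choices for illustration, and whether larger reads actually improve throughput depends on where the time is really going (the Python-level callbacks may well dominate).

```python
import io
import xml.parsers.expat

# Assumed chunk size for illustration; tune it for your workload.
CHUNK_SIZE = 64 * 1024


def parse_in_chunks(fileobj, parser, chunk_size=CHUNK_SIZE):
    """Feed a binary file object to an Expat parser in caller-sized chunks.

    Hypothetical helper: it replaces parser.ParseFile(fileobj) so that the
    caller, not pyexpat, decides how many bytes each read() requests.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # isfinal=False: more data may follow this chunk.
        parser.Parse(chunk, False)
    # An empty final call tells Expat that the document is complete.
    parser.Parse(b"", True)


if __name__ == "__main__":
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append(name)

    # In-memory stand-in for a real file opened with open(path, "rb").
    source = io.BytesIO(b"<root><item>a</item><item>b</item></root>")
    parse_in_chunks(source, parser)
    print(seen)
```

Open the real file in binary mode so that Expat can handle the encoding declaration itself, and set your handlers on the parser before calling the helper, exactly as you would before ParseFile.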