I've written a small benchmark that compares different string concatenation methods for ZOCache.
It looks like tempfile.TemporaryFile is faster than anything else:
$ python src/ZOCache/tmp_benchmark.py
3.00407409668e-05 TemporaryFile
0.385630846024 SpooledTemporaryFile
0.299962997437 BufferedRandom
0.0849719047546 io.StringIO
0.113346099854 concat
The benchmark code I've been using:
#!/usr/bin/python
from __future__ import print_function
import io
import timeit
import tempfile

class Error(Exception):
    pass

def bench_temporaryfile():
    with tempfile.TemporaryFile(bufsize=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(i))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error

def bench_spooledtemporaryfile():
    with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(i))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error

def bench_BufferedRandom():
    # 1. BufferedRandom
    with io.open('out.bin', mode='w+b') as fp:
        with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"Value = ")
                out.write(bytes(i))
                out.write(b" ")
            # Get string.
            out.seek(0)
            contents = out.read()
            # Test first letter.
            if contents[0:5] != b'Value':
                raise Error

def bench_stringIO():
    # 1. Use StringIO.
    out = io.StringIO()
    for i in range(0, 100):
        out.write(u"Value = ")
        out.write(unicode(i))
        out.write(u" ")
    # Get string.
    contents = out.getvalue()
    out.close()
    # Test first letter.
    if contents[0] != 'V':
        raise Error

def bench_concat():
    # 2. Use string appends.
    data = ""
    for i in range(0, 100):
        data += u"Value = "
        data += unicode(i)
        data += u" "
    # Test first letter.
    if data[0] != u'V':
        raise Error

if __name__ == '__main__':
    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
    print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " SpooledTemporaryFile")
    print(str(timeit.timeit('bench_BufferedRandom()', setup="from __main__ import bench_BufferedRandom", number=1000)) + " BufferedRandom")
    print(str(timeit.timeit("bench_stringIO()", setup="from __main__ import bench_stringIO", number=1000)) + " io.StringIO")
    print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")
EDIT: Python 3.4.3 + io.BytesIO
$ python3 ./src/ZOCache/tmp_benchmark.py
2.689500024644076e-05 TemporaryFile
0.30429405899985795 SpooledTemporaryFile
0.348170792000019 BufferedRandom
0.0764778530001422 io.BytesIO
0.05162201000030109 concat
New source with io.BytesIO:
#!/usr/bin/python3
from __future__ import print_function
import io
import timeit
import tempfile

class Error(Exception):
    pass

def bench_temporaryfile():
    with tempfile.TemporaryFile() as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error

def bench_spooledtemporaryfile():
    with tempfile.SpooledTemporaryFile(max_size=10*1024*1024) as out:
        for i in range(0, 100):
            out.write(b"Value = ")
            out.write(bytes(str(i), 'utf-8'))
            out.write(b" ")
        # Get string.
        out.seek(0)
        contents = out.read()
        out.close()
        # Test first letter.
        if contents[0:5] != b"Value":
            raise Error

def bench_BufferedRandom():
    # 1. BufferedRandom
    with io.open('out.bin', mode='w+b') as fp:
        with io.BufferedRandom(fp, buffer_size=10*1024*1024) as out:
            for i in range(0, 100):
                out.write(b"Value = ")
                out.write(bytes(i))
                out.write(b" ")
            # Get string.
            out.seek(0)
            contents = out.read()
            # Test first letter.
            if contents[0:5] != b'Value':
                raise Error

def bench_BytesIO():
    # 1. Use StringIO.
    out = io.BytesIO()
    for i in range(0, 100):
        out.write(b"Value = ")
        out.write(bytes(str(i), 'utf-8'))
        out.write(b" ")
    # Get string.
    contents = out.getvalue()
    out.close()
    # Test first letter.
    if contents[0:5] != b'Value':
        raise Error

def bench_concat():
    # 2. Use string appends.
    data = ""
    for i in range(0, 100):
        data += "Value = "
        data += str(i)
        data += " "
    # Test first letter.
    if data[0] != 'V':
        raise Error

if __name__ == '__main__':
    print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
    print(str(timeit.timeit('bench_spooledtemporaryfile()', setup="from __main__ import bench_spooledtemporaryfile", number=1000)) + " SpooledTemporaryFile")
    print(str(timeit.timeit('bench_BufferedRandom()', setup="from __main__ import bench_BufferedRandom", number=1000)) + " BufferedRandom")
    print(str(timeit.timeit("bench_BytesIO()", setup="from __main__ import bench_BytesIO", number=1000)) + " io.BytesIO")
    print(str(timeit.timeit("bench_concat()", setup="from __main__ import bench_concat", number=1000)) + " concat")
Is that true for every platform? And if so, why?
EDIT: Results with fixed benchmark (and fixed code):
0.2675984420002351 TemporaryFile
0.28104681999866443 SpooledTemporaryFile
0.3555715570000757 BufferedRandom
0.10379689100045653 io.BytesIO
0.05650951399911719 concat
Your biggest problem: Per tdelaney, you never actually ran the TemporaryFile test; you omitted the parens in the timeit snippet (and only for that test, the others actually ran). So you were timing the time taken to look up the name bench_temporaryfile, but not to actually call it. Change the timed statement from 'bench_temporaryfile' to 'bench_temporaryfile()' (adding parens to make it a call) to fix.
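The missing-parens pitfall is easy to reproduce. The following is a minimal sketch, not the original benchmark: work, counter, and namespace are hypothetical names introduced for illustration, and the globals= argument of timeit.timeit assumes Python 3.5 or later. It counts how often the function actually runs under each statement string.

```python
import timeit

# Hypothetical call counter, used only to show whether the function ran.
counter = {"calls": 0}

def work():
    """Stand-in for a benchmark body (not the code from the question)."""
    counter["calls"] += 1
    return "".join(str(i) for i in range(100))

namespace = {"work": work}

# Statement is a bare name: each iteration only looks the name up.
timeit.timeit("work", globals=namespace, number=1000)
calls_after_lookup = counter["calls"]

# Statement is a call: each iteration actually executes the function.
timeit.timeit("work()", globals=namespace, number=1000)
calls_after_call = counter["calls"]

print("bare name:", calls_after_lookup, "calls")   # bare name: 0 calls
print("with parens:", calls_after_call, "calls")   # with parens: 1000 calls
```

A timing in the tens of microseconds for 1000 iterations of file I/O, as in the first TemporaryFile result, is a good hint that the body never ran.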
Some other issues:
io.StringIO is fundamentally different from your other test cases. Specifically, all the other types you're testing with operate in binary mode, reading and writing str, and avoiding line ending conversions. io.StringIO uses Python 3 style strings (unicode in Python 2), which your tests acknowledge by using different literals and converting to unicode instead of bytes. This adds a lot of encoding and decoding overhead, as well as using a lot more memory (unicode uses 2-4x the memory of str for the same data, which means more allocator overhead, more copy overhead, etc.).
The other major difference is that you're setting a truly huge bufsize for TemporaryFile; few system calls would need to occur, and most writes are just appending to contiguous memory in the buffer. By contrast, io.StringIO is storing the individual values written, and only joining them together when you ask for them with getvalue().
Also, lastly, you think you're being forward compatible by using the bytes constructor, but you're not; in Python 2 bytes is an alias for str, so bytes(10) returns '10', but in Python 3, bytes is a totally different thing, and passing an integer to it returns a zero initialized bytes object of that size, bytes(10) returns b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.
If you want a fair test case, at the very least switch to cStringIO.StringIO or io.BytesIO instead of io.StringIO and write bytes uniformly. Typically, you wouldn't explicitly set the buffer size for TemporaryFile and the like yourself, so you might consider dropping that.
In my own tests on Linux x64 with Python 2.7.10, using ipython's %timeit magic, the ranking is:
io.BytesIO ~48 μs per loop
io.StringIO ~54 μs per loop (so unicode overhead didn't add much)
cStringIO.StringIO ~83 μs per loop
TemporaryFile ~2.8 ms per loop (note units; ms is 1000x longer than μs)
And that's without going back to default buffer sizes (I kept the explicit bufsize from your tests). I suspect the behavior of TemporaryFile will vary a lot more (depending on the OS and how temporary files are handled; some systems might just store in memory, others might store in /tmp, but of course, /tmp might just be a RAMdisk anyway).
Something tells me you may have a setup where the TemporaryFile is basically a plain memory buffer that never goes to the file system, where mine may be ultimately ending up on persistent storage (if only for short periods); stuff happening in memory is predictable, but when you involve the file system (which TemporaryFile can, depending on OS, kernel settings, etc.), the behavior will differ a great deal between systems.
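To make the "write bytes uniformly" advice concrete, here is a rough Python 3 sketch of a fairer comparison. It is an illustration under assumptions, not the benchmark from the question: chunks, via_bytesio, via_join, via_temporaryfile, and CANDIDATES are hypothetical names, every candidate is fed identical bytes, and no explicit buffer size is set.

```python
import io
import tempfile
import timeit

N = 100  # values written per run, as in the question

def chunks():
    """Yield the same byte chunks for every candidate."""
    for i in range(N):
        yield b"Value = "
        # str(i).encode() yields b"42" for 42; in Python 3, bytes(42)
        # would instead yield 42 zero bytes.
        yield str(i).encode("ascii")
        yield b" "

def via_bytesio():
    out = io.BytesIO()
    for chunk in chunks():
        out.write(chunk)
    return out.getvalue()

def via_join():
    return b"".join(chunks())

def via_temporaryfile():
    # Default buffering; where the data lives depends on the OS.
    with tempfile.TemporaryFile() as out:
        for chunk in chunks():
            out.write(chunk)
        out.seek(0)
        return out.read()

CANDIDATES = [via_bytesio, via_join, via_temporaryfile]

if __name__ == "__main__":
    expected = via_join()
    for func in CANDIDATES:
        # Check correctness before timing, so every timing is of real work.
        assert func() == expected
        # Passing the callable itself avoids the missing-parens mistake.
        elapsed = timeit.timeit(func, number=1000)
        print("{:.6f} {}".format(elapsed, func.__name__))
```

Absolute numbers from a sketch like this will still differ between machines, especially for TemporaryFile, so it is mainly useful for comparing candidates on one system.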