The code below is a toy example of the actual situation I am dealing with [1]. (Warning: this code will loop forever.)
import subprocess
import uuid

class CountingWriter:

    def __init__(self, filepath):
        self.file = open(filepath, mode='wb')
        self.counter = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()

    def __getattr__(self, attr):
        return getattr(self.file, attr)

    def write(self, data):
        written = self.file.write(data)
        self.counter += written
        return written

with CountingWriter('myoutput') as writer:
    with subprocess.Popen(['/bin/gzip', '--stdout'],
                          stdin=subprocess.PIPE,
                          stdout=writer) as gzipper:
        while writer.counter < 10000:
            gzipper.stdin.write(str(uuid.uuid4()).encode())
            gzipper.stdin.flush()
            writer.flush()
            # writer.counter remains unchanged

        gzipper.stdin.close()
In English, I start a subprocess, called gzipper, which receives input through its stdin and writes compressed output to a CountingWriter object. The code features a while-loop, conditioned on the value of writer.counter, that feeds some random content to gzipper at each iteration.
This code does not work!
More specifically, writer.counter never gets updated, so execution never leaves the while-loop.
This example is certainly artificial, but it captures the problem I would like to solve: how to terminate the feeding of data into gzipper once it has written a certain number of bytes.
Q: How must I change the code above to get this to work?
FWIW, I thought that the problem had to do with buffering, hence all the calls to *.flush() in the code. They have no noticeable effect, though. Incidentally, I cannot call gzipper.stdout.flush() because gzipper.stdout is not a CountingWriter object (as I had expected), but rather it is None, surprisingly enough.
[1] In particular, I am using a /bin/gzip --stdout subprocess only for the sake of this example, because it is a more readily available alternative to the compression program that I am actually working with. If I really wanted to gzip-compress my output, I would use Python's standard gzip module.
Your "writer" is an arbitrary Python object, but subprocess piping needs real files, because the subprocess uses them through their operating-system-level handles. The only reason any data gets written to the output file at all is that you proxied __getattr__, so the code in subprocess retrieved the fileno() of your proxied file. The real, operating-system-level file is the only thing the actual subprocess (gzip) sees, not your writer object.

What can be done instead is to promote counter to a property that calls stat on your output file:
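As a minimal sketch of that idea (not necessarily the exact code intended here), counter below becomes a read-only property backed by os.fstat on the underlying file descriptor, so it reflects what the child process has actually written. The helper feed_until and its parameters are hypothetical names introduced only for this illustration.

```python
import os
import subprocess
import uuid


class CountingWriter:
    """Wraps a real file; counter reports the file's current size."""

    def __init__(self, filepath):
        self.file = open(filepath, mode='wb')

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()

    def __getattr__(self, attr):
        # subprocess only needs fileno(); it is proxied to the real file.
        return getattr(self.file, attr)

    @property
    def counter(self):
        # The child writes straight to the file descriptor and never
        # calls a Python-level write(), so ask the OS for the size.
        return os.fstat(self.file.fileno()).st_size


def feed_until(command, filepath, limit):
    """Hypothetical helper: feed random data to `command` until the
    output file holds at least `limit` bytes; return the final size."""
    with CountingWriter(filepath) as writer:
        with subprocess.Popen(command,
                              stdin=subprocess.PIPE,
                              stdout=writer) as child:
            while writer.counter < limit:
                child.stdin.write(str(uuid.uuid4()).encode())
                child.stdin.flush()
            child.stdin.close()
        # Leaving the inner block waits for the child to exit.
        return writer.counter
```

With the command from the question, this would be called as feed_until(['/bin/gzip', '--stdout'], 'myoutput', 10000). Note that the size only grows when the child flushes its own output buffer, so the loop is likely to overshoot the limit rather than stop exactly at it.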