How does pexpect analyze the child's stdout?


I have the following code:

import pexpect

child = pexpect.spawn("prog")
# some delay...
child.expect('Name .*: ')  # the pattern is a regular expression given as a string
child.sendline('anonymous')

Once the child process has been launched, it may start writing a lot of data to its stdout, e.g. log messages. Does that mean pexpect starts searching through all of the child's stdout (from the start of the process up to the current moment)? Or does pexpect only start doing that after expect() is called?

My child process produces a lot of log output, and the CPU becomes very slow. I suspect pexpect's implementation may be the cause.

Best answer:

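After a child process is spawned, it write()s its data to the slave side of the pty, and the data waits there until the parent read()s it from the master side. pexpect does not read in the background: it only reads from the master side while you are inside a call such as child.expect(). If you never call child.expect(), the data piles up in the kernel's pty buffer, and once that buffer is full the child's write() blocks. When you do call child.expect(), it reads whatever has accumulated in the pty since the spawn (at most one buffer's worth) and searches that.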
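A minimal sketch of this effect using only the standard library, assuming a POSIX system where os.openpty() is available. A throwaway Python one-liner stands in for the chatty child, and demo_blocked_writer is a hypothetical helper name. The parent gives the slave side of a pty to the child as stdout and deliberately leaves the master side unread for a moment before draining it:

```python
import os
import subprocess
import sys
import time


def demo_blocked_writer(nbytes=1_000_000, wait=0.5):
    """Hypothetical demo: spawn a child whose stdout is a pty slave, leave the
    master unread for `wait` seconds, then drain it.
    Returns (child_was_blocked, bytes_read)."""
    master, slave = os.openpty()
    # The child writes nbytes of 'x' with no newline, so the tty's output
    # processing (e.g. \n -> \r\n) does not change the byte count.
    code = f"import sys; sys.stdout.write('x' * {nbytes}); sys.stdout.flush()"
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=slave)
    os.close(slave)  # the parent keeps only the master side

    time.sleep(wait)
    # Nobody has read the master yet, so the pty buffer fills up and the
    # child's write() blocks: the process is still alive.
    blocked = proc.poll() is None

    # Drain the master side (roughly what expect() does while it waits).
    total = 0
    while True:
        try:
            chunk = os.read(master, 65536)
        except OSError:  # Linux raises EIO once the slave side is closed
            break
        if not chunk:  # other platforms may report EOF as b""
            break
        total += len(chunk)
    proc.wait()
    os.close(master)
    return blocked, total


if __name__ == "__main__":
    blocked, total = demo_blocked_writer()
    print(f"child blocked before draining: {blocked}; bytes read: {total}")
```

The first value should be True because 1,000,000 bytes is far more than a pty buffers; the exact buffer size depends on the kernel.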

When child.expect() matches a pattern, it returns and pexpect stops reading. You then have to call child.expect() again; otherwise the child may block again once it has written too much data.
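To make the "call it again" point concrete without pexpect, here is a toy stand-in for expect() over a raw pty. It assumes a POSIX system; read_until and run_prompt_then_flood are hypothetical helpers (not pexpect's code), and the child is a Python one-liner playing the role of prog. The first call consumes the prompt, and the second keeps reading so the burst of output can drain and the child can exit:

```python
import os
import re
import select
import subprocess
import sys


def read_until(fd, pattern, buf=b"", timeout=5.0):
    """Hypothetical toy expect(): read from fd until `pattern` matches.
    Returns (matched bytes, unconsumed bytes that follow the match)."""
    regex = re.compile(pattern)
    while True:
        m = regex.search(buf)
        if m:
            return m.group(0), buf[m.end():]
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError(pattern)
        try:
            chunk = os.read(fd, 4096)
        except OSError:  # EIO on Linux after the child closes the pty
            chunk = b""
        if not chunk:
            raise EOFError(pattern)
        buf += chunk


def run_prompt_then_flood(nbytes=200_000):
    master, slave = os.openpty()
    # The child prints a prompt, then a large burst of "log" output, then DONE.
    code = (
        "import sys; print('Name please: ', flush=True); "
        f"sys.stdout.write('x' * {nbytes}); print('DONE', flush=True)"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=slave)
    os.close(slave)
    prompt, rest = read_until(master, rb"Name .*: ")  # first "expect"
    # Stopping here would leave the child blocked once the pty buffer fills;
    # a second call keeps draining until the end marker shows up.
    _, rest = read_until(master, rb"DONE", rest)
    status = proc.wait(timeout=5)
    os.close(master)
    return prompt, status


if __name__ == "__main__":
    print(run_prompt_then_flood())
```

With real pexpect the same idea is simply another child.expect(...) call (for example on an end marker or pexpect.EOF) after the first match.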

See the following example:

# python
>>> import pexpect
>>> ch = pexpect.spawn('find /')
>>> ch
<pexpect.pty_spawn.spawn object at 0x7f47390bae90>
>>>

At this point find has been spawned and has already written some output. Because I have not called ch.expect(), find is now blocked (sleeping) and does not consume any CPU.

# ps -C find u
USER     PID %CPU %MEM  VSZ   RSS TTY     STAT START   TIME COMMAND
root  100831  0.0  0.2 9188  2348 pts/12  Ss+  10:23   0:00 /usr/bin/find /
# strace -p 100831
Process 100831 attached
write(1, "\n", 1             <-- The write() is being blocked

In the STAT column, S means interruptible sleep, s means the process is a session leader, and + means it is in the foreground process group.


According to pexpect's documentation, two spawn() options may affect performance:

The maxread attribute sets the read buffer size. This is maximum number of bytes that Pexpect will try to read from a TTY at one time. Setting the maxread size to 1 will turn off buffering. Setting the maxread value higher may help performance in cases where large amounts of output are read back from the child. This feature is useful in conjunction with searchwindowsize.

When the keyword argument searchwindowsize is None (default), the full buffer is searched at each iteration of receiving incoming data. The default number of bytes scanned at each iteration is very large and may be reduced to collaterally reduce search cost. After expect() returns, the full buffer attribute remains up to size maxread irrespective of searchwindowsize value.
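To see why searchwindowsize matters for a chatty child, here is a simplified model of an expect()-style search loop that counts how many bytes are handed to the regex. This is an illustration under stated assumptions, not pexpect's actual code (its real buffering logic is more involved); bytes_scanned and the 500-byte fake log chunks are hypothetical:

```python
import re


def bytes_scanned(chunks, pattern, searchwindowsize=None):
    """Simplified model of an expect()-style search loop (not pexpect's code).

    After each chunk arrives, search either the whole accumulated buffer
    (searchwindowsize=None) or only its last `searchwindowsize` bytes.
    Returns (matched, total number of bytes handed to the regex).
    """
    regex = re.compile(pattern)
    buffer = b""
    scanned = 0
    for chunk in chunks:
        buffer += chunk
        if searchwindowsize is None:
            window = buffer
        else:
            window = buffer[-searchwindowsize:]
        scanned += len(window)
        if regex.search(window):
            return True, scanned
    return False, scanned


log_chunks = [b"x" * 499 + b"\n"] * 200  # 200 fake 500-byte log lines
stream = log_chunks + [b"Name please: "]  # the prompt arrives last
print(bytes_scanned(stream, rb"Name .*: "))  # (True, 10150013)
print(bytes_scanned(stream, rb"Name .*: ", searchwindowsize=100))  # (True, 20100)
```

Searching the whole buffer after every chunk hands the regex about 10 MB to find a 13-byte prompt, and that cost grows roughly quadratically with the amount of output; the 100-byte window keeps it proportional to the number of chunks. The trade-off is that the window must be at least as long as the text you want to match, or a match straddling the window edge will be missed.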