Avoid buffering when parsing stdout with Perl

I want to parse the output of an external program (some shell command) line by line using Perl. The command runs continuously, so I put it into a thread and use shared variables to communicate with my main routine.

Up to now, my code looks similar to this:

#!/usr/bin/perl

use warnings;
use strict;
use threads;
use threads::shared;

my $var :shared = "";

threads->create(
    sub {
        # command writes to stdout every ~100 ms
        my $cmd = '<long running command>';
        open(my $fh, '-|', $cmd) or die "Can't run program: $!\n";
        while (my $line = <$fh>) {
            # extract some information from the line
            $var = <some value>;
            print "Debug\n";
        }
        close($fh);
    }
);

while(1) {
    # evaluate variable each ~second
    print "$var\n";
    sleep 1;
}

For some commands this works perfectly fine, and the lines are processed just as they come in. The output looks similar to:

...
Debug
Debug
...
<value 1>
...
Debug
Debug
...
<value 2>
...

However, for other commands this behaves strangely and the lines are processed in blocks, so $var doesn't get updated and Debug isn't printed either for some time. Then suddenly the output is (similar to):

...
<value 1>
<value 1>
<value 1>
...
Debug
Debug
Debug
...
<value 20>

and $var is set to the last/current value. Then this repeats: the parsing is always delayed and done in blocks, and $var is not updated in between.

First of all: Is there any better/proper way to parse the output of an external program (line by line!) besides using a pipe?

If not, how can I avoid this behaviour?

I've read that using autoflush(1); or $|=1; might be a solution, but only for the "currently selected output channel". How would I use that in my context?
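
From what I've read, the idiom looks like this when applied to a script's own output handles (a sketch for context; it affects only what this script prints itself):

use IO::Handle;

$| = 1;                 # unbuffers the currently selected output channel
STDOUT->autoflush(1);   # the same effect, spelled per-handle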

Thank you in advance.

2 Answers

BEST ANSWER

Thanks to ikegami and Calle Dybedahl, I found the following solution to my problem:

#!/usr/bin/perl

use warnings;
use strict;
use threads;
use threads::shared;
use sigtrap qw(handler exit_safely normal-signals stack-trace error-signals);
use IPC::Run qw(finish pump start);

# define shared variable
my $var :shared = "";

# define long running command
my @cmd = ('<long running command>','with','arguments');
my $in = '';
my $out = '';
# start harness; the pty redirects make the command believe it is
# writing to a terminal, so its stdio line-buffers its output
# instead of block-buffering as it would on a plain pipe
my $h = start \@cmd, '<pty<', \$in, '>pty>', \$out;

# create thread
my $thr = threads->create(
    sub {
        while (1) {
            # pump harness
            $h->pump;
            # extract some information from $out
            $var = <some value>;
            # empty output
            $out = '';
        }
    }
);

while(1) {
    # evaluate variable each ~second
    print "$var\n";
    sleep 1;
}

sub exit_safely {
    my ($sig) = @_;
    print "Caught SIG $sig\n";
    # harness has to be killed, otherwise
    # it will continue to run in background
    $h->kill_kill;
    $thr->join();
    exit(0);
}

exit(0);

SECOND ANSWER

In the general case, your script cannot change the buffering of the child process's output. In some specific cases you may be able to do so by starting it with appropriate switches (for example, grep's --line-buffered or python's -u), but that's about it.
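
You can sometimes work around it from the outside, though (a sketch, not part of the original answer): if the child relies on default C stdio buffering, GNU coreutils' stdbuf can force line-buffered output, and "unbuffer" from the Expect package goes further by allocating a pseudo-terminal. Neither helps for programs that call setvbuf() themselves.

# hypothetical wrapper around the question's placeholder command;
# stdbuf -oL forces line-buffered stdout in children that rely on
# default C stdio buffering
open(my $fh, '-|', 'stdbuf -oL <long running command>')
    or die "Can't run program: $!\n";
while (my $line = <$fh>) {
    # lines now arrive as the child writes them
}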

I would recommend that instead of writing your own code to do the running and reading, you rewrite your script to use the IPC::Run module. It exists to solve exactly this sort of problem. The documentation isn't the best ever, but the module itself is well-tested and solid.
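
As a minimal sketch of what that can look like (it reuses the question's placeholder command; new_chunker is IPC::Run's record-splitting filter, and the pty redirect from the accepted answer may still be needed if the child block-buffers on a plain pipe):

#!/usr/bin/perl
use warnings;
use strict;
use IPC::Run qw(start new_chunker);

my @cmd = ('<long running command>', 'with', 'arguments');

# new_chunker splits the child's output on "\n" and hands each
# complete line to the callback
my $h = start \@cmd, '>', new_chunker, sub {
    my ($line) = @_;
    # extract some information from $line
    print "Got: $line";
};

$h->pump while $h->pumpable;   # drive the harness until the child exits
$h->finish;                    # reap the child and clean up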