Process infinite input from external command line by line


I have a program that reads the output of an external command and processes it line by line. However, when the external command produces output indefinitely (for instance, by printing a string in an infinite loop), the program appears to block: the output is buffered, and everything is printed at once when the command terminates.

What is the best way to process an endless stream of data from another process, so that my program handles the data as the external process writes it? Is there a high-level API for this case that does not require monitors or other concurrency primitives? Thanks!

I code in Scala, so both Scala and Java libraries would help. Here is the Scala code segment:

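import java.io.{BufferedReader, InputStreamReader}

val pb = new ProcessBuilder("./streamTest.py")
val p = pb.start()
// Read the child's stdout one line at a time; readLine() blocks until a full line arrives.
val reader = new BufferedReader(new InputStreamReader(p.getInputStream()))
var line = reader.readLine()
while (line != null) {
  println(line)
  line = reader.readLine()
}
reader.close()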

The same code segment in Java:

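import java.io.BufferedReader;
import java.io.InputStreamReader;

ProcessBuilder pb = new ProcessBuilder("./streamTest.py");
Process p = pb.start();
// Read the child's stdout one line at a time; readLine() blocks until a full line arrives.
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = reader.readLine();
while (line != null) {
  System.out.println(line);
  line = reader.readLine();
}
reader.close();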

Here is an example of an external script, streamTest.py, that does not work. If I change the command to a simpler one that terminates, such as ls -l (that is, if pb is new ProcessBuilder("ls", "-l"), with each argument passed as a separate string), the program works fine.

#! /usr/bin/python
import time
while True: 
    time.sleep(1)
    print("Hello World!")

There are 2 best solutions below


You need to tell Python not to buffer the output.

The following solutions are possible.

manual flush

#! /usr/bin/python
import time
import sys
while True: 
    time.sleep(1)
    print("Hello World!")
    sys.stdout.flush()
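
On Python 3 the same effect is available through the flush argument of print(). The sketch below is only an illustration: the emit helper and the in-memory sink are hypothetical stand-ins (not part of streamTest.py) that mimic a buffered stdout connected to a pipe, so the difference between a held-back line and a flushed one can be observed.

```python
import io


def emit(messages, stream, flush):
    """Print each message on its own line, optionally flushing after each one."""
    for message in messages:
        print(message, file=stream, flush=flush)


# In-memory stand-in for a pipe: `sink` plays the reader's end and the
# TextIOWrapper plays the role of a buffered sys.stdout.
sink = io.BytesIO()
out = io.TextIOWrapper(sink, encoding="utf-8", newline="\n")

emit(["Hello World!"], out, flush=False)
held_back = sink.getvalue()   # still empty: the line sits in the wrapper's buffer

emit(["Hello World!"], out, flush=True)
delivered = sink.getvalue()   # the flush pushed both lines through to the sink

print(repr(held_back), repr(delivered))
```

In the real script, print("Hello World!", flush=True) inside the loop would replace the separate sys.stdout.flush() call.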

change buffering mode of stdout

#! /usr/bin/python
import time
import sys
import os
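# Python 2 only: reopen stdout with a buffer size of 0 (unbuffered).
# Python 3 rejects unbuffered text I/O and raises ValueError here.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)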
while True: 
    time.sleep(1)
    print("Hello World!")
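
The os.fdopen(..., 'w', 0) call relies on Python 2 behavior. A possible Python 3 counterpart, assuming Python 3.7 or later, is to switch stdout to line buffering with reconfigure(). The enable_line_buffering helper and the in-memory sink below are hypothetical names used only to make the sketch observable without a real pipe.

```python
import io


def enable_line_buffering(stream):
    """Switch a text stream to line buffering (Python 3.7+) and return it."""
    stream.reconfigure(line_buffering=True)
    return stream


# In a script such as streamTest.py this would be called once at startup:
#     enable_line_buffering(sys.stdout)

# Demonstration on an in-memory stand-in for a piped stdout.
sink = io.BytesIO()
out = enable_line_buffering(
    io.TextIOWrapper(sink, encoding="utf-8", newline="\n")
)

print("Hello World!", file=out)    # no explicit flush needed
after_one_line = sink.getvalue()   # the newline already triggered a flush

print(repr(after_one_line))
```

Line buffering flushes on every newline, which is usually enough for a reader that consumes the output line by line.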

execute the script in unbuffered mode

#! /usr/bin/python -u
import time
while True: 
    time.sleep(1)
    print("Hello World!")

or

ProcessBuilder pb = new ProcessBuilder("python", "-u", "./streamTest.py");

or

# set an environment variable before calling the Java application
export PYTHONUNBUFFERED=x

Other solutions are possible as well.
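
To see the effect of the -u flag from the reading side, the following sketch runs a short child script twice, once buffered and once with -u, and records when each line arrives. It is written in Python rather than Java to keep it compact. CHILD_CODE is a hypothetical, shortened stand-in for streamTest.py (three lines instead of an infinite loop), and read_with_timestamps is an illustrative helper, not an existing API.

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in for streamTest.py, limited to three lines so the demo ends.
CHILD_CODE = (
    "import time\n"
    "for i in range(3):\n"
    "    print('Hello World!', i)\n"
    "    time.sleep(0.3)\n"
)


def read_with_timestamps(unbuffered):
    """Run the child and return (lines, arrival times in seconds since start)."""
    # Drop PYTHONUNBUFFERED so that the buffered run really is buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    cmd = [sys.executable]
    if unbuffered:
        cmd.append("-u")
    cmd += ["-c", CHILD_CODE]

    start = time.monotonic()
    lines, arrivals = [], []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, env=env) as proc:
        # Iterating over the pipe yields each line as soon as it is available.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            arrivals.append(time.monotonic() - start)
    return lines, arrivals


buffered_lines, buffered_times = read_with_timestamps(unbuffered=False)
live_lines, live_times = read_with_timestamps(unbuffered=True)

print("buffered arrivals:  ", [round(t, 2) for t in buffered_times])
print("unbuffered arrivals:", [round(t, 2) for t in live_times])
```

In the buffered run all three lines should show up together when the child exits, while with -u they should arrive roughly 0.3 seconds apart. The reader loop is identical in both runs, which suggests the delay comes from the writer's buffering rather than from the reading code.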

If the external program is itself written in Java, note that System.out.print() does not flush on its own, so the output may not reach the Java process that reads it. You can add this line after the print call:

 System.out.flush();

Or you may just write

System.out.println("hello world");

since println() flushes automatically when the stream has auto-flush enabled, as stated in the Oracle documentation for PrintStream:

a PrintStream can be created so as to flush automatically; this means that the flush method is automatically invoked after a byte array is written, one of the println methods is invoked, or a newline character or byte ('\n') is written.