Why can't we combine arguments in subprocess.Popen?


When using subprocess.Popen, we have to write

with subprocess.Popen(['ls', '-l', '-a'], stdout=subprocess.PIPE) as proc:
    print(proc.stdout.read())

instead of

with subprocess.Popen(['ls', '-l -a'], stdout=subprocess.PIPE) as proc:
    print(proc.stdout.read())

Why? What will ls receive in the second case? Thank you.


There are 3 best solutions below

0
On

If you want to write the command to execute as a single string, the shlex module may be useful.

shlex.split(s[, comments[, posix]])

Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.

import shlex
import subprocess
assert shlex.split("ls -a -l") == ['ls', '-a', '-l']
subprocess.Popen(shlex.split("ls -a -l"))

It also handles more complex cases, such as escaped characters and quoting:

assert shlex.split("cat 'file with space.txt'") == ['cat', 'file with space.txt']
assert shlex.split(r"cat file\ with\ space.txt") == ['cat', 'file with space.txt']
1
On

In the second case, the single string -l -a is passed to ls as its first and only argument. ls won't know what to do with it, or at least won't do what you want. In the first case, -l is the first argument and -a is the second.
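One way to see exactly what the child process receives is to swap ls for a tiny child that prints its own argument list. This is only a sketch: the helper name show_args is made up for illustration, and the child is started with sys.executable rather than ls so the result does not depend on any particular ls implementation.

```python
import json
import subprocess
import sys

# Child program: print the arguments it received, as a JSON list.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def show_args(*args):
    """Hypothetical helper: run the child with args and return what it saw."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(result.stdout)


print(show_args("-l", "-a"))  # two separate arguments
print(show_args("-l -a"))     # one argument that contains a space
```

The first call reports two arguments and the second reports one, which is what ls would be handed in the two cases from the question.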

If you want to pass the complete command as one string, you can use the shell=True flag to Popen, but then your command would be "ls -l -a", not ['ls', '-l -a'].

With Popen, each item in the list is passed as one argument to the command being executed. It is not a string passed to the shell to be interpreted, unless you explicitly ask for that.
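The difference between the two modes can be sketched by sending the same "-l -a" text through both. The example assumes a POSIX system where shell=True runs /bin/sh, and it again uses a small Python child (an illustrative stand-in, not ls) that reports what it received.

```python
import json
import shlex
import subprocess
import sys

# Child program: print the arguments it received, as a JSON list.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"

# Build one command line. shlex.quote protects the parts that must stay
# intact (the interpreter path and the -c program text).
command = f"{shlex.quote(sys.executable)} -c {shlex.quote(CHILD)} -l -a"

# shell=True: the string goes to the shell, which splits "-l -a" on the space.
via_shell = subprocess.run(command, shell=True, stdout=subprocess.PIPE, check=True)
shell_args = json.loads(via_shell.stdout)

# Default (shell=False): each list item reaches the child unchanged.
direct = subprocess.run(
    [sys.executable, "-c", CHILD, "-l -a"],
    stdout=subprocess.PIPE,
    check=True,
)
direct_args = json.loads(direct.stdout)

print(shell_args)   # split by the shell into two arguments
print(direct_args)  # still a single argument
```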

2
On

When your operating system starts an executable, it does so via a call that looks very much like this:

execv("/usr/bin/ls", (char *[]){"ls", "-l", "-a", NULL})

Note that the arguments are already split into individual words before ls is started. If you're running your program from a shell, the shell is responsible for that splitting. If you're running it from a programming language that lets you control the execv call's arguments directly, you decide how to split the array yourself.

When ls runs, it's passed those arguments in an array, argv. Witness the usual way a main function is declared in C:

int main(int argc, char *argv[]) {
  ...
}

It's getting an array of arguments, in a variable conventionally named argv, already broken up into individual words.

The parser for ls, then, can expect that when it's run it will be handed an array that looks like this:

argc = 3                   # three arguments, including our own name
argv = ['ls', '-l', '-a']  # first argument is our name, others follow

...so the command-line parser built into ls doesn't need to split its arguments on spaces -- the separating spaces have already been removed, and syntactic quotes honored and stripped, before the ls command is ever started.
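As a rough analogy, Python's getopt module (modeled on the C getopt() function) behaves the same way: it treats each argv element as one unit and never splits on spaces inside it. The parser below is a hypothetical stand-in that accepts only -l and -a; the real ls parser is written in C and is not shown here.

```python
import getopt


def parse(argv):
    """Parse argv[1:], accepting only the short options -l and -a."""
    opts, _rest = getopt.getopt(argv[1:], "la")
    return [name for name, _value in opts]


print(parse(["ls", "-l", "-a"]))  # both options are recognized

try:
    parse(["ls", "-l -a"])
except getopt.GetoptError as err:
    # The space inside the single argument is read as an option character.
    print("rejected:", err)
```

With the arguments already separated, both options are found. With "-l -a" as one argument, the parser reads the characters after -l as further clustered options and rejects the space.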

Now, when you run ['ls', '-l -a'], you're explicitly specifying an argc of 2, not 3, with a single argument consisting of the string -l -a. To get that behavior from a shell, you'd need to use quoting or escaping:

ls "-l -a"
ls '-l -a'
ls -l\ -a

...and you'll find that ls, when invoked from a shell in any of those ways, fails in exactly the same way as it does here.
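The equivalence of those three shell spellings can be checked without running ls at all. shlex.split approximates POSIX shell word splitting, so this is a sketch of what a shell would pass along rather than the shell itself:

```python
import shlex

shell_lines = [
    'ls "-l -a"',
    "ls '-l -a'",
    r"ls -l\ -a",
]

for line in shell_lines:
    # Each spelling yields the same two-element argument list.
    print(line, "->", shlex.split(line))
```

Every line splits into ['ls', '-l -a'], the same list as in the second Popen call from the question.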