Python subprocess and KeyboardInterrupt - can it cause a subprocess without a variable?


If starting a subprocess from Python:

from subprocess import Popen

with Popen(['cat']) as p:
    pass

is it possible that the process starts but, because of a KeyboardInterrupt caused by the user pressing CTRL+C, the as p part never runs even though the process has indeed started? There would then be no variable in Python to do anything with the process, so terminating it with Python code would be impossible, and it would run until the end of the program?

Looking around the Python source, I see what I think is where the process starts in the __init__ call at https://github.com/python/cpython/blob/3.11/Lib/subprocess.py#L807 then on POSIX systems ends up at https://github.com/python/cpython/blob/3.11/Lib/subprocess.py#L1782C1-L1783C39 calling os.posix_spawn. What happens if there is a KeyboardInterrupt just after os.posix_spawn has completed, but before its return value has even been assigned to a variable?

Simulating this:

class FakeProcess():
    def __init__(self):
        # We create the process here at the OS-level,
        # but just after, the user presses CTRL+C
        raise KeyboardInterrupt()

    def __enter__(self):
        return self

    def __exit__(self, _, __, ___):
        # Never gets here
        print("Got to exit")

p = None
try:
    with FakeProcess() as p:
        pass
finally:
    print('p:', p)

This prints p: None, and does not print Got to exit.

This does suggest to me that a KeyboardInterrupt can prevent the cleanup of a process?


There are 2 best solutions below

Michal Charemza On BEST ANSWER

According to https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exception a KeyboardInterrupt is raised by Python's default SIGINT handler. The same note also says when it can be raised:

If a signal handler raises an exception, the exception will be propagated to the main thread and may be raised after any bytecode instruction. Most notably, a KeyboardInterrupt may appear at any point during execution.

So it can be raised between, but not during, Python bytecode instructions. It therefore all boils down to whether "calling a function" (os.posix_spawn in this case) and "assigning its result to a variable" are one instruction or several.

They are several. According to https://pl.python.org/docs/lib/bytecodes.html the STORE_* instructions are all separate from the instructions that call a function.

Which means that in this case, a process can be made at the OS level that Python doesn’t know about.
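The separation between the call and the store can be checked with the standard dis module. This is only a minimal sketch: spawn is a placeholder name standing in for os.posix_spawn, the statement is compiled but never executed, and the exact opcode names vary between CPython versions.

```python
import dis

# Compile a module-level statement shaped like "pid = os.posix_spawn(...)".
# `spawn` is a placeholder name; the code is only inspected, never executed.
code = compile("pid = spawn()", "<example>", "exec")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# The call opcode is named CALL or CALL_FUNCTION depending on the CPython version.
call_index = next(i for i, name in enumerate(opnames) if name.startswith("CALL"))
store_index = opnames.index("STORE_NAME")

print(opnames)
print("call comes before a separate store:", call_index < store_index)
```

The call instruction and STORE_NAME show up as distinct entries, which is the gap in which a pending KeyboardInterrupt could be raised.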

https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exception also states

Most Python code, including the standard library, cannot be made robust against this, and so a KeyboardInterrupt (or any other exception resulting from a signal handler) may on rare occasions put the program in an unexpected state.

This hints at how pervasive, I think, such issues can be in Python, even if they are rare in practice.

But https://docs.python.org/3/library/signal.html#note-on-signal-handlers-and-exception also gives a way of avoiding this:

applications that are complex or require high reliability should avoid raising exceptions from signal handlers. They should also avoid catching KeyboardInterrupt as a means of gracefully shutting down. Instead, they should install their own SIGINT handler.

You can do that if you really need or want to. Taking the answer from https://stackoverflow.com/a/76919499/1319998, which is itself based on https://stackoverflow.com/a/71330357/1319998, you can essentially defer SIGINT, and with it the KeyboardInterrupt:

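import signal
import threading  # needed for the main-thread check in defer_signal
from contextlib import contextmanager
from subprocess import Popen  # used by the Popen wrapper further down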

@contextmanager
def defer_signal(signum):
    # Based on https://stackoverflow.com/a/71330357/1319998

    original_handler = None
    defer_handle_args = None

    def defer_handle(*args):
        nonlocal defer_handle_args
        defer_handle_args = args

    # Do nothing if
    # - we don't have a registered handler in Python to defer
    # - or the handler is not callable, so either SIG_DFL where the system
    #   takes some default action, or SIG_IGN to ignore the signal
    # - or we're not in the main thread, which is the only thread that
    #   runs Python signal handlers anyway
    original_handler = signal.getsignal(signum)
    if (
            original_handler is None
            or not callable(original_handler)
            or threading.current_thread() is not threading.main_thread()
    ):
        yield
        return

    try:
        signal.signal(signum, defer_handle)
        yield
    finally:
        # Note: if you have installed a signal handler for another
        # signal that raises an exception, the original handler won't
        # ever get re-attached
        signal.signal(signum, original_handler)
        if defer_handle_args is not None:
            original_handler(*defer_handle_args)

You can then use this to create a context manager with a stronger guarantee that you don't get a sort of zombie process due to a SIGINT during its creation:

@contextmanager
def PopenDeferringSIGINTDuringConstruction(*args, **kwargs):
    # Very much like Popen, but defers SIGINT during its __init__, which is when
    # the process starts at the OS level. This avoids what is essentially a
    # zombie process - the process running but Python having no knowledge of it
    #
    # It doesn't guarantee that p will make it to client code, but should
    # guarantee that Popen's __init__ method is not interrupted by a
    # KeyboardInterrupt. And if __init__ raises an exception, then its
    # __del__ method also shouldn't get interrupted, and so will clean
    # up as best as it can

    with defer_signal(signal.SIGINT):
        p = Popen(*args, **kwargs)

    with p:
        yield p

which can be used as:

with PopenDeferringSIGINTDuringConstruction(['cat']) as p:
    pass
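For comparison, the documentation's advice to install your own SIGINT handler can also be followed more directly, with a handler that only records the request instead of raising. The sketch below is a different technique from the deferring code above, and InterruptFlag and install_interrupt_flag are hypothetical names introduced purely for illustration.

```python
import signal


class InterruptFlag:
    """Hypothetical helper: remembers that SIGINT arrived instead of raising."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        # Only record the request; the main code decides when to act on it.
        self.requested = True


def install_interrupt_flag():
    """Install the flag as the SIGINT handler (must run in the main thread).

    Returns the flag and the previous handler so the caller can restore it.
    """
    flag = InterruptFlag()
    previous_handler = signal.signal(signal.SIGINT, flag)
    return flag, previous_handler
```

The main code would then check flag.requested at points where it is safe to stop, for example after Popen has returned and p is bound, and terminate the subprocess itself. The trade-off is that CTRL+C no longer interrupts anything unless the program checks the flag.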

Or... instead of dealing with signal handlers, as https://stackoverflow.com/a/842567/1319998 suggests, you can start the process in a thread, which doesn't get interrupted by signals. But then you are left with having to deal with both inter-thread and inter-process communication.

import threading
from contextlib import contextmanager
from subprocess import Popen

@contextmanager
def PopenInThread(*args, **kwargs):
    # Very much like Popen, but starts it in a thread so its __init__
    # (and maybe __del__ method if __init__ raises an exception) don't
    # get interrupted by SIGINT/KeyboardInterrupt

    p = None
    exception = None

    def target():
        nonlocal p, exception
        try:
            p = Popen(*args, **kwargs)
        except Exception as e:
            exception = e

    t = threading.Thread(target=target)
    t.start()
    t.join()

    if exception is not None:
        raise exception from None

    with p:
        yield p

used as

with PopenInThread(['cat']) as p:
    pass
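The thread approach relies on CPython running Python-level signal handlers only in the main thread. One way to see that signals are tied to the main thread is that signal.signal refuses to be called from any other thread. A minimal sketch, where outcome and try_to_install_handler are illustrative names:

```python
import signal
import threading

outcome = {}


def try_to_install_handler():
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        outcome["error"] = None
    except ValueError as exc:
        # CPython refuses: handlers can only be set from the main thread.
        outcome["error"] = exc


worker = threading.Thread(target=try_to_install_handler)
worker.start()
worker.join()

print("error from worker thread:", outcome["error"])
```

Because the handler never runs in the worker, a KeyboardInterrupt cannot land in the middle of Popen's __init__ there, although it can still land in the main thread while it waits in t.join().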

Choose your poison...

SomeSimpleton On

I don't think you should have a problem with that. Plus, you're opening the Popen within a context manager, so as soon as execution leaves that indented block the context manager should automatically and properly exit the Popen session for you.

I believe it is the same as a finally clause in a try block, which will still run on a keyboard interrupt. I will include the documentation link for context managers below so you can read more about it.

Edit: The context manager does use a try block with a finally statement to release/exit.

https://docs.python.org/3/library/contextlib.html

Edit for follow-up:

So if __init__ hasn't fully completed when a keyboard interrupt is raised, it should get handled by garbage collection, and the __del__ method should be called automatically. I have provided the link for that below. That should handle everything for you.

https://github.com/python/cpython/blob/e3a11e12ab912f0614a90ade7acd34dda7e7f15e/Lib/subprocess.py#L1120C7-L1120C7
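A small experiment along the lines of the question's FakeProcess can show what __del__ gets to work with when __init__ is interrupted. This is only a sketch of the general mechanism in CPython, not of Popen's actual internals; the finalized list and the pid attribute are made up for the illustration.

```python
import gc


class FakeProcess:
    finalized = []  # records what each __del__ call was able to see

    def __init__(self):
        # Pretend the OS-level process started here, but CTRL+C arrived
        # before the pid could be stored on the instance.
        raise KeyboardInterrupt()

    def __del__(self):
        # Record the pid if __init__ managed to store one, else None.
        FakeProcess.finalized.append(getattr(self, "pid", None))


try:
    FakeProcess()
except KeyboardInterrupt:
    pass

gc.collect()  # usually unnecessary in CPython; makes the sketch less timing-dependent
print(FakeProcess.finalized)
```

In this sketch __del__ does run, but it has no pid to act on, so how much a finalizer can clean up depends on how much state __init__ recorded before it was interrupted.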