Saturday, September 17, 2016

Python Usability Bugs: subprocess.Popen executable

subprocess.Popen seems to be designed as a "Swiss army knife" for managing external processes, and while that task is pretty hard to solve in a cross-platform way, it seems the people who have contributed to it did manage to achieve that. But it still came with some drawbacks and complications. Let's study the one that I think is the top one from a usability point of view, because it confuses people a lot.

I've got a simple program that prints its own name and arguments (forgive me for the Windows code, I was debugging the issue on Windows, but this works the same on Linux too). The program is written in Go to get a single executable, because subprocess has special handling for child Python processes (another usability bug for another time).
>argi.exe 1 2 3 4
prog: E:\argi.exe
args: [1 2 3 4]
Let's execute it with subprocess.Popen. For that I almost always look up the Popen prototype in the official documentation:
subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)
Quite scary, right? But let's skip the confusing parts and quickly figure something out of it (because time is scarce). It looks like this should do the trick:
import subprocess

args = "1 2 3 4".split()
p = subprocess.Popen(args, executable="argi.exe")
p.communicate()
After saving this code to "subs.py" and running it, you'd probably expect something like this:
> python subs.py
prog: E:\argi.exe
args: [1 2 3 4]
And... you won't get this. What you get is this:
> python subs.py
prog: 1
args: [2 3 4]
And that's kind of crazy: not only was the executable renamed, but the first argument was lost, and it appears that this is actually documented behavior. So let's define a Python Usability Bug as something that is documented but not expected (by most folks who are going to read the code). The trick to get the code to do what is expected is to never use the executable argument of subprocess.Popen:
import subprocess

args = "1 2 3 4".split()
args.insert(0, "argi.exe")
p = subprocess.Popen(args)
p.communicate()
> python subs.py
prog: argi.exe
args: [1 2 3 4]
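The explanation for the former "misbehavior" is that executable is a hack that allows renaming the program when running a subprocess. When it is given, executable is the program that actually gets launched, while the whole of args, including args[0], is still passed to the child process, so "1" ended up as the program name. The parameter should be named substitute, or - even better - altname, and work as an alternative name to pass to the child process (instead of providing an alternative executable for the former name). To make subprocess.Popen even more intuitive, the args argument should have been named command.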
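To make the mapping concrete, here is a minimal sketch that models what the child process receives for a list-style call. It is a simplified model and not the actual subprocess implementation; the helper names child_argv and describe are hypothetical, and describe only imitates the output format of argi.exe shown above.

```python
def child_argv(args, executable=None):
    """Simplified model of a list-style Popen call (hypothetical helper).

    Returns (program_to_launch, argv_seen_by_child).
    """
    argv = list(args)
    # Without `executable`, the first item of args is the program to run.
    # With it, `executable` is launched and args is passed through untouched.
    program = argv[0] if executable is None else executable
    return program, argv


def describe(argv):
    """Imitate the output format of argi.exe (assumed from the listings)."""
    return "prog: %s\nargs: [%s]" % (argv[0], " ".join(argv[1:]))


# The surprising call: "1" becomes the program name seen by the child.
program, argv = child_argv("1 2 3 4".split(), executable="argi.exe")
print(program)         # argi.exe
print(describe(argv))  # prog: 1
                       # args: [2 3 4]

# The expected call: the program name is simply the first item.
program, argv = child_argv(["argi.exe", "1", "2", "3", "4"])
print(describe(argv))  # prog: argi.exe
                       # args: [1 2 3 4]
```

The model ignores shell=True and string-style args, which follow additional rules.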

From the high-level design point of view, the drawbacks of this function are that it *does way too much* and that its arguments are not always intuitive. It takes *a lot of time to grok the official docs*, and I need to read them *every time*, because there are too many little important details of Popen behavior (has anybody tried to create its state machine?), so over the last 5 years I have kept discovering various problems with it. Today I just wanted to save you some of the hours that I've wasted myself while debugging pymake on Windows.
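One way to avoid rereading the docs every time is to hide Popen behind a narrower function. The following is only a sketch under my own assumptions: run_command is a hypothetical name, it takes a single command list whose first item is the program, it never touches the executable parameter, and merging stderr into stdout is an arbitrary choice that may not suit every case.

```python
import subprocess
import sys


def run_command(command):
    """Run `command` (program first, then its arguments) and wait for it.

    Hypothetical helper: returns (return_code, combined_output_text).
    """
    command = [str(part) for part in command]
    if not command:
        raise ValueError("command must contain at least the program name")
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # assumption: one merged output stream
        universal_newlines=True,    # text output with normalized newlines
    )
    output, _ = process.communicate()
    return process.returncode, output


# Usage: the current Python interpreter stands in for the child program,
# so the example runs without argi.exe.
code, output = run_command(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", 1, 2, 3, 4]
)
print(code)    # 0
print(output)  # ['1', '2', '3', '4']
```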

That's it for now. Bonus points for updating this post with links when I get more time / mana for it:

  • [ ] people who have contributed to it
  • [ ] it came with drawbacks
  • [ ] has anybody tried to create its state machine?
  • [ ] subprocess has special handling for child Python processes