Thursday, January 08, 2015

Shipping Python tools in executable .zip packages

UP201501: Found out how to force setup.py create .zip instead of .tar.gz

This is about how to make Python source packages executable. As a side effect, this also explains, how to run invoke's tasks.py with a copy of invoke .zip shipped with your source code, but without installing it.

You know, Python can execute .zip files. As simply as:
$ wget https://pypi.python.org/packages/source/h/hexdump/hexdump-3.1.zip
$ python hexdump-3.1.zip
/usr/bin/python: can't find '__main__.py' in 'hexdump-3.1.zip'
Well, you need to place '__main__.py' into the root. It will then import what it needs and do something. The only problem with .zip source packages is that their structure (if created with standard Python tools) includes one additional directory at the top:
`-- hexdump-3.1
    |-- PKG-INFO
    |-- README.txt
    |-- hexdump.py
    |-- hexfile.bin
    `-- setup.py
So, even if you place `__main__.py` into the root, it won't be able to import file from `hexdump-3.1` subdirectory.. unless you prepare for it. Here is the solution:
import os
import sys

# add package .zip to python lookup path
__dir__ = os.path.dirname(__file__)
path = os.path.join(__dir__, 'hexdump-3.1')
sys.path.insert(0, path)

import hexdump
msg = hexdump.dehex("48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21")
print(msg)
Now if you run it, it will print the obvious.
$ python hexdump-3.1.zip
Hello, World!
In Python 3, the result will be wrapped into b'' string. But that's it!

The next hexdump version will likely to ship as executable .zip package

Where you may need it?


It may be useful for scripts automation if you're "vendorizing" dependencies (ship them with your code). It started with need to fix a broken link on Gratipay site. The Gratipay codebase is in public domain clear of any NDA, CLA and all other BS, and that makes it great for people to learn, reuse and enhance. I wish Python internal projects possessed these properties, but PSF politics is another topic.

Gratipay inside site uses make. It doesn't need powerful build tool, such as scons, so I tried to replace it with some simple task automation utility to be more cross-platform. I chose invoke, because of my previous experience with fabric for remote control, and because I saw previous attempts in another Gratipay repository. The necessary condition for new tool is that user experience should not degrade for make users, so main invoke's tasks.py script needed to be self-executable.

I completed the proof of concept by making invoke .zip package executable with the method above, placed it into vendor/ directory, and tuned tasks.py to reexecute itself with invoke .zip This also solved a potential problem with currently unstable invoke API.

tasks.py was modified as following:
import sys
sys.path.insert(0, 'vendor/invoke-0.9.0.zip')

from invoke import task, run

# ... tasks go here

if __name__ == '__main__':
  import sys
  from invoke import cli
  cli.main()

Bonus points


There is one more thing, which is better explained with hexdump example. hexdump package ships `hexfile.bin` data file used for testing. hexdump.py looks for it in its directory (__file__ dirname), and this lookup fails when hexdump.py is in archive. I'd say Python could provide some kind of an API to deal with "virtual import filesystem" (with watchers to track who and when modifies sys.path), but I digress. If you modify the `__main__.py` from the first chapter above to run tests - just call hexdump.runtest() - the execution will finally fail with the following error:
Traceback (most recent call last):
  File "/usr/lib/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "hexdump-3.1.zip/__main__.py", line 13, in
  File "hexdump-3.1.zip/hexdump-3.1/hexdump.py", line 311, in runtest
IOError: [Errno 20] Not a directory: '/root/hexdump-3.1.zip/hexdump-3.1/hexfile.bin'
(yes, I am hexdumping under root). The problem above was explained by Radomir, and I feel like the post is already tool long to write about it myself.

More bonus points


In theory it is possible to hack setup.py to produce executable source .zip package. By default, setup.py sdist command on Linux creates tar.gz files. So, first it needs to be forced to create .zip file. This can be done by overriding default --formats option:
# Override sdist to always produce .zip archive
from distutils.command.sdist import sdist as _sdist
class sdistzip(_sdist):
    def initialize_options(self):
        _sdist.initialize_options(self)
        self.formats = 'zip'

setup(
    ...
    cmdclass={'sdist': sdistzip},
)
I posted relevant documentation links to this StackOverflow question. The rest - how to inject __main__.py into the root of packed .zip - is yet to be covered, and I am not ready to research it just yet. Some hints may be provided by these answers.

Usability conclusion


I am actually quite happy with the ability to execute python .zip packages (which gave a lot of motivation to write this post), because previously I had to care about how to tell people that a tool is a single script, which can be downloaded from repository or from some dedicated download site (I mean Google Code, which does not support this anymore). I also don't trust my own server for downloads, because one day I found some "benzonasos" running there and I don't even have a slightest idea how it got there.

But now it becomes possible to just download and run the stuff from PyPI to test how it works without messing with creating and activating virtualenvs, and installation of package inside. Of course, if your .zip package has dependencies, they still need to be present on your system, but that's still a time saver. Oh, and that means that my tools can now consist of several modules inside.