Wednesday, July 02, 2008

Applying Unified Diffs with Python

Windows has a lot of annoyances for developers. One of them is the lack of some precious tools, namely "diff" and "patch". They can be downloaded from the Internet, but when the latest patch binary available from the Win32 ports (version 2.5.9) refused to apply a patch built with "svn diff" and exited with an error, I decided to write my own version in Python. If it gets included in the standard Python distribution as a logical complement to the Scripts\diff.py utility, then at least people who have Python will have no problem applying patches on Windows. One limitation, though: the script parses only the most popular patch format, unified diff.



To start, I outlined the structure of a unified diff using information from Guido van Rossum's blog and Wikipedia.
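For readers who have not looked closely at the format, here is a small illustration of that structure. It is not taken from the project: it uses the standard library's difflib.unified_diff to generate a tiny diff, and the file names and contents are made up for the example.

```python
import difflib

# Hypothetical file contents, invented for this illustration.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# unified_diff yields the two file header lines first, then each hunk.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/sample.txt", tofile="b/sample.txt")
)
print("".join(diff_lines), end="")

# Printed diff:
# --- a/sample.txt        <- header: source file
# +++ b/sample.txt        <- header: target file
# @@ -1,3 +1,3 @@         <- hunk header: start,count in source and target
#  alpha                  <- context line (leading space)
# -beta                   <- line removed from the source
# +BETA                   <- line added in the target
#  gamma                  <- context line
```

So a patch is a sequence of per-file sections, each with a two-line header followed by one or more hunks, and every hunk line is marked by its first character.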











The parsing logic uses a brute-force regex approach to avoid dependencies on parsing libraries (pyparsing and the like). I took this approach so I could compare the code against the different techniques in Text Processing in Python by David Mertz and learn how I can improve it.
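To give a feel for what regex-based parsing looks like here, below is a minimal sketch that parses a hunk header with the standard re module. The pattern and the parse_hunk_header name are my own assumptions for illustration, not the code from the project.

```python
import re

# Hunk header, e.g. "@@ -12,7 +12,8 @@ optional section text".
# The line counts are optional and default to 1 when omitted.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count), or None if no match.

    Hypothetical helper written for this post, not part of the project.
    """
    match = HUNK_HEADER.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -12,7 +12,8 @@ def main():"))  # (12, 7, 12, 8)
print(parse_hunk_header("@@ -3 +3 @@"))                    # (3, 1, 3, 1)
```

A full parser would walk the patch line by line, using patterns like this one to detect file headers and hunk headers and treating everything in between as hunk body.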



Linefeeds are handled in automagic mode. The proper line ending is detected while scanning the source file. If the source file has mixed line endings, lines from the patch file are not transformed and are written "as is". If all lines in the source file end with the same sequence, lines from the patch file are stripped of their own line endings and applied with that sequence.
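The rule above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the project's code: both function names are hypothetical, the data is handled as bytes, and a patch line without a terminator is not treated specially.

```python
def detect_line_ending(data):
    """Return the single line ending used in `data` (bytes), or None if mixed or absent."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # bare CR (old Mac style)
    lf = data.count(b"\n") - crlf  # bare LF (Unix style)
    used = [end for end, n in ((b"\r\n", crlf), (b"\r", cr), (b"\n", lf)) if n]
    return used[0] if len(used) == 1 else None


def adapt_patch_line(patch_line, target_ending):
    """Re-terminate a patch line with the source file's ending, or keep it as is."""
    if target_ending is None:
        return patch_line  # mixed endings in the source: write the line unchanged
    return patch_line.rstrip(b"\r\n") + target_ending


source = b"first\r\nsecond\r\n"
ending = detect_line_ending(source)
print(adapt_patch_line(b"patched\n", ending))  # b'patched\r\n'
```

Counting CRLF first and subtracting it from the bare CR and LF counts keeps a Windows line ending from being mistaken for a mix of the other two.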



The project doesn't have all the UNIX patch options, but it should be useful even without them. You can find it, with sources (MIT license), at http://code.google.com/p/python-patch/