Wednesday, February 27, 2013

Formatting API Anti-Pattern

I was meditating over the subset of the core Python API that's dedicated to self-inspection in a running script. This API consists of inspect + traceback + sys.exc_* + the magical locals(), and ideally it should give you a total understanding of the Python state at a given point in your program. But that doesn't happen, because things are complicated, and here is why.

I wanted to get the chain of callers for the function I was debugging. I don't like pdb at all: it is so outdated for 2013 that it deserves a separate GSoC project. I prefer to just insert debug statements into the code, and after several weeks and attempts I came up with a helper to get the name of the caller, which I used to log and debug such complex projects as SCons and Spyder. I am not as smart as everybody else, but even I didn't expect to spend weeks researching such basic stuff.
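For illustration, here is a minimal sketch of what such a caller-name helper can look like. This is not the helper mentioned above. It assumes CPython-style frame objects (`f_back`, `f_code.co_name`, `f_globals`, `f_locals`). The name `caller_name` and its `skip` parameter are hypothetical. Guessing the class from a local named `self` is only a heuristic, which is exactly the kind of uncertainty this post complains about.

```python
import inspect


def caller_name(skip=2):
    """Return a dotted name like 'module.Class.method' for a frame up the stack.

    Hypothetical helper. skip=1 names the function that called caller_name();
    skip=2 (the default) names the caller of that function.
    Returns '' if the stack is not that deep.
    """
    frame = inspect.currentframe()  # may be None on non-CPython implementations
    try:
        for _ in range(skip):
            if frame is None:
                return ''
            frame = frame.f_back
        if frame is None:
            return ''
        parts = [frame.f_globals.get('__name__', '?')]
        # Heuristic: a local named 'self' usually means we are inside a method.
        if 'self' in frame.f_locals:
            parts.append(type(frame.f_locals['self']).__name__)
        # Module-level code has the pseudo-name '<module>'; leave it out.
        if frame.f_code.co_name != '<module>':
            parts.append(frame.f_code.co_name)
        return '.'.join(parts)
    finally:
        # Drop the frame reference to avoid keeping frames alive via cycles.
        del frame
```

A debug statement would then call `caller_name()` from inside the function being traced, to log who called it.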

I think the historical reason is that people tried to produce meaningful output with a functional approach, without paying due attention to the quality of the data itself. Is the data complete, sufficient and accessible? You cannot tell from the manual. The manual (see inspect) encourages you to hack your way through the debris of various bits of data that elusively fail to form a complete picture. The puzzle just doesn't come together, but you can't see this, because you don't know what the structure of the Python state at a given point in your program is, or what to look for. The API encourages you to find functions that join and analyse the bits; you fail, try again, and in the end you come up with a function full of uncertainties that just does some formatting magic.
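To make the "puzzle" concrete, here is a small sketch (the function names `inner`, `middle` and `outer` are hypothetical). It shows that the traceback from `sys.exc_info()` only records the frames between the handler and the raise point. The callers above the handler must be collected separately from the live stack and then stitched together by hand.

```python
import sys
import traceback


def inner():
    raise ValueError("boom")


def middle():
    try:
        inner()
    except ValueError:
        _, _, tb = sys.exc_info()
        # The traceback covers frames from the handler (middle) down to the raise (inner).
        tb_names = [fs.name for fs in traceback.extract_tb(tb)]
        # Callers of middle() are not in the traceback; they come from the live
        # stack, which is listed oldest first and ends at middle() itself.
        stack_names = [fs.name for fs in traceback.extract_stack()]
        # Stitch the two views together, dropping the duplicated middle() entry.
        full_chain = stack_names + tb_names[1:]
        return tb_names, stack_names, full_chain


def outer():
    return middle()
```

Calling `outer()` gives `tb_names == ['middle', 'inner']`, while `stack_names` ends with `'outer', 'middle'`. Neither view alone answers "who called the function that failed".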

"Formatting API Anti-Pattern" may not be the best term for that, but it is the API (data structures, functions and other stuff) so complex and incomplete that you can not see this, and instead of that forced to write formatting functions just to make sense of this incomplete data. We are so used to the fact that "API provides abilities" that completely miss the point that it also "provides limitation", and the data in API can be incomplete, insufficient (incomplete for certain tasks) and not accessible (code becomes very complicated to handle).

How do you fight this? Think about data first. What data should be available for inspection at a given point in your program when the program is paused? Can you see this data in the manual? If not, then you're dealing with a "Formatting API".
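One way to follow the "data first" advice: decide on a plain record per frame, collect the records, and only then format them. The `FrameRecord` fields below are one hypothetical answer to the question above, not a standard structure. `snapshot` and `format_records` are illustrative names. `sys._getframe()` is a CPython implementation detail that other interpreters may not provide.

```python
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class FrameRecord:
    # Hypothetical per-frame record: what we want to see when the program is paused.
    module: str
    function: str
    filename: str
    lineno: int
    local_names: List[str]


def snapshot(skip=1):
    """Return the call stack as a list of FrameRecord, oldest frame first.

    skip=1 leaves out snapshot()'s own frame.
    """
    records = []
    frame = sys._getframe(skip)  # CPython-specific
    while frame is not None:
        records.append(FrameRecord(
            module=frame.f_globals.get('__name__', '?'),
            function=frame.f_code.co_name,
            filename=frame.f_code.co_filename,
            lineno=frame.f_lineno,
            local_names=sorted(frame.f_locals),
        ))
        frame = frame.f_back
    records.reverse()
    return records


def format_records(records):
    # Formatting is a separate step layered on top of the data, not a substitute for it.
    return '\n'.join(f'{r.module}.{r.function}:{r.lineno}' for r in records)
```

With the data collected first, questions like "who called me?" become plain lookups such as `snapshot()[-2].function`. No string needs to be parsed.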