Wednesday, February 27, 2013

Formatting API Anti-Pattern

I was meditating over the subset of the core Python API that's dedicated to self-inspection in a running script. This API consists of inspect, traceback, sys.exc_* and the magical locals(), and ideally it should give you a total understanding of Python state at a given point in your program. But that doesn't happen, because things are complicated, and here is why.

I wanted to get a chain of callers for the function I was debugging. I don't like pdb at all - it is so outdated for 2013 that it deserves a separate GSoC project. I prefer to just insert debug statements into the code, and after several weeks and attempts I came up with a helper to get the name of the caller, which I used to log and debug such complex projects as SCons and Spyder. I am not as smart as everybody else, but even I didn't expect to spend weeks researching this basic stuff.
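To make the task concrete, here is a minimal sketch of a caller-chain helper. It is not the helper from my gist: the names caller_chain, inner and outer are made up for illustration, and it assumes a Python implementation that supports inspect.currentframe() (CPython does).

```python
import inspect


def caller_chain(skip=1):
    """Return the names of the calling functions, innermost first.

    skip=1 drops the frame of caller_chain() itself.
    """
    frame = inspect.currentframe()  # may be None on some implementations
    names = []
    try:
        # Step out of this helper (and any extra frames requested).
        for _ in range(skip):
            if frame is None:
                break
            frame = frame.f_back
        # Walk outwards through the callers until the stack ends.
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
    finally:
        # Drop the frame reference to avoid keeping a reference cycle alive.
        del frame
    return names


# Hypothetical functions, only here to demonstrate the call chain.
def inner():
    return caller_chain()


def outer():
    return inner()
```

Calling outer() returns a list that starts with 'inner' and 'outer', followed by whatever called outer(). Note that all you get is bare function names: nothing in the chain says which class, if any, a name belongs to.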

I think the historical reason is that people tried to produce meaningful data using a functional approach without paying due attention to the quality of the data itself. Is the data complete, sufficient and accessible? You cannot tell that from the manual. The manual (see inspect) encourages you to hack your way through the debris of various bits of data that elusively fail to form a complete picture. The puzzle just doesn't solve, but you can't see this, because you don't know what the structure of the Python state is at a given point in your program or what to look for. The API encourages you to find functions that join and analyse the bits; you fail and try again, and in the end you come up with a function filled with uncertainties that just does some formatting magic.

"Formatting API Anti-Pattern" may not be the best term for it, but it describes an API (data structures, functions and other stuff) so complex and incomplete that you cannot see its incompleteness, and instead are forced to write formatting functions just to make sense of the incomplete data. We are so used to the idea that an "API provides abilities" that we completely miss the point that it also "provides limitations": the data in an API can be incomplete, insufficient (incomplete for certain tasks) and inaccessible (the code needed to handle it becomes very complicated).

How to fight this? Think about data first. What data should be available for inspection at a given point in your program when this program is paused? Can you see this data in the manual? If not, then you're dealing with "Formatting API".
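As a rough illustration of "data first", the sketch below lays out what a frame holds as a plain dictionary instead of formatting it. The names frame_data and Example are hypothetical, and sys._getframe() is a CPython implementation detail, so treat this as an assumption-laden sketch rather than a portable recipe.

```python
import sys


def frame_data(depth=0):
    """Return the raw data of a calling frame as a plain dict.

    depth=0 means the function that called frame_data().
    """
    frame = sys._getframe(depth + 1)  # CPython-specific
    code = frame.f_code
    return {
        "function": code.co_name,         # bare name, no class information
        "filename": code.co_filename,
        "line": frame.f_lineno,
        "locals": sorted(frame.f_locals),  # names of the local variables
        "has_caller": frame.f_back is not None,
    }


# Hypothetical class used only to compare the two kinds of callers.
class Example(object):
    def method(self):
        return frame_data()

    @staticmethod
    def static():
        return frame_data()
```

With the data laid out like this, the gap is visible at a glance: Example().method() and Example.static() differ only in whether a local named self happens to exist, and that is a naming convention, not a fact about the caller. None of the attributes read here point back to the function object or its class.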

2 comments:

  1. I don't understand what the rant is about. Those APIs are quite useful, and they help you understand.

    And yes the information behind the cover of an interpreter could be gory (any interpreter).

    You talk about the grand picture and understanding the state, but you fail to give an example of what was missing or misleading.

    As for saying pdb is old-fashioned, can you point to some of the features it lacks? (Something that gdb or another command-line debugger has.)
    If you were referring to the UI, maybe you should try stuff like
    https://pypi.python.org/pypi/pudb
    or PyDev, PyCharm, and even IPython can be useful.

    Fruch

    1. One evident example is provided in the gist I've linked. When I analyze the caller of your function, I can't tell whether it is a static method or an object method. I spent more than a week only to discover that it is impossible. A proper data model would have let me see that within 5 minutes: I would be able to see that there is no such information just by looking at the decomposition of this data and its aspects.

      Can pdb provide that?
