Friday, December 14, 2012

Using getopt with optparse (or how to move from getopt gradually)

TL;DR: https://bitbucket.org/techtonik/scons/commits/bcb60b

SCons has a very old and interesting codebase with a lot of outdated and unusual stuff that makes it more difficult to extend. One such thing is the getopt library, a predecessor of the Optik library (written by Greg Ward), now better known as optparse.

So I wanted to replace getopt with optparse, but didn't want to change everything in one step, because I didn't have time to check every option. Instead I decided to parse options I needed with optparse and leave everything else to the old getopt engine.

getopt only needs a list of arguments to work. sys.argv[1:] to be exact. This is also the second half of the result returned by the OptionParser.parse_args() function. The only problem was to teach OptionParser to ignore unknown options and leave them in the argument list. Strangely, the Optik examples included this user story, but it is completely ignored in the optparse documentation. To make a long story short, you need to subclass OptionParser to use getopt together with optparse:
# "Pass-through" option parsing -- an OptionParser that ignores
# unknown options and lets them pile up in the leftover argument
# list.  Useful to gradually port getopt to optparse.

from optparse import OptionParser, BadOptionError

class PassThroughOptionParser(OptionParser):
    def _process_long_opt(self, rargs, values):
        try:
            OptionParser._process_long_opt(self, rargs, values)
        except BadOptionError as err:
            self.largs.append(err.opt_str)
    def _process_short_opts(self, rargs, values):
        try:
            OptionParser._process_short_opts(self, rargs, values)
        except BadOptionError as err:
            self.largs.append(err.opt_str)

parser = PassThroughOptionParser(add_help_option=False)
parser.add_option('-a', '--all', action='store_true',
                  help="Run all tests.")
(options, args) = parser.parse_args()

#print("options:", options)
#print("args:", args)
Now pass args down to the getopt call and you're all set.
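For illustration, the hand-off can look like the sketch below (the -j/--jobs option here is made up for the example, not one of SCons's real options):

```python
import getopt

# Suppose optparse already consumed --all and left the rest untouched;
# the old getopt engine then parses the leftovers exactly as before.
# (The -j option is hypothetical, for illustration only.)
leftover = ['-j', '4', 'target']
opts, positional = getopt.getopt(leftover, 'j:')
print(opts)        # [('-j', '4')]
print(positional)  # ['target']
```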

P.S. In argparse you can use the ArgumentParser.parse_known_args() method.
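A minimal sketch of the argparse equivalent (the option names are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('-a', '--all', action='store_true', help="Run all tests.")

# Unknown options are not an error here; they come back in the second
# element of the tuple, ready to be handed to older parsing code.
options, leftover = parser.parse_known_args(['-a', '--unknown', 'target'])
print(options.all)   # True
print(leftover)      # ['--unknown', 'target']
```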

Update 2013-02: For humane option parsing you should definitely check out the docopt library.

Wednesday, December 05, 2012

Good reference on Python magic methods

I've just stumbled upon this manual about Python magic methods and it's really amazing. Definitely a good refresher and highly recommended.

http://www.rafekettler.com/magicmethods.html


/me wonders if the same engineering technique can be applied to the official documentation corpus...

Tuesday, November 20, 2012

Cinematic journey approach for Python development

The quotes page (fixed in stone) is silent about who said that Python, compared to other languages, allows you to put thoughts directly into code. I can't disagree with this, but taking an idealistic approach, it was more true with Python 2 when coding at the system level, and less so after the great coming of the web and i18n. So, what's wrong now? I don't have a clean and to-the-point answer, because many people still think there is nothing wrong with Python. Probably the right question is: why is Python not better than it is now?

This is one of those complicated questions nobody is able to answer fully. 42 is the answer, but is the question clear enough? The question is probably too complex for a good technical answer and should undergo decomposition, which can be achieved by clarifying. What does "better" mean? Easier to code in. Why is it hard to code in? Here goes a list of problems...

...

Well, there is no list. Therefore there is no visibility, and without visibility no answer is possible. Gaining visibility into the list of problems that make Python not as good as we want it to be is the primary step to take, so that all subsequent steps are reasonably grounded for a good party quest (and a sane development roadmap for the community to focus on).

Historically there have been several driving forces behind Python development - mailing lists, bug reports and PEPs. PEPs more than bugs, mailing lists somewhere in between (YMMV).

Mailing lists were good while people had a lot of time to follow up. Bugs are good at tracking the status of things, but they are tuned for fixing things and scratching issues, so language research naturally falls out of context in a bug tracker interface. Then there are PEPs.

The PEP is a good thing that helped free the Python core from feature-creep damage, provided a basis for discussions over a long period of time, and gave insight into decisions over the language's development. But PEPs are starting to fail, and the reason is the lack of time and energy to iterate over them. Most people can't say whether a technology is good or bad before testing it (version control being an example), and PEPs with lengthy pieces of design detail assume prior experience with the problem and require thorough imagination to see whether the solution will play well.

PEPs require a lot of concentration - a resource in big shortage nowadays, especially of professional grade. Which is not a surprise if you look at how well HR and management technologies have been developed in the modern world to keep people busy and involved. We can only hope that the collective minds of big corporations are somehow bugged by the problem and look for ways to divert their resource flow to improve the grounds they are standing on. Let's hope that the community can back up their support, and is also bugged by the problem of how to lower the barriers of requirements, responsibility, experience and technical expertise, so that an occasional community member - a student or an elderly accountant - can be useful in the Python development process. Let's hope that both parties are interested enough to constantly improve the ways to use the resource flow to the fullest extent possible.

There are two things that can help here (and make Python better than it is now). The first one improves visibility; it takes its roots in the cinematic industry and is called the scenario. The second one improves the process; it is a best practice developed over time by user experience professionals, named the customer journey map.


What keeps me away from putting my thoughts into code when I write Python?

"""Python forces me to maintain the lowest-level structure of my writing - the indented layout - which is a good thing. Although this also comes with a pain while debugging, because Gangnam-style multiline comments require me to remember to indent them as well."""  - this is a scenario. You can add various metrics to it, such as:

  """I have only 7 operational attention slots in my mind, and one constantly falls out, because I have to pay attention to a complicated commenting requirement.""" - the metric here directly influences how deeply a person can operate at any given moment. Basically, multiline comments made with strings are stealing concentration.

  """Those indentation errors are driving me mad every time I forget to indent a multiline comment for debugging.""" - this says that the person uses an iterative approach to debug problems, often commenting out a lot, probably in a production environment with a non-tuned editor. That's another scenario where Python's comment hack doesn't play well.
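The complaint is easy to reproduce. A minimal, self-contained demonstration: a function body "commented out" with an unindented triple-quoted string fails to compile, because the string dedents to column zero and ends the indented block:

```python
# The unindented """ closes the function's indented block, so the
# `return` line that follows becomes an unexpected indent.
broken = (
    "def f():\n"
    "    x = 1\n"
    '"""\n'
    "    y = 2   # temporarily disabled\n"
    '"""\n'
    "    return x\n"
)
try:
    compile(broken, "<demo>", "exec")
    print("compiled fine")
except SyntaxError as err:
    print("fails to compile:", type(err).__name__)
```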

Scenarios have two good qualities - they are short and they can conflict with each other. A PEP is on the other side - it is self-sufficient. To notice that a PEP is contradictory, you need to read it attentively and thoroughly, or write it yourself; that takes a lot of time. Scenarios are somewhat emotional; they are easy to remember and refer to. This makes it possible to concentrate on conflicting scenarios, outline the conflicting points and concentrate all work around them rather than around vague opinions, which makes the whole process of looking for compromises (or good solutions) more fun and involving.


To summarize, a scenario is a good title to remember and a short story to tell. What is the difference between a scenario and a StackOverflow question? A question may not have a story; a scenario may not contain questions. What's the difference between a scenario, a use case and a user story? "Use case" is enterprise slang, "user story" is an agile term. Both may have formal definitions. A scenario is just a scenario, like in a movie. You should replay it to see how it works. A scenario is for humans; it is less formalized and comes with emotions included (YMMV).



Let's skip to another example of a problem with Python usability at a higher level - packaging - and present another tool from the usability domain that can help with analyzing processes in general.


What's wrong with Python packaging that everybody constantly rewrites it?


I didn't intend to include it here at first, but half an hour ago I spotted this article - http://lucumr.pocoo.org/2012/6/22/hate-hate-hate-everywhere/ If distutils/setuptools had a scenario database for packaging, it would be possible to analyze the limitations of Python with regard to each scenario. Such an analysis is similar to a PEP, but not necessarily a proposal and not necessarily so extensive. A scenario may contain a history of the problem, a short description, a summary and links to other conflicting scenarios. The role of the scenario database is to aid the decision-making process and to serve as an easy reference for new people facing the same problems.

Scenarios can be universal, which makes them a good analysis tool. You can substitute Ruby for Python and see how well this specific workflow works in a different system.

"""I can't list installed Python packages, why?""" - does anybody have a link? """I can't find the answer""" - and that's another scenario, this one about the usefulness of scenarios.
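For the record, the first scenario now has a standard-library answer (importlib.metadata, Python 3.8+; back in 2012 you would have reached for pip or pkg_resources instead):

```python
# List installed distributions with the standard library (Python 3.8+).
from importlib import metadata

installed = sorted(
    (dist.metadata["Name"], dist.version)
    for dist in metadata.distributions()
    if dist.metadata["Name"]  # skip broken metadata without a name
)
for name, version in installed:
    print(name, version)
```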

So, to fix packaging there should be a way to operate with scenarios. There should be at least a list of scenarios (or better, an indented tree), so that (y-)hackers of a new packaging tool could go over it, think about their approach, tick checkboxes and, hopefully, spot and bring to the surface that "Essential Packaging Restraint" that eats whole generations of people. The point is to spot the problem before starting to code.


The scenario DB will help, but there is another usability tool that can make packaging, bug tracking and other development processes more streamlined (less time-consuming, more fun and engaging). This tool is called the Customer Journey Map, and it shows people who are not experiencing any problems with a process where those problems are for somebody else. The map is also a good starting point in web site redesigns, conference organization - all kinds of activities that involve people or, more specifically, a single person named "Customer", the barriers this person is facing, and the steps to remove those barriers.

I can't go into great detail about CJM in this post due to time constraints. I was impressed by a presentation from the awesome UXpresso team - there might be a video available, but it is likely Russian-only - and I've heard of at least one major Python company (wargaming.net) that uses it extensively, so I can only give you a pointer for now. It would be interesting to present this technique for the Python contribution process and talk about CJM at PyCon, but I am unlikely to be able to afford the participation costs, so somebody else should do this.

Monday, July 09, 2012

About Environment


In Python applications, environment is often an ambiguous term that needs clarification. In the general sense, `the environment` is the system environment with PATH and friends, accessible from os.environ within Python. But in Python applications it can mean different things.
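A quick illustration of that general sense:

```python
import os

# os.environ is a mapping of the process environment, inherited from
# the shell (or parent process) that started Python.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print("PATH has", len(path_dirs), "entries")
```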

In Trac, `the environment` is a directory with settings, a database and other files related to one Trac site.

In SCons, `the environment` is a structure in memory that holds dependency trees, helper functions, builders and other stuff. It is written to disk only for caching.


Quite a few other applications have some kind of environment for their own purposes, with meanings close to either Trac's or SCons', which often confuses newbies or strangers who are not aware of the context. Software development clearly needs more specific terms in English, so that people can write and read in the same language without those excessive contexts.

Saturday, May 26, 2012

Spyder IDE Internals: Highlighting in 2.2.0dev

UPD: Highlighter support in the code editor has been improved, so its instance can now be traced easily.

The entry point to syntax highlighting in Spyder IDE is located in the CodeEditor widget at spyderlib/widgets/sourcecode/codeeditor.py

The CodeEditor widget is basically the file content in one of the main tabs. The whole stack of tabs is called an EditorStack. So editors are grouped in editor stacks, each editor renders one file, and each editor has its own highlighter created from the assigned self.highlighter_class

Highlighters are implemented with Qt's QSyntaxHighlighter. No Pygments, nothing like that, so I can't say whether Qt is faster, but it should be. The default self.highlighter_class for every code editor is TextSH. The actual instance is created by the set_language() method called from setup_editor(). If setup_editor() is not called, the highlighter can be unset, but frankly I don't know the purpose of using such an editor.

If set, the syntax highlighter (self.highlighter) is responsible for:
  • coloring raw text data inside editor on load
  • coloring text data when editor is cloned
  • updating document highlight on line edits
  • providing color palette (scheme) for the editor
  • providing data for Outliner
self.highlighter is not responsible for:
  • background highlight for current line
  • background highlight for search / current line occurrences

Enjoy hacking.

Saturday, April 28, 2012

Multidimensional programming

From time to time I reverse some piece of open source code just to understand how it works. The biggest problem with that is the number of things I need to juggle in my head until they find a place on the canvas of the reversed blueprint. If there are too many, I give up or put the project on the back burner. That's how it went with Stackless and Twisted, but I am really glad to have finally gotten to them.

Quite often it starts with some bug that seems easy to fix, and I go for it. In an ideal world it shouldn't take more than 15 minutes of studying the code to understand what should be changed, but it is also important to have confidence that the change won't break anything.

I won't tell you how to deal with that complexity, but I'd like to share an idea that I've got from Large Hadron Collider. =) Let me tell you that story...


I always had trouble trying to squeeze more than 3 dimensions into my head. I could easily imagine a dot moving along the X or Y axis, could imagine it moving along the Z axis, but anything more than that caused confusion.

My brain could have exploded when, some time ago, a friend of mine tried to explain what's going on in the LHC. He said that physicists and mathematicians are trying to figure out how many dimensions our universe has. They assume that our universe has more than 4 dimensions (3 coordinates and 1 time value). In fact they argue that the truth is somewhere between 9 and 26!

I am not a fan of scientific theories - my neurons are pretty calm on that matter - but when you just need to understand, because the person in front of you is knowledgeable and tries his best to explain, that is when you start to feel brain cells colliding under your skull. To the honor of my friend, he didn't try to build up suspense (as I would probably do before explaining properly) and managed to draw a clear picture in my mind: a standard plot with two axes - X and Y - and a dot.

"Look at this dot on the X/Y plane. This dot has coordinates in two dimensions - X and Y," he explained. "These are the only two you can see here. But this dot is from the real world, so it also has a Z coordinate," he added, putting a label "Z=0.1" next to the dot. "This dot also has speed," - he drew "V=0.1 m/s". "Now we can see values for all 4 traditional dimensions of our dot, but we can add more. There are many things by which we can classify our dot. It probably has color," - he drew a color box and a hex value. "It has temperature," - a new label "t = 25°C" appeared in the column. "That's 6 already, and you can add your own." I immediately imagined a dot travelling across the canvas of the "Stars!" game with all those labels nearby, constantly changing their values as the dot moved. "Wow! Now I see," - that was a nice feeling - I must admit I can be pretty dumb sometimes. =) It didn't make a science fan out of me, but it did make an important short circuit in the depths of my head, which popped up a few months later...

A few months later.

As usually happens in software development, you need at least some basic design before sitting down to code, and with time the design phase completely faded from our process into a "who makes it first - wins" motto. It is hard to argue with that, so I didn't. But as a result, after some time the stability of our releases dropped. People were not communicating, and started to forget about aspects of our system that could fire at any time in completely unexpected places. Even though all our code undergoes review, the reviewers tend to forget about those aspects as well. The process went out of control - we couldn't keep all the aspects in our heads when planning the next feature to be released, and I found myself in the same state as when I was trying to understand the complexity of multidimensional string theories. That led me to the idea that we need to control the number of aspects we have to keep in mind when coding, and restructure our architecture to keep those aspects to a minimum. This would help us regain a sense of confidence in what we are doing, and save some money on the bills from the nearest bar in the name of our release manager.


So, the `multidimensional programming` concept means that at any point in your code there are multiple things you should be aware of. These things are the 'dimensions', and the more you have, the more complex your application is. Basically, these are the things that can be broken by any change at this place. A good application architecture is orthogonal - you can work in only one dimension at a time without thinking too much about all the others. But you need to know all of them anyway to gain a sense of confidence.


For example, a recent change in Spyder IDE required me to rename a file. This breaks the code, which I grep and fix - that's one dimension. But it is also likely to break the translation of the strings in that file, because the strings are now in a different place. I imagine that nobody will be interested in checking and translating the same stuff over and over again, so that's one more thing I'd like to avoid, and I should keep it in mind.

Another example is web applications. You need to keep in mind 'user privileges', the type of HTTP request ('ajax', ...) and the response ('json', ...) required. You need to make sure `critical errors` are handled and reported, and `static files` are correctly served by the web server. You need to save incomplete `data between requests` and clean it up where possible. Make sure there is sufficient `XSRF protection` and `browser compatibility`. There is a lot more to it, and so far none of these have anything to do with the logic of your web application. Frameworks help deal with that, but inside they are still multidimensional. If a framework is not flexible enough for you, that probably means it tries to keep some dimensions orthogonal, and there could be a good reason for that.


Maybe that's not much, but at least now you can argue that Large Hadron Collider experiments have much in common with software engineering, and when somebody asks about your job, you can proudly state that you're on par with the scientists and their string theory - but in your own big enterprise application universe. =)

Friday, February 17, 2012

Rietveld architecture: AppEngine/Django request processing

Just a quick note/reminder of the request handling flow in the AppEngine environment for a mixed AE/Django application such as Rietveld. Hopefully it provides a good entry point to understanding how AppEngine works. I use Rietveld as an example because this project was basically born to show how to run Django on AE.

Rietveld is a Django application run by AppEngine. Let's leave all the Django stuff aside and first learn how AppEngine loads and initializes Python applications.


Import and execution in Python web apps

In PHP, your code is read and interpreted (executed) from start to finish every time a new request arrives. In Python, the code is read (imported) and executed only once, and on subsequent requests only the part that handles the request is invoked over and over. The catch is that the first request is always different in Python.
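The consequence can be sketched in a few lines: module-level code runs once per instance, while the handler function runs on every request.

```python
import time

STARTED = time.time()   # module level: executed once, on (first) import

def handle_request():
    # executed for every request the instance serves
    return "instance is %.0f seconds old" % (time.time() - STARTED)

# Two "requests" to the same instance reuse the module-level state:
print(handle_request())
print(handle_request())
```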


What happens when application is uploaded to AppEngine?


Step 1. Standard AppEngine application loading and initialization sequence
  • AppEngine reads app.yaml to understand how to load the application (which version of Python it requires and which URLs are handled by which Python scripts)
  • AppEngine initializes the application by creating an instance for it
  • Then it looks at the URL and executes the script that should process this URL according to app.yaml
This applies to every AppEngine application.


Example: app.yaml from the Rietveld project.
It is the entry point to understanding any AppEngine app. If you want to know what is called when you request a URL, the first thing to do is to look there.
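As a sketch of what such a file looks like (the values below are illustrative, not Rietveld's actual ones):

```yaml
# Illustrative app.yaml (hypothetical app id and paths):
application: myreview
version: 1
runtime: python
api_version: 1

handlers:
- url: /static
  static_dir: static      # served directly by AppEngine
- url: /.*
  script: main.py         # everything else goes through main.py
```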
Step 2. Python code to fine-tune AppEngine params and configure request handler

All requests in Rietveld (except static files) are handled by main.py. It does the following:

  • Imports appengine_config.py that in turn:
    • Initializes and tunes Appstats tool
    • Chooses version of Django to use (1.2 currently)
    • Configures Django to read settings.py with Rietveld specific parameters
  • Adds logger for all exceptions
  • Removes Django's DB rollback event handler (because Rietveld doesn't use DB layer of Django)
  • Creates request handler using Django
  • Passes handler to AE's run_wsgi_app() util to give Django control to process request
Django hasn't fired at this point, and nothing magical has happened yet.

Step 3. Request handling magic

Request handling starts with run_wsgi_app() - this magical function implicitly imports appengine_config.py behind the scenes to read its own settings, and then gives control to the Django handler created earlier.

Django reads its settings.py mentioned earlier and processes options before executing anything application/request specific:

  • Configures middlewares - that's important, because they provide such things as user object in request:
    • django.middleware.common.CommonMiddleware  - doesn't seem to be used (docs)
    • django.middleware.http.ConditionalGetMiddleware - not sure why it is needed
    • codereview.middleware.AddUserToRequestMiddleware - this one also fetches user-specific parameters from Account record
    • codereview.middleware.PropagateExceptionMiddleware - logs and rewrites exceptions to be more user-friendly
  • Sets urls.py to be the ROOT_URLCONF - mapping between URLs and handler functions in views.py
  • Enables django.core.context_processors.request which adds `request` object to templates
  • Configures template loaders
  • Configures file uploads
  • Configures URL to generate path to static files as `/static/`
  • Rietveld own constants like incoming email address are also defined here
After all above is done, Django handler starts processing the request:
  • It looks into urls.py to find what function should process requested URL
  • urls.py is a redirect to codereview/urls.py with actual mapping, so it reads the latter as well
  • Finds associated function name and calls this function from views.py
And that's basically the entrypoint that you need to start hacking Rietveld/Django and AppEngine.