Saturday, April 20, 2013

Program config as a DNA strand

This is a technical follow-up to the post about mind-altering programming languages. It concentrates on, iterates over and extends the abstract DNA part.

Do you know what DNA looks like from the point of view of a Python programmer?
DNA is a list. Of genes, and fillers in between.

    [ GENE, GENE, GENE, None, GENE, STUFF, GENE, .... ]

By analogy with software, one GENE is one option. Here None or STUFF is something that is not recognized and not used in the process right now, but may be used later, even in a different place and for a different purpose. This leads to very interesting properties of a configuration described as a DNA strand.

Property One: Everything is Optional and Configurable

Usually it is a tough choice what to include in your program config, because an option a day keeps users away. But with a DNA config it doesn't matter - you can encode everything - every bit of flexibility you wanted, every feature-creeper, every forward-thinking matter. Don't bother yourself with restrictions - include everything.

Property Two: Indefinite Logical Groups and Namespaces

To make sense of the tons of information in a DNA strand, you need to concentrate only on the parts that are relevant to you. This is done by "masking" (or marking) parts of the DNA code according to your current task. When you apply a mask, all GENEs of the DNA that are not used are cleared. You can have multiple masks for different programs. An awesome feature-creeper would be to figure out the mask automatically during the program run (the program explicitly marks which sequences it reads and which it skips).
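
A minimal sketch of masking, assuming the DNA is a plain Python list and a mask is a parallel list of booleans. The names `apply_mask`, `dna` and `web_mask`, as well as the `key=value` gene format, are made up for illustration.

```python
# Hypothetical strand: genes are plain strings, fillers are None or junk.
dna = ["debug=1", None, "port=8080", "STUFF", "theme=dark"]

# A mask marks the positions that matter for the current task.
web_mask = [False, False, True, False, True]


def apply_mask(dna, mask):
    """Return a copy of dna where every position not in the mask is cleared."""
    return [gene if keep else None for gene, keep in zip(dna, mask)]


masked = apply_mask(dna, web_mask)
```

The original strand is left untouched, so several masks for different programs can be applied to the same DNA.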

Property Three: Non-linear Option Identification

Because DNA is a list, you may think that you need to know an option's position to access it. This is not necessary. An index into the DNA is the most straightforward way to look up an option value, and a mask is a tool to help explore and identify its position. But the true power comes from the fact that an option can be identified just by analyzing content. An option can be a single GENE in a specific format, or it can be identified as a sequence of GENEs that pass some validation logic. This makes it possible to write generic configuration analyzers, which can read these sequences to detect configuration problems, patterns and meaningful values and produce nice visualizations out of them.
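
As a rough sketch of identification by content, assume that an option is a single GENE written as a `name=value` string. That format, the `OPTION` pattern and the `find_options` helper are assumptions made for this example only.

```python
import re

# Assumed gene format: lowercase name, equals sign, non-empty value.
OPTION = re.compile(r"^(?P<name>[a-z_]+)=(?P<value>.+)$")


def find_options(dna):
    """Scan the whole strand and return {name: (position, value)}
    for every gene that looks like an option, wherever it sits."""
    found = {}
    for position, gene in enumerate(dna):
        if isinstance(gene, str):
            match = OPTION.match(gene)
            if match:
                found[match.group("name")] = (position, match.group("value"))
    return found


options = find_options(["STUFF", None, "port=8080", 42, "theme=dark"])
```

The caller never needs to know the index in advance; the position comes back as a by-product of the scan.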

Property Four: Application Types

In the real world DNA is packed in chromosomes. Humans have 46. Potatoes possess 48. Zombies are somewhere in between. In the virtual world of coding it makes sense to use one chromosome per application type. For example, every Python web framework has a lot of common parts, common features and minor details in these features, so a variety of such frameworks can be encoded with a single DNA strand.

Exercise

It may sound too abstract, but try to think how to build a shared DNA strand for two or more mini web application frameworks in Python. Just remember that the DNA code includes both hardcoded application features and modifiable configuration options. And as usual look at BoxCar 2D for inspiration.

Friday, March 15, 2013

Programming languages that alter your mind

Today I went to Perl 6 site.
I left.
I can remember in detail what brought me there.
I am still shocked by the new world that opened before me.
Butterflies..

---


NOTE: This is a not very positive post `about life`. Don't waste your time if you don't like such stuff.

The text below contains depictions of deaths by coding, too many letters, biased speculations, insight into psychedelic culture and a long way to conclusions you may not be ready to face. You've been warned.


---

Let's face it - programming languages alter your mind. They affect the way people think and solve problems. They create problems in life and relationships. Languages are addictive, they provide a lot of fun to entertain one's brain well beyond solving simple crosswords and moving sprites of monsters on the screen through the means of other sprites guided by the mouse-type manipulator. And while programming languages are fun, they are also harmful and even toxic.

A language is toxic when it doesn't match your expectations, as happened with mine when I ended up filing Python bug #17426 (which I called the subzero wart, but that doesn't really matter). A language is harmful when it requires you to remember yet another thing about its "funny" behavior, and for that purpose you need to sacrifice a mem cell that was dedicated to storing the name of your girlfriend's favorite flowers. It is harmful if it affects your mood, and you get butthurt when you read bad things about it.

---

Programming can be fatal for a girlfriend, relatives and the person in general. Much like eating mushrooms. I have only been to two funerals of people below the age of 25. Both of them died of pneumonia, because both were not attentive enough to the symptoms of the cold they got, and that's because their attention was completely absorbed by tasks they needed to complete for their jobs. Their bodies failed, because their brains were unable to dedicate resources to a timely response to the alerts of the life-sustaining system.

That's why people quit programming. That's why salaries are so high. But that doesn't mean other jobs are not harmful too. A low salary can be much more harmful in many ways, leading to low self-esteem and to even more complicated problems with relatives, friends and society. Did I say mushrooms can do this too?

---

Programming can be fatal even for people with strong positive attitudes who pursue their own goals in life. If taken without prescription or without the negative sides of daily jobs in bad companies, programming becomes addictive - fun for the one who practices it and a loss for human society. Ignoring everything around, being busy with developing your own _____ (framework, game, startup, social network, ...). A loss that happens when a person leaves the job to pursue their virtual dreams about programming. Dreams disconnected from reality. And reality is something that you've completely ignored and skipped while being buried under the pile of daily tasks at your big corporation. Did I say that mushrooms can cause this too?

---

Real life is tough. It requires interaction with people and responding to signals, and basically you start to forget your ideas and dreams and drift further away from your goals. New ideas are less likely to visit you, because maintaining a business requires the mindset of a library with shallow functions, not of a scientific library with modelling algorithms. Maintaining a business is reinventing the bicycle. Over and over, day after day. Pursuing your dreams is reinventing the bicycle. Over and over, day after day. That keeps people busy, and the progress slow.

Programming languages can make the world less harmful and closer to reality. Programming languages that are accessible, that save time and don't have a tough legacy. Enabling people to adapt them, discuss them and invent their own. If people cannot invent ideal programming languages by themselves, by calculating, by using maths and inductive reasoning - they should try genetic algorithms. Did I mention that mushrooms can mutate?

---


The more experience I gain, the less I want to learn the peculiarities and differences between the languages. I just want them to work and I'd gladly outsource the task of inventing such a working language to a genetic algorithm. An algorithm where I am involved as a social actor who makes selections based on experience shared with other actors. A genetic algorithm is where you have a set of parameters written into one long line. Mutation is a change of some or all of these parameters - sometimes completely random, sometimes guided. If your programming language has parameters - syntax, functions, modules, features - you can describe it using this long line called DNA.
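
To make the "long line of parameters" a bit more concrete, here is a small sketch of the completely random kind of mutation. The feature names, the `mutate` function and the mutation rate are all invented for illustration; a guided mutation or a selection step is not shown.

```python
import random


def mutate(dna, choices, rate=0.1, rng=random):
    """Return a copy of dna in which each parameter is replaced, with
    probability `rate`, by a random value picked from `choices`."""
    return [rng.choice(choices) if rng.random() < rate else gene
            for gene in dna]


# Hypothetical language DNA: one parameter per position.
language_dna = ["indent-blocks", "dynamic-typing", "batteries-included", None]
features = ["braces", "static-typing", "macros", "indent-blocks"]

# A seeded generator keeps the experiment repeatable.
child = mutate(language_dna, features, rate=0.5, rng=random.Random(42))
```

With `rate=0.0` the child is an exact copy of the parent, and with `rate=1.0` every parameter is replaced.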

To apply a genetic algorithm you need a selection method - this is a shared user experience. You need an identifier for the language, a DNA sequence, but also a registry of names and versions for human consumption. You need to abandon backward compatibility excuses and start building reusable components that will become new parameters in the language DNA sequence.

---

It starts to get late. Tomorrow there will be a new day, new karma and new ideas, so I need to finish this today. If programming languages can alter your mind, you can use this mind to alter the language. It is well known that when you look into the abyss, the abyss also gazes into you, but it is less known that it is an iterative process. Did I mention that shrooms can give you powers?..

Let me say goodbye and wish you all the best, dear reader. I hope you had a good trip, and now enjoy the expanded state of your mind and consciousness. Did I mention mushrooms? Right.

Wednesday, February 27, 2013

Formatting API Anti-Pattern

I was meditating over the subset of the core Python API that's dedicated to self-inspection in a running script. This API consists of inspect + traceback + sys.exc_* + magical locals() and ideally should give you a total understanding of Python state at a given point in your program. But that doesn't happen, because things are complicated, and here is why..

I wanted to get a chain of callers for the function I was debugging. I don't like pdb at all - it is so outdated for 2013 that it deserves a separate GSoC project. I prefer to just insert debug statements into the code, and after several weeks and attempts I came up with a helper to get the name of the caller, which I used to log and debug such complex projects as SCons and Spyder. I am not as smart as everybody else, but even I didn't expect to spend weeks researching this basic stuff.
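
For illustration only, a caller chain can be sketched with `inspect.stack()` on a modern Python 3. This is not the helper mentioned above, and the name `caller_chain` and its `skip` parameter are made up for this example.

```python
import inspect


def caller_chain(skip=1):
    """Return function names from the call stack, innermost first.

    skip=1 drops the frame of caller_chain itself.
    """
    # context=0 avoids reading source lines for every frame.
    return [frame.function for frame in inspect.stack(0)[skip:]]


def inner():
    return caller_chain()


def outer():
    return inner()


chain = outer()  # starts with 'inner', then 'outer'
```

Dropping a `print(caller_chain())` into the function being debugged gives the same kind of "who called me" trace without starting pdb.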

I think the historical reason is that people tried to produce meaningful data using a functional approach without paying due attention to the quality of the data itself. Is the data complete, sufficient and accessible? You cannot tell that from the manual. The manual (see inspect) encourages you to hack your way through the debris of various bits of data that elusively fail to create a complete picture - the puzzle just doesn't solve, but you can't see this, because you don't know what the structure of the Python state is at a given point in your program and what to look for. The API encourages you to find functions that join and analyse the bits, you fail and try again, and in the end you come up with a function filled with uncertainties that just does some formatting magic.

"Formatting API Anti-Pattern" may not be the best term for that, but it is an API (data structures, functions and other stuff) so complex and incomplete that you cannot see this, and are instead forced to write formatting functions just to make sense of this incomplete data. We are so used to the fact that an "API provides abilities" that we completely miss the point that it also "provides limitations", and the data in an API can be incomplete, insufficient (incomplete for certain tasks) and not accessible (code becomes very complicated to handle).

How to fight this? Think about data first. What data should be available for inspection at a given point in your program when this program is paused? Can you see this data in the manual? If not, then you're dealing with "Formatting API".

Sunday, February 17, 2013

Ghosts in the shell

At the beginning of the era human controllers were dominating in the shell. Now only their ghosts are controlling processes, launching programs and executing tasks server side. Quite often without any shells at all.

Updated the CHAOS speccy.

Friday, December 14, 2012

Using getopt with optparse (or how to move from getopt gradually)

TL;DR: https://bitbucket.org/techtonik/scons/commits/bcb60b

SCons has a very old and interesting codebase with lots of outdated and unusual stuff that makes it more difficult to extend. One such thing is the getopt library, which is a predecessor of the Optik library (written by Greg Ward), now better known as optparse.

So I wanted to replace getopt with optparse, but didn't want to change everything in one step, because I didn't have time to check every option. Instead I decided to parse options I needed with optparse and leave everything else to the old getopt engine.

getopt only needs a list of arguments to work, sys.argv[1:] to be exact. This is also the second half of the result returned by the OptionParser.parse_args() function. The only problem was to teach OptionParser to ignore unknown options and leave them in the arguments. Strangely, the Optik examples included this user story, but it is completely ignored in the optparse documentation. To make this long user story short, you need to subclass OptionParser to use getopt with optparse:

    # "Pass-through" option parsing -- an OptionParser that ignores
    # unknown options and lets them pile up in the leftover argument
    # list.  Useful to gradually port getopt to optparse.

    from optparse import OptionParser, BadOptionError

    class PassThroughOptionParser(OptionParser):
        def _process_long_opt(self, rargs, values):
            try:
                OptionParser._process_long_opt(self, rargs, values)
            except BadOptionError as err:
                # unknown long option: keep it for getopt
                self.largs.append(err.opt_str)

        def _process_short_opts(self, rargs, values):
            try:
                OptionParser._process_short_opts(self, rargs, values)
            except BadOptionError as err:
                # unknown short option: keep it for getopt
                self.largs.append(err.opt_str)

    parser = PassThroughOptionParser(add_help_option=False)
    parser.add_option('-a', '--all', action='store_true',
                      help="Run all tests.")
    (options, args) = parser.parse_args()

    #print("options:", options)
    #print("args:", args)

Now pass args down to the getopt call and you're all set.
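
The hand-off might look like the sketch below. It repeats the subclass so that it runs on its own on Python 3, and it relies on the same private optparse methods as above. The sample command line and the legacy options `-x` and `--foo` are hypothetical.

```python
import getopt
from optparse import OptionParser, BadOptionError


class PassThroughOptionParser(OptionParser):
    """OptionParser that leaves unknown options in the leftover args."""

    def _process_long_opt(self, rargs, values):
        try:
            OptionParser._process_long_opt(self, rargs, values)
        except BadOptionError as err:
            self.largs.append(err.opt_str)

    def _process_short_opts(self, rargs, values):
        try:
            OptionParser._process_short_opts(self, rargs, values)
        except BadOptionError as err:
            self.largs.append(err.opt_str)


# Hypothetical command line: -a is already ported, -x and --foo are not.
argv = ["-a", "-x", "--foo", "build.py"]

parser = PassThroughOptionParser(add_help_option=False)
parser.add_option("-a", "--all", action="store_true", help="Run all tests.")
options, args = parser.parse_args(argv)

# Everything optparse did not recognize goes to the old getopt engine.
legacy_opts, rest = getopt.getopt(args, "x", ["foo"])
```

Options can then be moved from the getopt spec to `add_option()` calls one at a time.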

P.S. In argparse you can use the ArgumentParser.parse_known_args() method.
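
A short sketch of the argparse variant, with the same hypothetical `-a` option and sample arguments:

```python
import argparse

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-a", "--all", action="store_true", help="Run all tests.")

# parse_known_args() returns the parsed namespace plus the list of
# arguments it did not recognize, instead of exiting with an error.
known, leftover = parser.parse_known_args(["-a", "-x", "build.py"])
```

No subclassing is needed here, and `leftover` can be passed to legacy parsing code as is.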

Update 2013-02: For humane option parsing you should definitely see docopt library.

Wednesday, December 05, 2012

Good reference on Python magic methods

I've just stumbled upon this manual about Python magic methods and it's really amazing. Definitely a good refresher and highly recommended.

http://www.rafekettler.com/magicmethods.html


/me wonders if the same engineering technique can be applied to official documentation corpus..

Tuesday, November 20, 2012

Cinematic journey approach for Python development

The quotes page (fixed in stone) is silent about who said that Python, compared to other languages, allows you to put thoughts directly into code. I couldn't disagree with this, but taking an idealistic approach, this was more true with Python 2 when coding on a system level, and not so true with the great coming of the web and i18n. So, what's wrong now? I don't have a clean and to-the-point answer, because many people still think that there is nothing wrong with Python. Probably the right question is: why is Python not better than it is now?

This is one of those complicated questions nobody is able to answer fully. 42 is the answer, but is the question clear enough? The question is probably too complex for a good technical answer and should undergo decomposition. The decomposition can be achieved by clarifying. What does "better" mean? Easier to code in. Why is it hard to code? Here goes a list of problems...

...

Well, there is no list. Therefore there is no visibility, and without visibility no answer is possible. Gaining visibility into the list of problems that make Python not as good as we want it to be is the primary step to take, to make all subsequent steps reasonably grounded for a good party quest (and a sane development roadmap for the community to focus on).

Historically there were several driving forces behind Python development - mailing lists, bug reports and PEPs. PEPs more than the bugs. Mailing lists somewhere in between (YMMV).

Mailing lists were good while people had a lot of time to follow up. Bugs are good at tracking the status of things, but they are tuned for fixing things and scratching issues, so language research naturally falls out of context in the bug tracker interface. PEPs.

A PEP is a good thing: PEPs helped to free the Python core from featurecreep damage, provided a basis for discussions over a long period of time and gave insight into decisions about the language development. But PEPs start to fail, and the reason is the lack of time and energy to iterate over them. Most people can't say if a technology is good or bad before testing it (version control is an example), and PEPs with lengthy pieces of design detail assume prior experience with the problem and require thorough imagination to see if the solution will play well.

PEPs require a lot of concentration - a resource in short supply nowadays, especially of professional grade. Which is not a surprise if you look at how well HR and management technologies are developed in the modern world to keep people busy and involved. We can only hope that the collective minds of big corps are somehow bugged by the problem and look for solutions to divert their resource flow to improve the ground they are standing on. Let's hope that the community can back up their support, and is also somehow bugged by the problem of how to lower the barriers of requirements, responsibility, experience and technical expertise for an occasional community member, a student or an elderly accountant, to be useful in the Python development process. Let's hope that both parties are interested enough to constantly improve ways to use the resource flow to the fullest extent possible.

There are two things that can help here (and make Python better than it is now). The first one is to improve visibility. It takes its roots in the cinema industry and it's called a scenario. The second one is to improve the process, and it is a best practice developed over time by user experience professionals. This one is named the customer journey map.


What keeps me away from putting my thoughts into code when I write Python?

"""Python forces me to maintain a lowest level structure of my writing - the indented layout, a good thing. Although this also comes with a pain while debugging, because Gangnam style multiline comments require me to remember to indent them as well.""" - this is a scenario. You can add various metrics to it, such as:

  """I have only 7 operational attention slots in my mind, and one constantly falls out, because I have to pay attention to a complicated commenting requirement.""" - the metric here directly influences how deep one person can operate at any given moment. Basically, multiline comments made of strings are stealing concentration.

  """Those indentation errors are driving me mad every time I forget to indent a multiline comment for debugging.""" - this says that a person uses an iterative approach to debug problems, often commenting a lot, and probably in a production environment using a non-tuned editor. That's another scenario where the Python comment hack doesn't play well.
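
The scenario can be replayed with a small sketch. The two source snippets and the `compiles` helper are invented for this example; the only difference between the snippets is the indentation of the triple-quoted "comment".

```python
def compiles(source):
    """Return True if source is accepted by the Python compiler."""
    try:
        compile(source, "<scenario>", "exec")
        return True
    except IndentationError:
        return False


# The string "comment" is indented like the code around it: accepted.
indented = '''
def total(items):
    result = 0
    """
    print(items)
    """
    return result
'''

# The same "comment" typed at column 0 ends the function body early,
# so the following "return" line becomes an unexpected indent.
unindented = '''
def total(items):
    result = 0
"""
    print(items)
"""
    return result
'''

results = (compiles(indented), compiles(unindented))
```

A real `#` comment has no such requirement, which is why the string trick costs extra attention.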

Scenarios have two good qualities - they are short and they can conflict with each other. A PEP is on the other side - it is self-sufficient. To notice that a PEP is contradictory, you need to read it attentively and thoroughly or write it yourself. It takes a lot of time. Scenarios are somewhat emotional, they are easy to remember and refer to. This makes it possible to concentrate on conflicting scenarios, outline conflicting points and concentrate all work around them rather than around vague opinions, which makes the whole process of looking for compromises (or good solutions) more fun and involving.


To summarize, a scenario is a good title to remember and a short story to tell. What is the difference between a scenario and a StackOverflow question? A question may not have a story, a scenario may not contain questions. What's the difference between a scenario, a use case and a user story? "Use case" is enterprise slang, "user story" is an agile term. Both may have some definitions. A scenario is just a scenario, like in a movie. You should replay it to see how it works. A scenario is for humans, it is less formalized and comes with emotions included (YMMV).



Let's skip to another example of a problem with Python usability on a higher level - packaging - and present another tool from the usability domain that can help with analyzing processes in general.


What's wrong with Python packaging that everybody constantly rewrites it?


I didn't intend to include it here at first, but half an hour ago I spotted this article - http://lucumr.pocoo.org/2012/6/22/hate-hate-hate-everywhere/. If distutils/setuptools had a scenario database for packaging, it would be possible to analyze the limitations of Python in regard to each scenario. This analysis is similar to a PEP, but not necessarily a proposal and not necessarily so extensive. A scenario may contain a history of the problem, a short description, a summary and links to other conflicting scenarios. The role of the scenario database is to aid the decision-making process and to serve as an easy reference for new people facing the same problems.

Scenarios can be universal, and they are a good analysis tool. You can substitute Ruby for Python and see how good this specific workflow looks for a different system.

"""I can't list installed Python packages, why?""" - does anybody have a link? """I can't find the answer""", and that's another scenario about usefulness of scenarios.

So, to fix packaging there should be a way to operate with scenarios. There should be at least a list of scenarios (or better, an indented tree), so that (y-)hackers of a new packaging tool could go over it, think about their approach, tick checkboxes and, hopefully, spot and bring to the surface this "Essential Packaging Restraint" that eats whole generations of people. The point is to spot the problem before starting to code.


The scenario DB will help, but there is another usability tool that can make packaging, bug tracking and other development processes more streamlined (less time consuming, more fun and engaging). This tool is called a Customer Journey Map, and it shows people who are not experiencing any problems with the process where those problems are for somebody else. This map is also a good starting point for web site redesigns, conference organization and all kinds of activities that involve people or, more specifically, a single person named "Customer", the barriers this guy is facing and the steps to remove these barriers.

I can't go into great detail about CJM in this post due to time constraints. I was impressed by a presentation from the awesome UXpresso team; there might be a video available, but it is likely in Russian only. I've heard of at least one major Python company (wargaming.net) that uses CJM extensively, so I can only give you a pointer for now. It would be interesting to make a presentation of this technology for the Python contribution process and talk about CJM at PyCon, but I am unlikely to afford the participation costs, so somebody else should do this.