another day another vice another roll of the dice: 2013

Thursday, November 28, 2013

Roundup Tracker: Create Issues by Email

There is one thing about bugs.python.org and other Roundup issue tracker instances that is not widely known: you can create new issues and update old ones directly from your email client, without visiting the web interface at all.

As much as I hate Debian's email-only tracker, I must admit that having an email control feature in addition to the web interface can save some time, especially if you constantly forget passwords for different trackers, like me.

So, to create a new issue, just send an email to the address that the tracker uses to send mail to you. Well-known addresses of Python trackers:

report@bugs.python.org - for filing bugs in Python with the b.p.o tracker
metatracker@psf.upfronthosting.co.za - for reporting problems with the b.p.o tracker itself

Note that your email address needs to be present in the tracker database for it to accept your request, so you might need to create an account first.

You can also update existing issues by adding suffixes like [status=closed;resolution=invalid] to the subject field of your replies. I just closed issue19825 to test this method. You can try it too next time you feel uncomfortable about escaping from your mailbox.
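
The subject suffix can also be composed programmatically. The following is only a sketch: the sender address, the issue title and the helper name `build_update` are made up for illustration, and the `[issueNNN]` prefix in the subject is my assumption about how a reply to a tracker mail is usually addressed.

```python
from email.message import EmailMessage


def build_update(sender, tracker, issue_id, title, **props):
    """Build a mail whose subject carries Roundup property assignments."""
    # Properties go into a trailing [name=value;name=value] suffix.
    suffix = ";".join("{}={}".format(k, v) for k, v in sorted(props.items()))
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = tracker
    msg["Subject"] = "[issue{}] {} [{}]".format(issue_id, title, suffix)
    msg.set_content("Closing this one by email.")
    return msg


msg = build_update("me@example.com", "report@bugs.python.org",
                   19825, "Test issue",
                   status="closed", resolution="invalid")
print(msg["Subject"])
```

The message is only built here, not sent. Handing it to `smtplib.SMTP(...).send_message(msg)` with your own mail server settings would do the actual delivery.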

This stuff is actually documented in the official Roundup docs, but who reads the docs, anyway.

Thursday, November 21, 2013

Mercurial UX: Undo/Redo Wanted

This is an adapted mail for the Mercurial mailing list, which is good to have as a blog post for reference.

For those of you who were born in the GitHub era, Mercurial is an alternative version control system with transparent, pythonic internals. Because all my projects are escaping to GitHub, I took the chance to go over my knowledge of HG again and see what I had missed over the years of using it. This is just one idea.

This year, Mercurial introduced the new ChangesetEvolution concept, which allows you to safely mess with repository history. I decided to take a look and started with the 'hg fold' command. Quite soon I got into the usual state of a missed RTFM evening (you know, the evening when you have plenty of time to read a book with a cup of coffee). I couldn't understand what had happened, but I clearly knew it was not what I wanted, so I wanted to get my repo back into its initial state. There are a lot of commands like 'rollback', 'revert', 'update -C', 'backout', 'strip' to revert the state after some command, but the real problem is choosing the right one. So I thought that this is something that is missing.

In Mercurial (and in other version control systems as well) there is no concept of an "operational transaction". In databases, no matter what you do, if a transaction is not committed, the state is reverted. These are called atomic transactions. Before Subversion there was CVS with non-atomic commits - if there was an error with some file (a merge or something), you got half of the files committed and half not. Awful, right? Since SVN, commits are atomic - if something is wrong, nothing is committed. Atomicity is important for user operations too. If something goes wrong, I want to get back to where I started. In Mercurial this works by making a backup copy of your repo. I guess for Git it's the same.

So, no obvious command to revert the last operation, no atomicity on the operation level. This makes me feel unsafe and unsure about what I can do in my clone if I am too lazy to make a copy. And I thought that the next step in Mercurial evolution would be going from the "user command" to the "user operation" concept.

"user command" is a command like `hg inc` that users type on the command line. It may or may not affect the state of the repository.

"user operation" is a command or several commands that change the state of the repository. A "user operation" has the property of being "revertible" or not. The granularity of changes to the repository (how many commands make one operation) is decided by the high-level user goal: to undo and redo these operations. For example, 'hg fold' is a command that can be undone. It is a separate "user operation" and an entry in the "undo history".

A "user command" that modifies state may have a "reverse command" that brings the changes back to the initial state. But maintaining this on the command level is too fragile, and pairs like "commit/rollback" are hard to remember. A "user operation" may not have a "reverse command" - it may just be reverted without a dedicated reverse command (like when you replace a clone with your backup copy). And for that you need "undo history".

"undo history" is a stack of "user operations". These can be revertible or not - it depends on the logic. And it is not a commit log - it is an operations log. The direct analogy is the GIMP undo history dialog.

Now that the concept of the wanted feature is clear, here are some blueprints for a start.

From the usability POV, a mercurial operations history dialog is a list, where each entry contains:

- operation name
- if it can be undone
- if not, state the reason

the reason is necessary to understand either:

1. current condition of repository
   - what should be adjusted to enable undo
   - why adjustment can not be automated
2. what should be written in hg itself to make it possible
   - pointer to dev docs and status page
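
To make the blueprint a bit more tangible, here is a minimal Python sketch of such an operations history. Nothing like this exists in Mercurial: the `Operation` and `UndoHistory` names are invented for illustration, and the sketch only tracks the log itself, it does not restore any repository state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operation:
    name: str                      # e.g. "hg fold"
    undoable: bool = True
    reason: Optional[str] = None   # why the operation is frozen, if it is


@dataclass
class UndoHistory:
    done: List[Operation] = field(default_factory=list)
    undone: List[Operation] = field(default_factory=list)

    def record(self, op):
        self.done.append(op)
        self.undone.clear()        # a new operation invalidates the redo stack

    def undo(self):
        if not self.done:
            raise LookupError("nothing to undo")
        op = self.done[-1]
        if not op.undoable:
            raise RuntimeError("cannot undo {}: {}".format(op.name, op.reason))
        self.undone.append(self.done.pop())
        return op

    def redo(self):
        if not self.undone:
            raise LookupError("nothing to redo")
        op = self.undone.pop()
        self.done.append(op)
        return op

    def listing(self):
        # Latest operation first, with the reason for every frozen entry.
        return ["{}  [{}]".format(
                    op.name,
                    "undoable" if op.undoable else "frozen: " + str(op.reason))
                for op in reversed(self.done)]


history = UndoHistory()
history.record(Operation("hg push", undoable=False,
                         reason="changesets already published"))
history.record(Operation("hg fold"))
print("\n".join(history.listing()))
```

Keeping the reason next to each frozen entry is what lets a listing explain itself instead of just refusing to undo.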

Summary:
* user command ('hg inc', 'hg ci', ...)
* user operation (hg command that changes state)
* undo history (stack of latest user operations)
* undo history items are frozen if reverting is impossible
* undo history is local
* state explanation between operations

Links:
https://www.google.by/search?q=undo+pattern - command and memento patterns can help
https://bitbucket.org/hstuart/hg-multiundo - some work on the topic was done by Henrik Stuart

The final test:

hg undo

hg redo

hg undo --list

If you have something to say, but are not subscribed to the official mercurial@selenic.com mailing list to continue the thread there, then I guess it's safe to leave comments here.

Saturday, July 20, 2013

Command to generate SSH key on Linux

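Because ssh-keygen has a human-unfriendly command line interface without a --help option, here is a quick reminder of how to generate a very secure SSH key with a "13.04" comment for its public part. The -V +52w option is meant to limit validity to one year (52 weeks), but note that the ssh-keygen manual describes -V as the validity interval used when signing a certificate (with -s), so a plain key pair generated this way is unlikely to carry an expiry date: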

ssh-keygen -t rsa -b 4096 -V +52w -C 13.04

This key can be used to avoid typing a password when pushing your Python code to sites like Bitbucket or GitHub (Google Code is not there yet) or to securely upload your packages to PyPI.

Tuesday, July 02, 2013

Code Review with Rietveld and Mercurial Queues

UPD 2014-07: On Windows, thanks to a bug in subprocess that doesn't escape the ^ character (caret), you need to escape it manually in all commands, like this: ^^.

Teal Deer

    $ hg qser -v
    0 A supported-vcs
    1 U noul

    $ python upload.py --rev "'supported-vcs'^1:'supported-vcs'"

Also FAQ.

Rietveld, pre-commit and post-commit reviews

You probably know what Rietveld is - it allows you to send uncommitted changes for review with the upload.py script that you grab from the review server:

    $ python upload.py
    Upload server: codereview.appspot.com (change with -s/--server)
    New issue subject: Test
    Issue created. URL: http://codereview.appspot.com/10864043
    Uploading base file for README

This is a pre-commit review, where you discuss changes before they are committed and add a link to the passed review to the commit message. In post-commit reviews you usually comment on existing revisions in an external project history browser service, such as Google Code or GitHub. If you need a thorough examination of every change, then the post-commit review process can become a challenge of catching a running train. If you just need to skim over the committed code and raise doubts about anything unclear for further discussion, post-commit is ok. In fact, this is the way we do it for Spyder IDE development.

Post review is nice in the sense that development is not stopped while you're offline on vacation, or your brain cells are overwhelmed with more pressing matters. Pre review is good if you want to ensure maximum stability for your system. An ideal development process should have convenient means to support both simultaneously.

--rev hack

But back to Rietveld. It is little known that it supports reviewing existing changesets and revision ranges with the --rev argument of upload.py. For example:

    $ python upload.py --rev "2869^1:2869"
    Upload server: codereview.appspot.com (change with -s/--server)
    New issue subject: Spyder IDE changeset 2869
    Issue created. URL: http://codereview.appspot.com/10866043
    Uploading base file for spyderlib/plugins/configdialog.py
    Uploading base file for spyderlib/utils/external/lockfile.py

This will upload revision 2869 (5a3d6821eabe) from the Spyder repository for review. Why is --rev a hack? As you may see, there is no original commit message, no revision number, no parent revision info. Actually, Rietveld doesn't know anything about that. What it gets from upload.py is a patchset as a plain SVN diff. This data loss could be prevented with an extensible changeset format, which would greatly improve tool interoperability, but I don't know of any entity that could support the time required for that development.

The --rev is a hack not only because of information loss, but also because it serves double purpose. The purpose is either:

--rev REV specify the base revision to compare current change to
--rev REV1:REV2 specify revision range to create the diff

To implement it the proper way, upload.py needs a -c CSET, --changeset CSET argument that will extract the revision diff from the current version control history, propose to supply the commit message as the issue description, save the parent revision and changeset hash info, and maybe cache the issue number for that changeset locally.

Uploading stuff from Mercurial Queue

Mercurial Queue is just a bunch of patch files in the .hg/patches/ directory that can be applied or reverted in the order specified by the series file. This may change in the future, but everybody knows and uses this fact. There is also a less obvious fact (that took me a long time to discover): every applied patch is also a full-fledged Mercurial revision. While it can be applied/reverted with qpush/qpop, as long as it is applied, all other operations work on it as well. In addition, every such revision gets a tag that is essentially the patch name.

So, to upload a patch from Mercurial Queue, make sure it is applied:

    $ hg qser -v
    0 A supported-vcs
    1 U noul

And use its name as --rev range argument:

    $ python upload.py --rev "'supported-vcs'^1:'supported-vcs'"

Make sure it is escaped as shown, because otherwise Mercurial will read it as a subtraction of the tag named vcs from the tag named supported.
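
If you build this argument from a script, a tiny helper keeps the quoting in one place. This is just a sketch: `mq_rev_range` is a made-up name, and the caret doubling on Windows simply follows the note at the top of this post.

```python
import sys


def mq_rev_range(patch_name, windows=None):
    """Return the --rev value that selects the changeset of an applied MQ patch."""
    if windows is None:
        windows = sys.platform.startswith("win")
    # Single quotes stop Mercurial from reading '-' in the name as an operator.
    caret = "^^" if windows else "^"
    return "'{0}'{1}1:'{0}'".format(patch_name, caret)


print(mq_rev_range("supported-vcs", windows=False))
# 'supported-vcs'^1:'supported-vcs'
```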

With Mercurial Queues it is also much, much easier to send patch updates. For that, make sure your patch is active:

    $ hg qseries -v
    0 U supported-vcs
    1 U noul

It is not, so apply it:

    $ hg qpush 
    applying supported-vcs
    now at: supported-vcs

Do your modifications according to review comments and refresh the patch:

    $ hg qrefresh

Remember the Rietveld issue number and use upload.py to update the existing issue:

    $ python upload.py -i 10822044 --rev "'supported-vcs'^1:'supported-vcs'"

Stuff ToDo and Bitcoin Extortion

As I said Mercurial Queue integration can be improved. It is the 9th most requested feature in Rietveld at the moment and at least for me there is a clear reason why. The workflow for upload.py should look like the following:

(no arguments specified)

1. check that the current VCS is Mercurial
2. check that hg diff output is empty
3. check that there is an MQ patch applied
4. check if there is a Rietveld issue number for the patch
   (needs local storage for this number)
5. check if the patch is not already uploaded
   (relevant if an issue number is found)
6. propose to upload the patch
7. propose to use the commit message for the patch description
   (propose to open an editor to edit the description)
8. submit
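
The checks above could be sketched in Python roughly as follows. This is not code from upload.py: `plan_upload`, `run_hg` and the `issues` mapping (standing in for the local storage of issue numbers) are hypothetical, and the "already uploaded" check and the interactive prompts are left out.

```python
import subprocess


def run_hg(*args):
    """Run an hg command and return its output as text."""
    return subprocess.check_output(("hg",) + args, universal_newlines=True)


def plan_upload(run=run_hg, issues=None):
    """Return (upload.py arguments, proposed description) for the top MQ patch.

    `issues` is a hypothetical local store mapping patch names to
    Rietveld issue numbers.
    """
    issues = issues if issues is not None else {}
    try:
        run("root")                       # is the current VCS Mercurial?
    except (OSError, subprocess.CalledProcessError):
        raise RuntimeError("not inside a Mercurial repository")
    if run("diff").strip():               # uncommitted changes present?
        raise RuntimeError("hg diff is not empty, refresh the patch first")
    try:
        patch = run("qtop").strip()       # name of the top applied patch
    except subprocess.CalledProcessError:
        raise RuntimeError("no MQ patch is applied")
    # The commit message of the applied patch is the proposed description.
    description = run("log", "-r", "qtip", "--template", "{desc}")
    args = ["--rev", "'{0}'^1:'{0}'".format(patch)]
    if patch in issues:                   # known issue: update it
        args = ["-i", str(issues[patch])] + args
    return args, description
```

Passing the command runner in as a parameter keeps the decision logic testable without a real repository.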

This is just an entrypoint for further enhancement to reach the ideal workflow, which will require more fixes not only in upload.py, but also on server side.

I am not sure I will be able to code this given the fact that I am to be sold into full time job slavery for my debts. That's why I leave this research here for further development. With that decomposition it should be easy to implement it and send a patch in exchange for some credits.

For those who'd appreciate the feature, but whose time/money ratio is leaning towards 0, there is a way to show interest through a timeless fund (tracked here) with a total goal limit of 4 BTC for improved MQ support in Rietveld. I could name a few reasons why you should do this, including the chance to trade money back for some life time in the future, not only for yourself, but for the whole lot of open source projects that use Mercurial with Rietveld (such as core Python development). While all these are good arguments, this is not sponsored development, as I will likely do this stuff sooner or later regardless of whether the goals are met. I just need some Bitcoins to experiment with, and because I don't have anything else to trade, here is a good opportunity to give something back in exchange.

Saturday, April 20, 2013

Program config as a DNA strand

This is a technical followup to the post about mind-altering programming languages, which concentrates, iterates and extends on the abstract DNA part.

Do you know what DNA looks like from the point of view of a Python programmer?
DNA is a list. Of genes, and fillers in between.

    [ GENE, GENE, GENE, None, GENE, STUFF, GENE, .... ]

By analogy with software, one GENE is one option. Here None or STUFF is something that is not recognized and not used in the process right now, but may be used later, even in a different place for a different purpose. This leads to very interesting properties of a configuration described as a DNA strand.

Property One: Everything is Optional and Configurable

Usually it is a tough choice what to include into your program config, because an option a day keeps users away. But with DNA config it doesn't matter - you can encode everything - every bit of flexibility you wanted, every feature-creeper, forward-thinking matter. Don't bother yourself with restrictions - include everything.

Property Two: Indefinite Logical Groups and Namespaces

To make sense of the tons of information in a DNA strand, you need to concentrate only on the parts that are relevant to you. This is done by "masking" (or marking) parts of the DNA code according to your current task. When you apply a mask, all GENEs of the DNA that are not used are cleared. You can have multiple masks for different programs. An awesome feature-creeper would be to figure out the mask automatically during the program run (the program explicitly marks which sequences it reads and which it skips).
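
A minimal sketch of masking, assuming a gene is just a string and a mask is a list of booleans of the same length (both representations are my choice for illustration, not part of the idea itself):

```python
def apply_mask(dna, mask):
    """Return a copy of dna where genes not selected by the mask are cleared."""
    return [gene if keep else None for gene, keep in zip(dna, mask)]


dna = ["debug=on", "port=8080", "JUNK", None, "theme=dark"]
web_mask = [False, True, False, False, True]
print(apply_mask(dna, web_mask))
# [None, 'port=8080', None, None, 'theme=dark']
```

The original strand is left untouched, so several programs can apply their own masks to the same DNA.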

Property Three: Non-linear Option Identification

Because DNA is a list, you may think that you need to know an option's position to access it. This is not necessary. An index into the DNA is the most straightforward way to look up an option value, and a mask is a tool to help explore and identify its position. But the true power comes from the fact that an option can be identified just by analyzing content. An option can be a single GENE in a specific format, or it can be identified as a sequence of GENEs that pass some validation logic. This makes it possible to write generic configuration analyzers, which can read these sequences to detect configuration problems, patterns and meaningful values, and produce nice visualizations out of them.
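
Identification by content can be sketched like this, again assuming string genes. The `key=value` format and the function names are only an example of "some validation logic":

```python
def is_key_value(gene):
    """A gene counts as an option here if it looks like key=value."""
    key, sep, value = gene.partition("=")
    return bool(key and sep and value)


def find_options(dna, looks_like_option):
    """Return (position, gene) pairs for genes recognized by their content."""
    return [(i, gene) for i, gene in enumerate(dna)
            if gene is not None and looks_like_option(gene)]


strand = ["debug=on", "JUNK", None, "port=8080"]
print(find_options(strand, is_key_value))
# [(0, 'debug=on'), (3, 'port=8080')]
```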

Property Four: Application Types

In real world™ DNA is packed in chromosomes. Humans have 46. Potatoes possess 48. Zombies are somewhere in between. In a virtual world of coding it makes sense to use one chromosome per application type. For example, every Python web framework has a lot of common parts, common features and minor details in these features, so a variety of such frameworks can be encoded with a single DNA strand.

Exercise

It may sound too abstract, but try to think how to build a shared DNA strand for two or more mini web application frameworks in Python. Just remember that DNA code includes both hardcoded application features and modifiable configuration options. And as usual look at BoxCar 2D for an inspiration.

Friday, March 15, 2013

Programming languages that alter your mind

Today I went to Perl 6 site.
I left.
I can remember in detail what brought me there.
I am still shocked by the new world that opened before me.
Butterflies..

---

NOTE: This is a not very positive post `about life`. Don't waste your time if you don't like such stuff.

The text below contains depictions of deaths by coding, too many letters, biased speculations, insight into psychedelic culture and a long way to conclusions you may not be ready to face. You've been warned.

---

Let's face it - programming languages alter your mind. They affect the way people think and solve problems. They create problems in life and relationships. Languages are addictive, they provide a lot of fun to entertain one's brain well beyond solving simple crosswords and moving sprites of monsters on the screen through the means of other sprites guided by the mouse-type manipulator. And while programming languages are fun, they are also harmful and even toxic.

A language is toxic when it doesn't match your expectations, such as mine were, which resulted in filing Python bug #17426 (which I called the subzero wart, but that doesn't really matter). A language is harmful when it requires you to remember yet another thing from its "funny" behavior, and for that purpose you need to sacrifice a mem cell that was dedicated to storing the name of your girlfriend's favorite flowers. It is harmful if it affects your mood, and you get butthurt when you read bad things about it.

---

Programming can be fatal for a girlfriend, relatives and the person in general. Much like eating mushrooms. I have been to only two funerals of people under the age of 25. Both of them died of pneumonia, because neither was attentive enough to the symptoms of the cold they had caught, and that's because their attention was completely absorbed by the tasks they needed to complete for their jobs. Their bodies failed because their brains were unable to dedicate resources to a timely response to the alerts of the life-sustaining system.

That's why people quit programming. That's why salaries are so high. But that doesn't mean other jobs are not harmful either. Low salary can be much more harmful in many ways leading to low self esteem and to even more complicated problems with relatives, friends and society. Did I say mushrooms can do this too?

---

Programming can be fatal even for people with strong positive attitudes who pursue their own goals in life. If taken without prescription or without the negative sides of daily jobs in bad companies, programming becomes addictive - fun for the one who practices it and a loss for human society. Ignoring everything around, being busy with developing your own _____ (framework, game, startup, social network, ...). A loss that happens when a person leaves the job to pursue their virtual dreams about programming. Dreams disconnected from reality. And reality is something that you've completely ignored and skipped while being buried under the pile of daily tasks at your big corporation. Did I say that mushrooms can cause this too?

---

Real life is tough. It requires interaction with people and responding to signals, and basically you start to forget your ideas and dreams and drift further away from your goals. New ideas are less likely to visit you, because maintaining a business requires the mindset of a library with shallow functions, not of a scientific library with modelling algorithms. Maintaining a business is reinventing the bicycle. Over and over, day after day. Pursuing your dreams is reinventing the bicycle. Over and over, day after day. That keeps people busy, and the progress slow.

Programming languages can make the world less harmful and closer to reality. Programming languages that are accessible, that save time and don't have a tough legacy. Enabling people to adapt them, discuss them and invent their own. If people can not invent ideal programming languages by themselves, by calculating, by using maths and inductive reasoning - they should try genetic algorithms. Did I mention that mushrooms can mutate?

---

The more experience I gain, the less I want to learn the peculiarities and differences between the languages. I just want them to work, and I'd gladly outsource the task of inventing such a working language to a genetic algorithm. An algorithm where I am involved as a social actor who makes a selection based on experience shared with other actors. A genetic algorithm is where you have a set of parameters written into one long line. Mutation is a change of some or all of these parameters - sometimes completely random, sometimes guided. If your programming language has parameters - syntax, functions, modules, features - you can describe it using this long line called DNA.

To apply a genetic algorithm you need a selection method - this is the shared user experience. You need an identification for the language, a DNA sequence, but also a registry of names and versions for human consumption. You need to abandon backward compatibility excuses and start building reusable components that will become new parameters in the language DNA sequence.
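
For the curious, one generation of such an algorithm fits into a few lines of Python. This is a toy sketch: the gene values are placeholders, and the `score` function is a stand-in for the shared user experience, which in reality would come from people and not from a formula.

```python
import random


def mutate(dna, choices, rate, rng):
    """Return a copy of dna with each gene replaced at probability `rate`."""
    return [rng.choice(choices) if rng.random() < rate else gene
            for gene in dna]


def evolve(population, choices, score, rng, rate=0.1):
    """One generation: keep the better half, refill with mutated copies."""
    ranked = sorted(population, key=score, reverse=True)
    survivors = ranked[:max(1, len(ranked) // 2)]
    children = [mutate(rng.choice(survivors), choices, rate, rng)
                for _ in range(len(population) - len(survivors))]
    return survivors + children


rng = random.Random(2013)
features = ["A", "B", "C", "D"]          # placeholder language parameters
population = [[rng.choice(features) for _ in range(8)] for _ in range(6)]
liked = lambda dna: dna.count("A")       # stand-in for user selection
next_generation = evolve(population, features, liked, rng)
```

Because the best half always survives, the top score in the next generation is never lower than in the current one.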

---

It is getting late. Tomorrow there will be a new day, new karma and new ideas, so I need to finish this today. If programming languages can alter your mind, you can use this mind to alter the language. It is well known that when you look into the abyss, the abyss also gazes into you, but it is less known that it is an iterative process. Did I mention that shrooms can give you powers?..

Let me say goodbye and wish you all the best, dear reader. I hope you had a good trip, and now enjoy the expanded state of your mind and consciousness. Did I mention mushrooms? Right.

Wednesday, February 27, 2013

Formatting API Anti-Pattern

I was meditating over the subset of the core Python API that's dedicated to self-inspection in a running script. This API consists of inspect + traceback + sys.exc_* + the magical locals(), and ideally it should give you a total understanding of Python state at a given point in your program. But that doesn't happen, because things are complicated, and here is why..

I wanted to get a chain of callers for the function I was debugging. I don't like pdb at all - it is so outdated for 2013 that it deserves a separate GSoC project. I prefer to just insert debug statements into the code, and after several weeks and attempts I came up with a helper to get the name of the caller, which I used to log and debug such complex projects as SCons and Spyder. I am not as smart as everybody else, but even I didn't expect to spend weeks researching this basic stuff.
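
To show the kind of code this leads to, here is a rough sketch built on the inspect module. It is not the helper mentioned above, the function names are made up, and it returns bare function names only, without module or class information.

```python
import inspect


def caller_name(depth=1):
    """Return the name of the function `depth` levels above our caller."""
    frame = inspect.currentframe()
    try:
        # One step leaves this helper, the next `depth` steps go up the stack.
        for _ in range(depth + 1):
            if frame.f_back is None:
                return None
            frame = frame.f_back
        return frame.f_code.co_name
    finally:
        del frame                  # avoid keeping a reference cycle alive


def call_chain():
    """Return function names from our caller outwards."""
    return [info.function for info in inspect.stack()[1:]]


def inner():
    return caller_name(), call_chain()


def outer():
    return inner()


name, chain = outer()
print(name, chain[:2])
```

Note how much of the code is about walking frames and how little about the question asked.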

I think the historical reason is that people tried to produce meaningful data using a functional approach without paying due attention to the quality of the data itself. Is the data complete, sufficient and accessible? You can not tell that from the manual. The manual (see inspect) encourages you to hack your way through the debris of various bits of data that elusively fail to create a complete picture - the puzzle just can't be solved, but you can't see this, because you don't know what the structure of the Python state is at a given point in your program and what to look for. The API encourages you to find functions that join and analyse the bits, you fail and try again, and in the end you come up with a function filled with uncertainties that just does some formatting magic.

"Formatting API Anti-Pattern" may not be the best term for that, but it is an API (data structures, functions and other stuff) so complex and incomplete that you can not see this, and are instead forced to write formatting functions just to make sense of the incomplete data. We are so used to the fact that an "API provides abilities" that we completely miss the point that it also "provides limitations", and that the data in an API can be incomplete, insufficient (incomplete for certain tasks) and not accessible (the code to handle it becomes very complicated).

How to fight this? Think about data first. What data should be available for inspection at a given point in your program when this program is paused? Can you see this data in the manual? If not, then you're dealing with "Formatting API".

Sunday, February 17, 2013

Ghosts in the shell

At the beginning of the era human controllers were dominating in the shell. Now only their ghosts are controlling processes, launching programs and executing tasks server side. Quite often without any shells at all.

Updated the CHAOS speccy.