Some Thoughts on Writing Readable Python Code

The Python document “Style Guide for Python Code”, also known as PEP-8, starts off by stating that: “One of Guido’s key insights is that code is read much more often than it is written.” While I won’t dispute that, I do have some thoughts on reading code in general. I’m a big fan of re-use, and I don’t like to spend any more time reading code than I absolutely have to. I want to get on with it and get the project done. But not everything comes with a set of nicely written man pages or a detailed reference manual, so I do find that I have to actually read the code every now and again. And while I may grumble about the time I spend doing it (and thinking about all the other people doing the same thing because someone couldn’t be bothered to write any documentation), what really makes me cranky is when the code is hard to read to begin with.

One of the claims made for Python is that it is inherently readable. I would disagree with this as a blanket statement. Python has some quirks that I’ve learned to live with, but in general it’s no better or worse than most other block-structured languages. However, I’m not a big fan of the mandatory indent, and I really don’t like the “dangling tails” on if, for, while and other control structures. I find it all too easy to write Python that can’t be read without serious study, and the whole point of documenting code that has been tested and verified is to reduce the time spent trying to decipher someone else’s obtuse cleverness. Some of us want to get the job done and move on to other things, and I, for one, am not interested in a constant game of “stump-the-band” or solving puzzles.

PEP-8 does list some good guidelines for coding in Python, such as putting imports at the start of the module rather than embedding them throughout in a scatter-shot fashion. When I see scattered imports, it tells me that the code was created on the fly with little thought given to design or implementation. In other words, it was hacked into existence. PEP-8 also gives advice on naming conventions for variables, classes, methods and functions. All good stuff, to be sure.
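To make those two conventions concrete, here is a small module laid out the way PEP-8 suggests: all imports at the top, then constants, classes and functions named according to their kind. The names (MAX_RADIUS, Point, LumpStats, distance_between) are hypothetical, made up purely for illustration.

```python
# All imports go at the top of the module, not scattered through it.
import math
from collections import namedtuple

# Constants: UPPER_CASE_WITH_UNDERSCORES (hypothetical example value)
MAX_RADIUS = 10.0

# Classes, including namedtuple types: CapWords
Point = namedtuple('Point', ['x', 'y'])


class LumpStats:
    """Keep a running count and total area of lumps."""

    def __init__(self):
        # Instance variables: lower_case_with_underscores
        self.lump_count = 0
        self.total_area = 0.0

    # Methods: lower_case_with_underscores
    def add_lump(self, radius):
        """Record one lump, capping its radius at MAX_RADIUS."""
        capped_radius = min(radius, MAX_RADIUS)
        self.lump_count += 1
        self.total_area += math.pi * capped_radius ** 2


# Functions: lower_case_with_underscores
def distance_between(first_point, second_point):
    """Return the Euclidean distance between two Points."""
    return math.hypot(second_point.x - first_point.x,
                      second_point.y - first_point.y)
```

Nothing here is clever, and that is the point: a reader can tell at a glance which names are constants, which are types and which are callables.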

However, one thing that I find sorely lacking in the Python syntax is an end-of-block marker. In my experience, this is where much of the obfuscation creeps into badly written Python. Here’s an example of some dense code:

    if sort_opts['move_minor'] in ('Up', 'Down'):
        while(len(all_remaining_lumps)):
            greatest_remaining_centroid = all_remaining_lumps.items()[0][1]
            # first find highest lump
            for lump in all_remaining_lumps.iteritems():
                centroid = lump[1].get('centroid')
                x, y = centroid
                if y < greatest_remaining_centroid.get('centroid')[1]:
                    greatest_remaining_centroid = lump[1]
                # then find all lumps with a y value +/- its radius
                current_row = []
                # loop_copy is made because we might remove items from the original dict, but you cannot
                # iterate of a dict which is changing in size
                loop_copy = all_remaining_lumps.copy()
                for lump in loop_copy.iteritems():
                    centroid = lump[1].get('centroid')
                    x, y = centroid
                    if y <= greatest_remaining_centroid.get('centroid')[1] + greatest_remaining_centroid.get('radius'):
                        current_row.append(lump)
                        all_remaining_lumps.pop(lump[0])
                # now, order the row
                if sort_opts['move_major'] == 'Right':
                    current_row.sort(compare_x_positions)
                else:
                    current_row.sort(compare_x_positions, reverse=True)
                temp_list.append(current_row)
        if sort_opts['move_minor'] == 'Up':
            temp_list.reverse()

and here it is again with comments used as end-of-block markers and some judicious blank lines:

    if sort_opts['move_minor'] in ('Up', 'Down'):
        while(len(all_remaining_lumps)):
            greatest_remaining_centroid = all_remaining_lumps.items()[0][1]

            # first find highest lump
            for lump in all_remaining_lumps.iteritems():
                centroid = lump[1].get('centroid')
                x, y = centroid
                if y < greatest_remaining_centroid.get('centroid')[1]:
                    greatest_remaining_centroid = lump[1]
                #endif

                # then find all lumps with a y value +/- its radius
                current_row = []

                # loop_copy is made because we might remove items from the
                # original dict, but you cannot iterate of a dict which is
                # changing in size
                loop_copy = all_remaining_lumps.copy()
                for lump in loop_copy.iteritems():
                    centroid = lump[1].get('centroid')
                    x, y = centroid
                    if y <= greatest_remaining_centroid.get('centroid')[1] + greatest_remaining_centroid.get('radius'):
                        current_row.append(lump)
                        all_remaining_lumps.pop(lump[0])
                    #endif
                #endfor

                # now, order the row
                if sort_opts['move_major'] == 'Right':
                    current_row.sort(compare_x_positions)
                else:
                    current_row.sort(compare_x_positions, reverse=True)
                #endif

                temp_list.append(current_row)
            #endfor
        #endwhile

        if sort_opts['move_minor'] == 'Up':
            temp_list.reverse()
        #endif
    #endif

Which version is easier to grasp at a glance? For me it’s the second one. The internal structure is made clear and explicit by the #endif, #endfor and #endwhile markers, and blank lines separate the functional blocks within the code. The original version, with its cramped lines and overly long comments, requires more effort to decipher, and there is a higher probability that the reader will miss something essential without careful study. This is definitely not the worst example I could have come up with; it just happened to be handy. It’s not my code, and I didn’t change any of the spelling or grammar in the comments. As an exercise, you might want to check how many of the PEP-8 suggestions it violates.
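For comparison, here is one way the same logic might look in Python 3 (the original relies on Python 2 idioms such as iteritems() and a comparison-function sort), keeping the end-of-block markers and blank lines. This is a sketch, not the original author’s code, and it rests on assumptions: each lump is a dict with a 'centroid' (x, y) tuple and a non-negative 'radius', a smaller y means higher up, and the undefined compare_x_positions is replaced by a sort key on the centroid’s x value. Note also that the markers in the second version show the row-gathering code sitting inside the first for loop, which, if I’m reading it right, pops entries from the dictionary that loop is still iterating over; the sketch assumes the intent was to find the highest lump first and only then gather its row.

```python
# Hypothetical rewrite; the function name and its parameters are assumptions.


def group_lumps_into_rows(lumps, move_major, move_minor):
    """Group lumps into rows, then order the rows and their contents.

    lumps maps a key to a dict with 'centroid' and 'radius' entries.
    Returns a list of rows; each row is a list of (key, lump) pairs.
    """
    rows = []
    if move_minor not in ('Up', 'Down'):
        return rows
    #endif

    # Work on a copy so the caller's dict is left untouched.
    remaining = dict(lumps)

    while remaining:
        # first find the highest lump (smallest y)
        top = min(remaining.values(), key=lambda lump: lump['centroid'][1])
        row_limit = top['centroid'][1] + top['radius']

        # then collect every lump whose y is within the top lump's radius;
        # the comprehension finishes before anything is deleted, so no
        # dict is modified while it is being iterated
        current_row = [(key, lump) for key, lump in remaining.items()
                       if lump['centroid'][1] <= row_limit]
        for key, _ in current_row:
            del remaining[key]
        #endfor

        # now, order the row: ascending x for 'Right', descending otherwise
        current_row.sort(key=lambda pair: pair[1]['centroid'][0],
                         reverse=(move_major != 'Right'))
        rows.append(current_row)
    #endwhile

    if move_minor == 'Up':
        rows.reverse()
    #endif

    return rows


lumps = {
    'a': {'centroid': (5, 0), 'radius': 2},
    'b': {'centroid': (1, 1), 'radius': 1},
    'c': {'centroid': (3, 10), 'radius': 1},
}
rows = group_lumps_into_rows(lumps, 'Right', 'Down')
print([[key for key, _ in row] for row in rows])  # [['b', 'a'], ['c']]
```

Because the highest lump always satisfies its own row test (its radius is assumed non-negative), every pass through the while loop removes at least one lump, so the loop is guaranteed to finish.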

The original version probably made perfect sense to the person who wrote it, but what about someone else who might need to read it later on? Programming should not be a selfish exercise or an activity carried out alone in a dark room. If the code is difficult to understand and poorly documented (if at all), then the odds that it will be reused start to drop rapidly towards zero. It is difficult to extend trust to something that looks like a junior year independent study project, no matter how clever or useful it claims to be. This has been cited numerous times in the literature as one of the primary stumbling blocks to reusability in the software industry.

Now, I’m not advocating a change to Python to incorporate end-of-block markers, but I do use them myself. They make my code more readable, and hence more maintainable, and other folks have expressed an appreciation for them as well. Just following PEP-8 will go a long way toward making Python source more readable. The use of docstrings (as described in PEP-257) is also important.
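As a small illustration of the PEP-257 layout (a one-line summary, a blank line, then the details), here is a hypothetical function, with an #endif marker in the style used above. Unlike a comment, a docstring is kept at run time, so help() and inspect.getdoc() can show it to the next person without them having to read the code.

```python
import inspect


def mean_radius(radii):
    """Return the arithmetic mean of a non-empty sequence of radii.

    radii -- a sequence of ints or floats

    Raises ValueError if radii is empty.
    """
    if not radii:
        raise ValueError('radii must not be empty')
    #endif
    return sum(radii) / len(radii)


# inspect.getdoc() strips the indentation from the docstring's later lines.
print(inspect.getdoc(mean_radius).splitlines()[0])
# prints: Return the arithmetic mean of a non-empty sequence of radii.
```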

The bottom line here is that badly written code ends up costing an organization far more in the maintenance and support phases than it would have cost to do it right from the outset. It costs more when a programmer decides that something needs to be rewritten rather than try to unscramble and reuse something that is already there. And, lastly, money and time are wasted when a programmer has to work through a piece of code to understand it and try to use it, rather than being able to refer to clear and concise documentation for understanding and test verification results for confidence.

At some point the unnecessary costs exceed the profits, and it’s all downhill from there.

UPDATE 16 Aug 2009 – It seems I’m not alone in my wariness of Python’s ability to lead a programmer down the seductive path of obfuscation. Philip Guo from Stanford has some thoughts on this as well: http://www.stanford.edu/~pgbovine/python-unreadable.htm

Also, be sure to check out this earlier posting: Why documenting your python code is a good idea
