Why Documenting Your Python Code is a Good Idea

In Python one can return more than just a simple type. It is possible to return a tuple, a list or a dictionary object, in addition to the simple types (int, string, float, etc.). Granted, what Python does when returning a complex data object is akin to returning a pointer to a structure in C, since everything in Python is an object and assignment creates references to objects. But there’s one big (nay, huge) difference between what one sees in C source and what one sees in Python.

In C a structure is predefined somewhere in the upstream source code. It is readily visible and, if commented judiciously, easily understood. In Python the closest equivalent data object is the dictionary, and it can be created dynamically at any point in the code. But the worst part is that the Python data object is opaque. The developer might surmise that a function or method (or external C/C++ extension) is returning a dict object, but unless someone bothered to document what it contains, or the developer is willing to spend the time reading through the code to decipher it, the dict object is just a mysterious black box.
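Here is a small, hypothetical illustration of that opacity. The function parse_record is invented for this example (it is not from any real project): it builds its dictionary on the fly from whatever "key=value" pairs it is handed, so neither its signature nor a quick glance at its call sites tells the caller which keys will come back.

```python
def parse_record(line):
    # Split a hypothetical "key=value; key=value" string into a dict.
    # The keys are created dynamically from the input, so the shape of
    # the result is only knowable by reading this code (or the data).
    record = {}
    for field in line.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            record[key.strip()] = value.strip()
    # One key gets special treatment that a caller would never guess.
    if "id" in record:
        record["id"] = int(record["id"])
    return record


print(parse_record("id=7; name=widget"))   # {'id': 7, 'name': 'widget'}
print(parse_record("colour=red"))          # {'colour': 'red'}
```

Contrast that with a C struct, whose members are fixed and declared in one visible place.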

Lack of appropriate documentation in the code itself or in an external design document is not just sloppy, it’s rude and thoughtless. It forces someone else later on to spend valuable time trying to figure out how something is structured so they can, possibly, re-use it. In a project with 100K+ SLOC of Python, this can be a considerable drain on resources and a completely avoidable waste of time (and, hence, money).

Every project needs a data dictionary of some type. Given that many programmers have an almost pathological aversion to writing anything but code, this could be something as simple as a comment block that can be easily extracted using grep. Or one might use a tool like Epydoc to create a nicely formatted set of HTML pages to show off the elegance of your programming handiwork.

For example, it’s easy to document a function or method using Epydoc. Assume we have a function called Histogram in some module. Its docstring, with the appropriate Epydoc markup, would look like this:

""" Generates a histogram and associated data from a 2-D array.

    Extracts a histogram from a 2-D source array and returns a list of the
    bin counts.

    @param data_arry: Numpy 2-D array of integer values
    @type  data_arry: Numpy array object

    @param x_size:    X axis sixe
    @type  x_size:    int

    @param y_size:    Y axis size
    @type  y_size:    int

    @param num_bins:  Number of bins in histogram
    @type  num_bins:  int

    @param min_data:  Maximum value to bin
    @type  min_data:  int

    @param max_data:  Minimum value to bin
    @type  max_data:  int

    @return:    dictionary object::
                hist_data   int     List of num_bins integer values
                hist_avg    int     Mean of data
                hist_stdev  int     Standard deviation
                hist_max    int     Maximum value binned
                hist_min    int     Minimum value binned

    If input data is invalid the function returns None.
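To show the docstring sitting on an actual function, here is one way it might be implemented. This is a minimal pure-Python sketch, not the original code: the lowercase name histogram, the reading of x_size/y_size as "y_size rows of x_size columns", equal-width bins, and computing the statistics over the binned values only are all assumptions. Values are passed through int(), so a 2-D NumPy array should also be accepted, though the test below only uses nested lists. The @param/@type fields are trimmed to keep it short.

```python
import statistics


def histogram(data_arry, x_size, y_size, num_bins, min_data, max_data):
    """ Generates a histogram and associated data from a 2-D array.

    (@param/@type fields omitted here for brevity.)

    @return:    dictionary object::
                hist_data   list    List of num_bins integer values
                hist_avg    float   Mean of binned data
                hist_stdev  float   Standard deviation of binned data
                hist_max    int     Maximum value binned
                hist_min    int     Minimum value binned

    If input data is invalid the function returns None.
    """
    # Treat nonsensical parameters as invalid input.
    if num_bins <= 0 or x_size <= 0 or y_size <= 0 or max_data <= min_data:
        return None

    # Assumption: the array holds y_size rows of x_size columns each.
    cells = [int(v) for row in data_arry[:y_size] for v in row[:x_size]]
    # Only values inside [min_data, max_data] are binned.
    values = [v for v in cells if min_data <= v <= max_data]
    if not values:
        return None

    # Equal-width bins; a value equal to max_data goes into the last bin.
    width = (max_data - min_data) / num_bins
    counts = [0] * num_bins
    for v in values:
        counts[min(int((v - min_data) / width), num_bins - 1)] += 1

    return {
        "hist_data": counts,
        "hist_avg": float(statistics.mean(values)),
        "hist_stdev": float(statistics.pstdev(values)),
        "hist_max": max(values),
        "hist_min": min(values),
    }


result = histogram([[1, 2], [3, 4]], x_size=2, y_size=2,
                   num_bins=2, min_data=1, max_data=4)
print(result["hist_data"])   # [2, 2]
```

Because every key in the returned dict is listed in the @return field, a caller never has to read the body to find out what comes back.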

Yes, yes; I can hear the complaining now: “But that’s a lot of typing!” Then consider this: If one spends 5 minutes or so documenting a function at the time it is created, how much time would one spend trying to decipher the same function if it were not documented? More than 5 minutes, I’m sure. It’s also easy to update the function documentation whenever the function is modified, since it’s right there in plain sight. In fact, there is no excuse not to keep it in sync with the code.
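Keeping the docstring in sync can even be checked mechanically. The helper undocumented_params below is a hypothetical sketch (it is not part of Epydoc) that uses the standard inspect and re modules to list any parameters lacking an @param entry; scale is an invented example function.

```python
import inspect
import re


def undocumented_params(func):
    """Return parameter names of func that have no Epydoc @param entry."""
    doc = inspect.getdoc(func) or ""
    # Collect every name that appears as "@param name:" in the docstring.
    documented = set(re.findall(r"@param\s+(\w+)\s*:", doc))
    # Compare against the real signature, ignoring self/cls on methods.
    params = [name for name in inspect.signature(func).parameters
              if name not in ("self", "cls")]
    return [name for name in params if name not in documented]


def scale(values, factor, offset=0):
    """Scale a list of numbers.

    @param values: Numbers to scale
    @type  values: list of float
    @param factor: Multiplier applied to each value
    @type  factor: float
    """
    return [v * factor + offset for v in values]


print(undocumented_params(scale))   # ['offset']
```

Run over a module’s functions (from a unit test, say), a check like this flags documentation that has drifted away from the code.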

To document a data object for later extraction with grep (or something like it), one could do something like this:

#- -----------------------------------------------------------------------------
#- Data Object
#- SomeModule: HistData
#- Dictionary type:
#- hist_data   list    List of num_bins integer values
#- hist_avg    float   Mean of data
#- hist_stdev  float   Standard deviation
#- hist_max    int     Maximum value binned
#- hist_min    int     Minimum value binned
#- -----------------------------------------------------------------------------

The “#-” prefix on each line makes it easy for grep to find, and the result can be redirected to a file for later reference. The “SomeModule: HistData” line is useful for determining where the data object is located. Using yet other tools (such as a shell script or even Python) one could easily, and automagically, convert the output file into a nicely formatted document or even an HTML page.
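The grep step itself is a one-liner (for example, grep -h '^#-' *.py > datadict.txt), but the extraction and a simple HTML conversion can also be done in Python. The sketch below assumes exactly the block layout used above: a “Data Object” line, then a “Module: Object” line, then a type line, then one “name type description” row per key. The names parse_data_dictionary and to_html are hypothetical, and the HTML is deliberately bare.

```python
import html

COMMENT_PREFIX = "#-"


def parse_data_dictionary(source_text):
    """Parse '#-' comment blocks into a list of data-object dicts."""
    objects = []
    current = None
    for raw in source_text.splitlines():
        line = raw.strip()
        if not line.startswith(COMMENT_PREFIX):
            continue                          # ordinary code or comments
        body = line[len(COMMENT_PREFIX):].strip()
        if not body or set(body) == {"-"}:
            continue                          # blank lines and dashed rules
        if body == "Data Object":
            current = {"location": None, "kind": None, "fields": []}
            objects.append(current)
        elif current is None:
            continue                          # '#-' text outside a block
        elif current["location"] is None:
            current["location"] = body        # e.g. "SomeModule: HistData"
        elif current["kind"] is None:
            current["kind"] = body.rstrip(":")  # e.g. "Dictionary type"
        else:
            # Split "name type description" into at most three parts.
            name, ftype, desc = (body.split(None, 2) + ["", ""])[:3]
            current["fields"].append((name, ftype, desc))
    return objects


def to_html(objects):
    """Render parsed data objects as minimal HTML tables."""
    parts = []
    for obj in objects:
        parts.append("<h2>%s</h2>" % html.escape(obj["location"] or ""))
        parts.append("<p>%s</p>" % html.escape(obj["kind"] or ""))
        parts.append("<table>")
        parts.append("<tr><th>Key</th><th>Type</th><th>Description</th></tr>")
        for name, ftype, desc in obj["fields"]:
            parts.append("<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
                html.escape(name), html.escape(ftype), html.escape(desc)))
        parts.append("</table>")
    return "\n".join(parts)


SAMPLE = """\
#- ------------------------------------
#- Data Object
#- SomeModule: HistData
#- Dictionary type:
#- hist_data   list    List of num_bins integer values
#- hist_avg    float   Mean of data
#- hist_stdev  float   Standard deviation
#- hist_max    int     Maximum value binned
#- hist_min    int     Minimum value binned
#- ------------------------------------
"""

objects = parse_data_dictionary(SAMPLE)
print(objects[0]["location"])   # SomeModule: HistData
print(to_html(objects))
```

In practice one would feed it the contents of every source file in the project (or the file grep produced) rather than an inline string.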

For the documentation-averse the same logic applies: Spend 5 minutes now or lots of minutes later. Which makes more sense when you’re under the gun and need to get something done now, and get it done right?

If Epydoc’s basic markup language (Epytext) isn’t good enough, then one can always use reStructuredText, which Epydoc can also process. However, I don’t think either stock Epydoc or reStructuredText supports embedded data object definitions.


1 Response to “Why Documenting Your Python Code is a Good Idea”

  1. 1 Some Thoughts on Writing Readable Python Code « Crankycode Trackback on August 16, 2009 at 1:55 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow Crankycode on WordPress.com

Little Buddy

An awesome little friend

Jordi the Sheltie passed away in 2008 at the ripe old age of 14. He was the most awesome dog I've ever known.

%d bloggers like this: