Autodocumentation with Sphinx and git hooks

When you start programming, you never neglect putting thought into how the program should run. Most people do neglect putting enough thought into the documentation, though. If you’re like me, you’ve experienced the difficulty of trying to understand and maintain software that you wrote yourself a while ago, or that you’ve received from somewhere else. The best way to avoid having bad documentation or, worse yet, no documentation in your own software is to document with as little work as possible, using autodocumentation tools right from the beginning. In this tutorial, I’ll discuss how to do this using an example setup with Sphinx and git.

The example I’m showing is the code from my package StereoVision. The documentation for this project is located here. It’s not especially complex, but every module, class, class member and function in the package is documented, and I was able to put it together without a lot of effort. I hope that you’ll be able to use a similar pattern to document your own software.

Documenting the sources

One of the main problems with documentation is that it’s a pain to write code and then remember to document what you did somewhere else. Most of the time you don’t have time for it, and if you do have time there’s almost certainly something else you’d rather be doing. Autodocumentation tools solve this problem. They examine the comments in the code you’ve written and extract those with special syntax in order to generate documentation. Change something in your code architecture? Just update the comments right next to your code, rebuild your documentation, and it’s up to date.

StereoVision is in Python, and I documented it using one of the most popular autodocumentation tools for Python, Sphinx. Sphinx is by no means the only autodocumentation tool out there, but in my opinion it’s one of the most elegant. It’s so good, in fact, that I recommend using it even if you don’t work with Python. There’s a great extension, Breathe, which lets you use autodocumentation generated with Doxygen, making it compatible with a plethora of other languages: Most notably C++, but also Fortran, C, Objective-C, Java, PHP and a whole bunch more. Other good autodocumentation tools are Natural Docs and Docco, to name just a very few.

We’ll focus in this tutorial on the examples in StereoVision, but I wanted to mention some of the other very cool projects out there.

So let’s get started. When you start to code, or you decide to add introspective documentation to your project, you just add comments appropriately. Here’s an example from blockmatchers.py from StereoVision:

class BlockMatcher(object):

    """
    Block matching algorithms.

    This abstract class exposes the interface for subclasses that wrap OpenCV's
    block matching algorithms. Doing so makes it possible to use them in the
    strategy pattern. In this library, that happens in ``CalibratedPair``, which
    uses a unified interface to interact with any kind of block matcher, and
    with ``BMTuners``, which can discover the ``BlockMatcher's`` parameters and
    allow the user to adjust them online.

    Each ``BlockMatcher`` protects its block matcher's parameters by using
    getters and setters. It exposes its settable parameter and their maximum
    values, if they exist, in the dictionary ``parameter_maxima``.

    ``load_settings``, ``save_settings`` and ``get_3d`` are implemented on
    ``BlockMatcher`` itself, as these are independent of the block matching
    algorithm. Subclasses are expected to implement ``_replace_bm`` and
    ``get_disparity``, as well as the getters and setters. They are also
    expected to call ``BlockMatcher``'s ``__init__`` after setting their own
    private variables.
    """

    #: Dictionary of parameter names associated with their maximum values
    parameter_maxima = {}

    def __init__(self, settings=None):
        """Set block matcher parameters and load from file if necessary."""
        #: Block matcher object used for computing point clouds
        self._block_matcher = None
        self._replace_bm()
        if settings:
            self.load_settings(settings)

Here’s an example of what Sphinx can generate from that.

What you see is just a normal class with a docstring. The docstring is written in ReST, which Sphinx parses to decide how to render it. You also see a class attribute, which I’ve documented with a comment that starts with “#:”. Sphinx recognizes that too. It also recognizes the docstrings in all of the methods that the class offers. Do that with all your classes, functions, etc. and Sphinx will be able to generate documentation from them.
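
As a further illustration, here is a minimal sketch of the same conventions applied to a method and a class attribute. The names Rectifier, scale and rescale are hypothetical and not part of StereoVision. The “:param:” and “:returns:” lines are standard ReST field lists, which Sphinx renders as a structured description of the arguments and the return value.

```python
class Rectifier(object):

    """
    Hypothetical example class, not part of StereoVision.

    Scales values by a fixed factor. ``scale`` is documented with a ``#:``
    comment so that autodoc picks it up.
    """

    #: Factor applied to every value passed to ``rescale``
    scale = 2.0

    def rescale(self, value):
        """
        Multiply ``value`` by the class-wide scale factor.

        :param value: Number to rescale
        :returns: ``value`` multiplied by ``scale``
        """
        return value * self.scale
```

The docstrings stay readable in the source and in help(), so nothing here depends on Sphinx being installed.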

Documenting the package

The documentation you’ve generated so far could easily be used by just about any documentation generator. Now we’re moving on to things specific to Sphinx.

Sphinx gives you the ability to generate documentation in a tree-like structure. In fact, all documentation generated with Sphinx is supposed to have a tree-like structure, meaning that you can get to any document you generated from any other document if you know how to traverse the tree correctly. I like to take advantage of this to have my package documentation be aware of its individual modules, and have superpackages know their subpackages.

You can document a module like this (don’t worry, nothing surprising here):

'''
Wrapper classes for block matching algorithms.

Classes:

* ``BlockMatcher`` - Abstract class that implements interface for subclasses

* ``StereoBM`` - StereoBM block matching algorithm
* ``StereoSGBM`` - StereoSGBM block matching algorithm

.. image:: classes_blockmatchers.svg
'''

import cv2

Sphinx turns that into this.

This is just the docstring at the top of the module, but you can see something else in there that you wouldn’t normally find in a docstring: A Sphinx directive. This:

.. image:: classes_blockmatchers.svg

means that the documentation should include the specified image. The path to the image is relative to the root of your documentation path. We’ll get to that in a second.

There are other directives that you can use to do more complex things in your code as well. This is the docstring for the module __init__.py, which serves for the entire package:

"""
Utilities for 3d reconstruction using stereo cameras.

Modules:

* ``stereo_cameras`` - Camera interfaces
* ``calibration`` - Tools for calibrating stereo cameras
* ``blockmatchers`` - Blockmatching algorithm matchers
* ``point_cloud`` - Point clouds
* ``ui_utils`` - Utilities for user interaction
* ``exceptions`` - Various exceptions

Import structure:

.. image:: packages_StereoVision.svg
   :width: 100%

Camera interfaces
*****************

.. automodule:: stereovision.stereo_cameras

Camera calibration
******************

.. automodule:: stereovision.calibration

Block matchers
**************

.. automodule:: stereovision.blockmatchers

Point clouds
************

.. automodule:: stereovision.point_cloud

User interface utilities
************************

.. automodule:: stereovision.ui_utils

Exceptions
**********

.. automodule:: stereovision.exceptions
"""

Sphinx does this with that.

In fact, this covers the entirety of the documentation for the project. Sphinx uses it as a point of entry to generate all of the source documentation. What’s going on here?

First, I start off with a normal string, followed by a list. Then I include an image showing a diagram of the import structure. The directive is modified with the option “:width: 100%”, indented beneath it, in order to tell Sphinx to stretch the image to the width of whatever it’s generating documentation for. Then I have sections, marked by their headings and a line of “*” of the same length as the section title. Each section consists of a single automodule directive that instructs Sphinx to generate autodocumentation for the specified module.

Effectively, this is a recursive autodocumentation structure. Sphinx enters the package and sees the directives there, which in turn instruct Sphinx to generate documentation from each module. Each module contains further instructions for Sphinx.

If you’re interested in ReST syntax or the Sphinx directives – there are lots of them – just take a look at the Sphinx page for their own documentation. It’s generated – you guessed it – using Sphinx.

Setting up Sphinx

Now that your sources are documented and ready for Sphinx, it’s time to generate the documentation itself. In my case, I’m generating HTML output, but Sphinx can use the same sources to generate all kinds of documents: HTML, LaTeX, ePub, man pages, etc.

All you have to do is install Sphinx and create a folder in your project to hold your documentation sources. Sphinx has a great tutorial on getting started, so I won’t bother repeating what they write. Just remember that if you want to use autodocumentation – and you almost certainly do – you’ll need to activate the autodoc extension, either by modifying the conf.py file that configures your Sphinx project or by using the sphinx-quickstart utility and toggling the autodoc extension to on.
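
If you edit conf.py by hand, activating autodoc comes down to one entry in the extensions list. This is only a minimal sketch; a conf.py generated by sphinx-quickstart contains many more settings.

```python
# conf.py (excerpt)
# Sphinx extensions to load; sphinx.ext.autodoc ships with Sphinx itself
extensions = [
    "sphinx.ext.autodoc",
]
```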

In my case, I set up the documentation in its own folder, called doc, nestled right next to my sources. The entire project structure looks like this:

Stereovision
|-bin
|  |-Executables
|
|-stereovision
|  |-Sources
|
|-doc
|  |-Sphinx sources
|
|-CHANGES.txt
|-LICENSE
|-MANIFEST.in
|-README.rst
|-setup.py

You don’t need to worry about the structure; I’m just showing it to demonstrate how you might set it up in a typical Python package.

To get started, I navigated into the folder “doc” and ran sphinx-quickstart. Then I made some modifications to conf.py:

import os
import sys

# Add directories to document to sys.path so that autodoc finds the sources
sys.path.insert(0, os.path.abspath('..'))
...
# Just some settings that I like when documenting my sources
autodoc_default_flags = ["members", "inherited-members", "show-inheritance"]
autodoc_member_order = "bysource"
autoclass_content = "both"
...
# The theme I chose
html_theme = 'nature'
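
One caveat if you try this with a newer Sphinx: autodoc_default_flags was deprecated in Sphinx 1.8 in favor of the dictionary autodoc_default_options. Here is a sketch of what I would expect the equivalent settings to look like there. It assumes Sphinx 1.8 or newer and is not taken from the StereoVision sources.

```python
# conf.py (excerpt), assuming Sphinx 1.8 or newer
# Same three defaults as the autodoc_default_flags list, written as a dict
autodoc_default_options = {
    "members": True,
    "inherited-members": True,
    "show-inheritance": True,
}
autodoc_member_order = "bysource"
autoclass_content = "both"
```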

You’ll also remember the directives to include images showing import structure, class diagrams, etc. I autogenerate those images so that they’re always current as well. I did this by modifying the makefile that Sphinx generates for building the documentation.

MODULES=blockmatchers calibration exceptions point_cloud stereo_cameras ui_utils
...
# Add autographics as a dependency to all targets so that you always build it
# before generating the docs
autographics: $(MODULES)
    cd .. ; \
    pyreverse stereovision -o svg -p StereoVision
    rm ../classes_StereoVision.svg
    mv ../packages_StereoVision.svg .

$(MODULES):
    pyreverse ../stereovision/$@.py -o svg -p $@

This just tells make which modules I’m interested in and creates a target for each of them. It then uses pyreverse to create diagrams for each module and for the entire package. Finally, it moves the package diagram into my doc folder so that Sphinx can use it when building the documentation.
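
If you would rather not touch the makefile, the same pyreverse calls can be assembled in a small Python script. This is only a sketch and not what StereoVision actually uses: the function names pyreverse_commands and run_all are made up for illustration, and it covers only the pyreverse calls, not the rm and mv cleanup steps.

```python
import subprocess

MODULES = ["blockmatchers", "calibration", "exceptions",
           "point_cloud", "stereo_cameras", "ui_utils"]


def pyreverse_commands(modules, package="stereovision", project="StereoVision"):
    """Return (working directory, argument list) pairs, relative to doc."""
    # One call for the whole package, run from the project root
    commands = [("..", ["pyreverse", package, "-o", "svg", "-p", project])]
    # One call per module, run from the doc folder itself
    for module in modules:
        path = "../{}/{}.py".format(package, module)
        commands.append((".", ["pyreverse", path, "-o", "svg", "-p", module]))
    return commands


def run_all(commands):
    """Run each command in its working directory, stopping at the first failure."""
    for cwd, argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_all(pyreverse_commands(MODULES)) from the doc folder would then generate the diagrams, provided pyreverse is installed.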

Finally, you’ll need to generate some documentation outside of the sources so that Sphinx can build its tree. Sphinx will use index.rst as the root of its tree unless you configure it to do otherwise. Let’s look at my index.rst:

.. include:: ../README.rst

.. toctree::
   :maxdepth: 2

   usage
   development
   changes

Sphinx uses that to generate this.

It’s just a directive to include the contents of my README file, which is already in ReST, followed by a directive that instructs Sphinx to build a table of contents referring to some other documents: usage, development and changes. These are also ReST files in the doc directory. They need the suffix .rst, or whatever suffix you’ve configured Sphinx to recognize in conf.py.

In my case, usage.rst is a short text describing the purpose of each script in bin. changes.rst is only an include directive that pulls in the project’s CHANGES.txt file. development.rst contains only this:

StereoVision source documentation
=================================

.. automodule:: stereovision

This pulls up all the documentation in the sources.

This way, as long as I keep the documentation in my sources up to date, update my README and CHANGES files as needed and describe any scripts that I release, my documentation is always up to date. The files meant purely for Sphinx contain no content, only directives. This means that meaningful content lives only in places where it has to be anyway for the project, and where people who don’t necessarily read the documentation might also look. The README file is always relevant, as is CHANGES, and the sources of course need to contain that information too. Structuring your project this way keeps all meaningful information where people expect it and keeps the instructions for Sphinx separate. Data is data; code is code.

Deploying the documentation

You can deploy the documentation Sphinx generates using make. In my case, I want HTML documentation, so I generate it from the doc folder with “make html”.

It’s also possible to integrate the document generation into your versioning system using hooks. For example, you could create the following file, making it executable:

#!/bin/sh
# Content of .git/hooks/post-merge
cd doc
make html

This would autogenerate the documentation every time you do a merge, storing it locally.
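
Hooks live in .git/hooks, which is not part of what git clones, so every checkout needs the hook installed separately. A minimal sketch of a helper that writes the file and sets the executable bit could look like this. The function name install_post_merge_hook is hypothetical, and the hook body simply mirrors the two commands shown earlier.

```python
import os
import stat

# Same commands as the hook in the text, with a shebang line on top
HOOK_BODY = "#!/bin/sh\ncd doc\nmake html\n"


def install_post_merge_hook(repo_root):
    """Write .git/hooks/post-merge under repo_root and make it executable."""
    hook_path = os.path.join(repo_root, ".git", "hooks", "post-merge")
    with open(hook_path, "w") as hook_file:
        hook_file.write(HOOK_BODY)
    # Add execute permission for user, group and others (like chmod +x)
    mode = os.stat(hook_path).st_mode
    os.chmod(hook_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

You would call it once per clone, for example install_post_merge_hook(".") from the repository root.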

I deploy my documentation to GitHub Pages. Of course, I could always keep that current to my development branch, but I prefer to have my last release documented online – developers can build the documentation on their own if they want. So I have the following setup:

I made a branch in my repository called gh-pages, which GitHub Pages recognizes as the content to deploy for me. I cloned this branch into a separate directory next to my main repository and added the following files to it:

  • .nojekyll – Just to make sure that GitHub Pages doesn’t try to deploy my content with Jekyll. Jekyll doesn’t like files or directories that start with a leading underscore, and Sphinx generates lots of those, so it’s just easier to turn Jekyll off.
  • deploy.sh – I run this script to deploy my documentation to GitHub Pages.

This is the content of deploy.sh:

#!/bin/bash
# Stop at the first error so that a failed build is never committed or pushed
set -e
cd ../StereoVision/doc/
make html
cd -
cp -r ../StereoVision/doc/_build/html/* .
git add .
git commit -am "Update documentation"
git push

It just navigates into my documentation folder, generates the documentation, comes back and copies everything into the documentation branch. Then it adds everything, commits it and pushes it to GitHub. GitHub does the rest.

And there you have it! In those few steps you can set up a simple and elegant autodocumentation system that helps you keep good, up-to-date documentation deployed all the time.

About

My name’s Daniel Lee. I’m an enthusiast for open source and sharing. I grew up in the United States and did my doctorate in Germany. I've founded a company for planning solar power. I've worked on analog space suit interfaces, drones and a bunch of other things in my free time. I'm also involved in standards work for meteorological data. Now I work for the German Weather Service on improving forecasts for weather and renewable power production.
