What is the best way to document code?
Posted by 07/02/2007
If you're a programmer, you write code for a living. Part of the job is documenting the code you've written. Why? Well, either to help yourself understand your own code - or to help whoever replaces you to understand it, were you to leave. Most people (although not all) agree we need some documentation of some sort. It's just a matter of what kind and how much.
The problem is that the more time you spend documenting, the less code you can write. And the more code you document, the more text you have to update if the code is updated. It makes your code an immovable mass. It's a conundrum. Especially when you're trying to be all Agile and customer-driven and all. There is no direct benefit to code documentation for the customer. Why not just drop it?
Well, if you ever had to take over a project - or even had to revisit a project of your own, you know the answer to that. It's like trying to re-decipher hieroglyphics.
Your Typical Shop
I don't know what the typical shop is really. What I've seen are extremes in both directions; I've seen projects weighted down with a mass of 300-400 pages of documentation for every 2,000 lines of code. And I've seen projects with 30,000 lines of code with virtually no documentation.
What I do know is I've never seen it done satisfactorily. Sometimes people will hand me a big notebook with 300 printed pages and say "Here's the documentation" and it's next to useless. You'd think 300 pages would give you more information.
I've also had people tell me "I'm very careful about naming - so the code is self-documenting". And they really believe this is true, but, sadly, one man's self-documenting is another man's gibberish.
They are correct in noting there is a difference between something well-named and something strangely named. But naming everything perfectly takes more diligence than it seems. It's just like Encapsulation in general. It's too easy to lull yourself into something like this:
public void setReference(Reference reference)
And then think to yourself, well, I don't need to say "this method will set the reference" because that's redundant. So I won't say anything at all. Whereas it may not be obvious at all what a Reference is really all about - and why you would want to set it in this object.
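To make that concrete, here is a rough sketch of a comment that says something the method name doesn't. Everything in it is made up for illustration: the post never says what a Reference is, so the sketch assumes it is a citation pointing a report back at its source, and the Report class and isDraft method are hypothetical too.

```java
// Hypothetical types for illustration only. The meaning given to Reference
// here (a citation of a source) is an assumption, not something from a real API.

/** Identifies the source that a report's figures were taken from. */
final class Reference {
    private final String sourceId;

    Reference(String sourceId) {
        this.sourceId = sourceId;
    }

    String getSourceId() {
        return sourceId;
    }
}

/** A report that may cite the source it was generated from. */
class Report {
    private Reference reference;

    /**
     * Records which source this report was generated from.
     *
     * <p>Set this before handing the report to anyone else, so readers can
     * trace the figures back to their origin. A report with no reference is
     * treated as a draft.
     *
     * @param reference the source this report cites, or {@code null} to turn
     *                  the report back into a draft
     */
    public void setReference(Reference reference) {
        this.reference = reference;
    }

    /** Returns true while no source has been recorded. */
    public boolean isDraft() {
        return reference == null;
    }
}
```

The comment never says "sets the reference". It says what a reference means here and what changes once you set one, which is the part the name can't carry.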
Encapsulation is actually pretty tricky to pull off well. It's like proofreading your own work. Your brain fills in the missing parts because it knows what the missing parts are.
The Code is the Documentation
I've seen a lot of code that is just sitting there with no explanation, not even within the code itself. I would say that is common. Page after page of code with no comments. I'm not sure what the programmer was thinking, although my guess is either:
- "The code is so obvious that no comments are necessary" or
- "I keep changing the code and changing the comments and it's doubling my work - so I'll leave out comments"
If you chose the first, then I promise you're wrong. If you chose the second, then you have a point, I admit. I'm not sure how to answer you besides "Yeah, it sucks".
I've looked through my share of code that has little or no explanation. And I have to say it is a chore to read. It has not sunk into the psyche of computer science as a field that code will be read more often than it is written - so it had better read well. That's what's so nice about Python. It reads well, if nothing else.
JavaDoc and the Like
One approach to documentation is Literate Programming, or, if that's not available, at least some kind of methodology for extracting documentation from code. The most famous example of this is JavaDoc.
The idea is before the code begins, in-between comment marks, you make some statements about the function, or the class or whatever so:
/**
 * @param dir the directory to use
 */
public void setDirectory(String dir)
Then you run JavaDoc on all the code, and you get a big pile of HTML files that describe every method, every class, every interface - everything. Java has JavaDoc, Ruby has RDoc, C++ has Doxygen, and Python has a variety of things. Most languages have their version of JavaDoc - even Haskell has Haddock.
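For a slightly fuller picture, here is a minimal sketch of a class with JavaDoc comments on the class and on each method. The DirectoryLister class, its default directory and the getDirectory method are all hypothetical, invented around the setDirectory signature above. The tags themselves (@param, @return, @throws, {@link}, {@code}) are standard JavaDoc.

```java
/**
 * Remembers which directory a listing should be read from.
 *
 * <p>Hypothetical example class: it exists only to show where doc comments
 * go and which tags they commonly use.
 */
class DirectoryLister {
    // Assumed default for this sketch: the current working directory.
    private String dir = ".";

    /**
     * Chooses the directory that later calls will work with.
     *
     * @param dir the directory to use; must not be null or empty
     * @throws IllegalArgumentException if {@code dir} is null or empty
     * @see #getDirectory()
     */
    public void setDirectory(String dir) {
        if (dir == null || dir.isEmpty()) {
            throw new IllegalArgumentException("dir must not be null or empty");
        }
        this.dir = dir;
    }

    /**
     * Reports the directory chosen with {@link #setDirectory(String)}.
     *
     * @return the directory currently in use, "." if none was set
     */
    public String getDirectory() {
        return dir;
    }
}
```

If this were saved as DirectoryLister.java, something like `javadoc -d docs -package DirectoryLister.java` should write the HTML pages into a docs folder. The -package option is there because the class in this sketch isn't public, and by default the tool only documents public and protected classes and members.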
API Docs are Reference Material
The generated documents are generally referred to as 'the API docs'. And they do help navigate code. I know I've referred to the API docs for Java tens of thousands of times. They will tell you all the functions you have available, and all the functions you've inherited, among other things. It's great, but it's pretty thin for giving an overview of how it all works - how all the objects interact, what the overriding concepts are, and so on. Some people think the API docs suffice for documentation. I was once one of those people.
But I came to realize that it is just reference material. It's good for looking things up, clarifying things - but it's not going to give you the philosophy behind the code. It's not going to tell you how you struggled to make it work with another API - or the vision that never quite materialized.
Some people at this point might suggest a Wiki. A Wiki is a set of HTML pages that are editable in place and that make it easy to link to other Wiki pages. The idea is that everyone who works with your code can add pages as they go along. When you learn something, add it to the Wiki. If a tutorial is necessary, add it to the Wiki. Pretty soon you have this self-perpetuating mass of helpful documentation. Right?
A problem with the Wiki is that it gets out of date just like everything else in the world, and its serpentine nature makes it difficult to keep clean. The same thing happens when you allow comments in your documentation.
Somewhere I read someone describe a Wiki like this:
"Wiki is hawaiian for 'can't find sh*t'".
I can't remember who said that, but I've found it to be true. The Wiki seems like a great idea. And it sort of is. It's a good start. And something like Trac shows the potential of this. But if you ever look at the Ruby on Rails Wiki you'll see the problems very quickly. a) Out of date information b) Misleading information c) Stuff that would be cool if someone finished it d) "How would I find something about ... substitute anything ... here?" e) "These giant Germanic concatenated words are starting to get ridiculous".
Trac is an interesting project - an issue tracker, source-code browser and documentation engine all in one. I'd say it's close to going in the right direction. But the fact that not everyone in the universe is using it leads me to believe it is not the answer.
One approach is to look for some examples of good documentation out there in the open-source world, and try and emulate them. The Django project is fairly well documented, despite what some people say, and it uses Trac. Also, anything by Michael Bayer such as Mako or SQLAlchemy. Those both use Trac as well. So, like I said, Trac is a good start. But the thing that makes these good is the fact that someone took the time to write overviews and examples - not the use of Trac itself. In fact the Wiki is pretty useless for all of these projects. And you don't need Trac to write an overview. You could use any word processor in the world. You can use vi.
Learn to Write
My conclusion? To write good documentation you have to learn how to write. There are no automated tools that will do it for you. Write like you would write an article or a book or a paper. You have to actually put your thoughts together and let it all congeal into sentences and paragraphs. There's no secret to making it easy. You can write a 500 page book or a 2 page article. It depends on how difficult the concepts are.
I would say try to keep it under 10 pages though. If you hand someone a tome of documentation it tends to be useless, because it's so boring. One of the skills of writing is trying to keep the reader interested. Save details for reference material. And use pictures. Lots of pictures.
That's about all I've figured out. And if you're still reading this, it proves my point. So write a blog or something. It'll suck, like this one, but it will help you learn to write better. I promise. And when you win the lottery, your replacement will thank you.