Recently, a product manager reached out to me on LinkedIn with some questions about how to approach documentation at software companies. And frankly, the questions are terrific. They highlight so many of the struggles that we all face in our goal of creating great documentation. I’ve posted the full list of questions below, along with my answers.

What do you find to be the most convincing arguments for the value of investing in a proper technical product documentation team and process?

The ultimate value of all documentation is time saved, which is great, because developer time is expensive, but bad, because it’s nearly impossible to quantify. Certainly you have to come armed with some numbers (e.g. “Our documentation got 37,000 hits last month and only six issues, only one of which was high priority, and all of which we resolved within one week.”), but I’m starting to think that another critical approach is an appeal to ego/professionalism. Something like, “Great products have great documentation, and ours looks like it was put together by four overworked salespeople, because it was. We’re supposed to be an $8 billion company that just booked multiple contracts in the tens of millions. Look at our getting started guide.”

Metrics and surveys support the notion that developers consume documentation in huge quantities and rely on it to get their jobs done. As with any product, if you don’t have accountability (people you can fire if the documentation stinks), quality is going to suffer.

What tools do technical writers use to generate documentation? Do they most commonly use Markdown/reStructuredText/AsciiDoc, or do they use something totally different? FrameMaker? Word? What would be your ideal solution when it comes to going from a language to HTML?

Bizarrely, lightweight markup is still considered a bit bleeding edge, but lots of companies use it. AWS uses Sphinx for its CLI reference, all of the Azure documentation is in Markdown, Elastic and MuleSoft use AsciiDoc, etc.

Documentation is a struggle everywhere. Other popular tools (none of which I recommend):

  • MadCap Flare (Windows-only, XHTML, $1,800 per head)
  • Wikis (MediaWiki, Confluence, etc.)
  • Zendesk or other support portals
  • Word (if you only build PDFs, not websites)
  • DITA or DocBook (XML-based markup languages) using editors like Oxygen (Windows, macOS, Linux, $1,155 per head) and FrameMaker (Windows-only, $999 per head)

Aside from DITA and DocBook, most non-technical technical writers find these tools easier to learn than lightweight markup and static site generators. But if the goal is to get technical people to contribute, lightweight markup is the way to go.

XML-based content like DITA and DocBook is easier to translate, but the downsides are just so massive: you need a specialized editor to contribute in an efficient way, content is four times the length of Markdown, side-by-side diffs are awful, and developers will never touch it.

I was tasked with documenting a brand new open source project this January and went with Markdown and Jekyll on GitHub Pages. If I hadn’t been able to open source the documentation, I might have gone with Hugo or Gatsby instead of Jekyll. If it were a more complicated project, I might have chosen Sphinx, but they’re all very similar and all do a fine job.

If I had access to a small development team and had to manage 12 versions each of 100 different projects, I would invest in an extended flavor of Markdown that addresses its shortcomings in documentation and build custom tooling to handle multiple repositories, multiple versions per repository, and search.

How is documentation structured at companies? Do developers write the documentation, or do technical writers do all the writing? How do technical writers interact with the PMs/developers? What collaboration modes have you seen to work well (and not work well)? What is each party responsible for?

At most companies, technical writers do all of the writing, but that’s an anti-pattern. The better way is for technical writers to do some or most of the writing, for developers and PMs to submit pull requests as they identify gaps, and for writers to test those submissions, ensure accuracy, edit content, and maintain a high quality bar.

In general, submitting a pull request isn’t that much harder than writing an email explaining how something works, and yet developers are so happy to do the latter and so hesitant to do the former. It’s a cultural shift that most companies haven’t made.

In an ideal world, technical writers sit with their project teams, do tons of their own testing, work with PMs, developers, support, and testers to shore up knowledge gaps, write their own original content, and hold the quality bar for the docs. PMs add release notes, developers tweak reference content for new settings/APIs, testers help verify accuracy, and support makes sure the content addresses real-world customer issues.

What technologies do you use for docs? One struggle is the reStructuredText vs. Markdown question. It’s been difficult to find a standard and a tech that balances a workflow that enables developers to write documentation and technical writers to copy edit, with enough functionality to provide a polished product to customers.

Markdown has two enormous advantages: popularity and a gentle learning curve. RST, on the other hand, has much more powerful features for managing large sets of documentation (cross-references, callouts, conditional processing, list tables, variables, and transclusion).

In particular, conditional processing (excluding or including chunks of content/entire pages depending on the situation) and the intersphinx extension have been important to some of my prior projects. Sphinx also offers out-of-the-box JavaScript-based search, which is appealing for distributing to customers. A JavaScript library called Lunr offers pretty good search without the need for an internet connection and isn’t too hard to integrate into Markdown-based solutions like Jekyll.
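To make the Sphinx features above a bit more concrete, here is a minimal sketch of a conf.py excerpt. The `DOCS_EDITION` environment variable and the `enterprise/` folder are hypothetical names I made up for illustration; `extensions`, `intersphinx_mapping`, and `exclude_patterns` are standard Sphinx settings.

```python
# conf.py (excerpt): an illustrative sketch, not a complete configuration.
import os

# Enable cross-project linking via the intersphinx extension.
extensions = ["sphinx.ext.intersphinx"]

# Map a short name to another project's published docs so that references
# such as :py:class:`pathlib.Path` link out to that site.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}

# Conditional processing at the page level: leave out a whole folder
# unless this build targets the (hypothetical) "enterprise" edition.
edition = os.environ.get("DOCS_EDITION", "community")  # hypothetical variable
exclude_patterns = ["_build"]
if edition != "enterprise":
    exclude_patterns.append("enterprise/**")
```

For smaller chunks inside a page, Sphinx also has the `only` directive, which includes content when a tag is passed to the build with `-t`.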

If you don’t need that much flexibility, a Markdown-based static site generator might be appealing. No matter what, you want to have access to a designer to ensure good typography and usability, as well as a web developer to make sure that the search experience is great.

Converting from one markup language to another always sucks. I’ve done it several times now. Typically, you use Pandoc to do the bulk of the conversion. Then you use regular expressions for the cleanup. Finally, you do manual cleanup of edge cases and recreate the tool-specific metadata and left-hand table of contents by hand. Think weeks or months, not days.
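As a rough illustration of the regular expression step, here is a minimal sketch that assumes you have already run something like `pandoc -f rst -t gfm page.rst -o page.md`. The function name and the two rules are my own assumptions about what a cleanup pass might need; real projects end up with many more rules, and this one does not try to skip fenced code blocks.

```python
import re


def clean_markdown(text: str) -> str:
    """Apply a couple of illustrative cleanup rules to converted Markdown."""
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Point relative links at the new .md files instead of the old .rst
    # files, keeping any #anchor. Absolute http(s) links are left alone.
    text = re.sub(
        r"\]\((?!https?://)([^)\s]+)\.rst(#[^)\s]*)?\)",
        r"](\1.md\2)",
        text,
    )
    return text


sample = "See [setup](install.rst#linux).\n\n\n\nAlso [x](https://e.org/a.rst)\n"
cleaned = clean_markdown(sample)
```

Run the function over every converted file, diff the results, and expect to find edge cases that still need fixing by hand.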

How do you distribute documentation to customers?

When I have my way, the only distribution channel I use is a hosted, external website that I can update whenever I want. But if you absolutely must offer documentation to people who don’t have internet access, well, that’s another area where Sphinx does a pretty good job. The sites that Sphinx spits out will open on any computer and look the same as they do on your website. They’re super easy to host on an internal server. With Jekyll and Hugo, you have to change a setting or two and rebuild, which isn’t the end of the world, but it’s annoying.

The tools for turning lightweight markup into PDFs are not great. Typically, you have to convert to HTML, DocBook, or LaTeX and then render that intermediate format out to PDF. I advise avoiding PDFs if at all possible. Some people claim to read them. Some people even claim to like them. I don’t believe either group.

What process do you use to identify when sections need to be updated? How do you deal with versioning if different customers are using different versions?

For identifying which sections need to be updated, that’s just a process issue. If the technical writer sits with the team, attends meetings, and generally works with them every day, situational awareness doesn’t tend to be that big of a deal.

Versioning, on the other hand, is a huge pain and an unsolved problem, especially if you backport fixes to old versions of the docs. With static site generators, you literally have to check out the branch, build, and then drop it onto S3 or something. Building and hosting multiple versions is really the only value-add of a project like Read the Docs.
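The check out, build, and upload loop can be scripted. Here is a minimal sketch that assumes a Jekyll site, one Git branch per documentation version, and the AWS CLI for the upload. The bucket name, branch names, and output layout are all hypothetical.

```python
import subprocess


def commands_for_version(branch: str, bucket: str) -> list[list[str]]:
    """Return the shell commands to build and publish one docs version."""
    out_dir = f"_site/{branch}"
    return [
        # Switch the working tree to the branch that holds this version.
        ["git", "checkout", branch],
        # Build into a per-version folder with a matching base URL.
        ["bundle", "exec", "jekyll", "build",
         "--destination", out_dir, "--baseurl", f"/{branch}"],
        # Mirror the built site to a per-version prefix in the bucket.
        ["aws", "s3", "sync", out_dir, f"s3://{bucket}/{branch}/", "--delete"],
    ]


def build_and_publish(branches: list[str], bucket: str) -> None:
    """Run the commands for each version, stopping at the first failure."""
    for branch in branches:
        for command in commands_for_version(branch, bucket):
            subprocess.run(command, check=True)


# Example (not executed here): build_and_publish(["1.0", "2.0"], "example-docs")
```

Keeping the command list separate from the code that runs it makes it easy to print the commands and check them before anything touches the bucket.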

The simple, “my kind of stupid” solution is to just treat old versions of the documentation as immutable, build them locally, and drop them onto a server somewhere, but again, that assumes internet connectivity on the part of your readers.

What systematic or technical processes do you implement that you’ve found critical for streamlining for long-term documentation success?

If I’m understanding the question here, I only have a couple of real tips. If possible, open source the documentation (on GitHub or wherever) so that readers can easily create issues and contribute fixes.

If you can’t open source it, at least have an easy-to-use feedback link so that you can get bugs/complaints from readers. Metrics are also important, of course, in determining what people are reading (which is presumably what they care about). If a page only generates 10% of the hits, for example, but 50% of the bug reports, you probably need to reexamine that page/feature to figure out what the disconnect is.
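The hits versus bug reports comparison is simple arithmetic. Here is a minimal sketch with made-up page names and numbers. The function name and the threshold of 2.0 are my own assumptions, so tune them to your traffic.

```python
def flag_pages(hits: dict, bugs: dict, threshold: float = 2.0) -> list:
    """Return (page, ratio) pairs where a page's share of bug reports
    is at least `threshold` times its share of hits, worst first."""
    total_hits = sum(hits.values())
    total_bugs = sum(bugs.values())
    if total_hits == 0 or total_bugs == 0:
        return []
    flagged = []
    for page, page_hits in hits.items():
        if page_hits == 0:
            continue
        # (bug share) / (hit share), rearranged to a single division.
        ratio = (bugs.get(page, 0) * total_hits) / (total_bugs * page_hits)
        if ratio >= threshold:
            flagged.append((page, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up numbers: "install" gets 10% of the hits but 50% of the bugs.
hits = {"install": 100, "api": 800, "faq": 100}
bugs = {"install": 5, "api": 4, "faq": 1}
flagged = flag_pages(hits, bugs)  # [("install", 5.0)]
```

A ratio of 1.0 means a page draws bug reports in proportion to its traffic, so anything well above that deserves a closer look.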

Anything else you think I should know?

You’re asking all the right questions, so that’s the good news. The bad news is that as you keep digging, I think you’ll probably be disappointed at the state of the art.

Markdown is a great tool for writing a book, but it falls down a bit when you’re trying to manage thousands of pages across 150 projects that have complex formatting and tens of thousands of links. Refactoring becomes pretty awful. Then again, the easy learning curve might be a worthwhile tradeoff if you’re struggling to get developers to contribute.

No one really has versioning worked out, and generating a professional-looking PDF from a collection of 300 Markdown files is… tough.

One thing I think you’ve overlooked is translations. If you need to have documentation in French and Korean as well as English, you’ll want to interface with a reputable service before choosing any markup language/static site generator to figure out what file formats they accept and how that entire process will work.

If you use Sphinx, no one will accept RST files (and if they do, they’ll almost assuredly mess them up), but you can spit out what are called PO files (string by string splits of the text) for translation.
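For reference, here is a minimal sketch of the conf.py settings involved in the PO file workflow. The `locale/` folder name is just my choice for illustration; `locale_dirs` and `gettext_compact` are standard Sphinx settings.

```python
# conf.py (excerpt): an illustrative sketch of translation-related settings.

# Source language of the documentation.
language = "en"

# Where Sphinx looks for translated message catalogs, relative to conf.py.
locale_dirs = ["locale/"]

# Emit one message catalog per source document instead of grouping
# documents by their top-level directory.
gettext_compact = False

# Extract the strings to send for translation:
#   sphinx-build -b gettext . _build/gettext
# Build the French site once the translated catalogs are in locale/:
#   sphinx-build -b html -D language=fr . _build/html/fr
```

The extracted files are templates (.pot), which the translation service or a tool such as sphinx-intl turns into per-language .po files.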

I believe many translation companies now accept bare Markdown files, and of course HTML and Word are popular (but you shouldn’t use them). Whatever you go with, someone needs to send those files off to the service, get them back, commit them, tag/branch, build, publish, and repeat.