Retrograde tooling

We've reached a tipping point in technical writing, and we need to deal with it.

The future is not going to involve producing documents, it's going to be producing seamlessly integrated content in and around products, and that means we need to change our expectations around tooling. This is going to hurt, but hopefully in the good way.

When I started writing, I learned FrameMaker. It was amazing. It had 20 years of history even at that point, and you could do anything with it. Equations, printer's crop marks, swapping text blocks in and out automatically, producing print, PDF, and help files from the same file, complete with changing cross-references. Using Frame was as much art and science, and there were key commands that were transmitted through the mailing list, by word of mouth, because they were no longer in the official documentation. It had a learning curve like a brick wall, but once you got through it, you were driving an unstoppable tank that could roll over any problem.

After 15 years, including occasional painful exposures to trying to do professional documentation in Microsoft Word, we got a new, hot product called MadCap Flare. It was like FrameMaker had been reincarnated for the web. It was easier to use, had integrations for localization, CSS, HTML5. It was shiny and beautiful and powerful. I could still do conditional text, referenced text blocks, and single-sourcing.

I haven't used a technical writing tool like that in 18 months, and I think it will happen less and less often.

The rise of Software as a Service and app culture is driving us away from a world of User Guides and Admin Guides and Installation Guides. Instead of CDs with instructions, we expect pop-up help bubbles and interfaces that guide us through what we should be doing. This makes a lot of sense, since we are all experiencing individualized experiences based on our bandwidth, app collisions, needs, and licensing structures. Writing a document that includes all that would be extremely difficult with the old tools, and we don't have the old tools anymore.

We don't have them because without the need to produce Official Documentation, we've reduced and eliminated technical writing departments and jobs. We're moving a lot of that work into User Experience, and we are asking the writers that remain to serve as a clearinghouse for content produced by developers. Instead of a journalism model, where we interviewed developers or were embedded on the team, we are now working as aggregators, taking content from many sources and getting it ready to distribute. Or there just isn't a writer, and publishing is much more ad hoc.

FrameMaker and Madcap Flare cost between $1000 and $2000 per seat license. That's not something you're going to buy for someone who is doing it part time. And it's hard to port data in and out of them, because of all the metatext and compiled information that's attached to them in the tools. It's a database, not a set of flat files. Or more accurately, it's a compiled binary – not the program, the document. So a writer using one of these tools would be a hard gating function on a team's ability to publish their own writing. It would look great, but you'd have a single point of failure.

Instead, we've been moving toward having technical writers use tools that are freely accessible to developers. Markdown that is compiled with readthedocs or or some other blunt-force tool to create a page with a table of contents and some minimal linking. The advantage of this is that everyone can read, write, and access the docs. The downside, which is mostly only visible to technical writers, is that we have lost enormous richness in treating our documentation as an object-oriented compiled object.

Here are some examples of things that I either can't do or are very difficult:

  • Conditional text – different output text based on flags set on the whole document.
  • White-labeling – configuring my documents so secondary organizations could brand them themselves
  • Smart localization – Doing bulk localization based on common words and phrases, reducing translation time and cost
  • Write once/publish many – having a master document that can produce help, web pages, print, and in-product assistance
  • Context-sensitive help tied to the page the user is on
  • Tables – Markdown, why are you so bad at this?
  • Sophisticated layouts – headers, footers, breadcrumbs, inset images and text blocks
  • Automatic ingestion and formatting of other file formats
  • Includes, referenced text, inserted files – think libraries

That's a lot to lose! Sometimes I feel like writing in Markdown is like making Gutenberg to give up on the printing press and presenting him with and angry goose and a curly sheepskin.

However, it's not going to do me any good to bemoan the lost glories of the tech writing tools Roman Age, because we are in the Dark Ages now, and we just have to move forward to the Enlightenment as soon as we can.

….I was going to make a comparison table for you here in WordPress, but that's not a thing. My point exactly.

Anyway, here's what I think we could start using to regain some of our lost abilities:
CSS – Might be able to help with layouts, HTML5 support, tables, and white-labeling
Feature flags – conditional text, write once/publish many, and ingestion
HTML – (stop laughing. HTML is much more fully featured than Markdown) tables, context-sensitive help, includes
De-duplication – smart localization, includes
Javascript, PHP, etc – ingestion and formatting, menuing, navigation, breadcrumbs

Wow, that's a lot to learn. On the other hand, I heard a talk last week that included a 5 minute list of everything you "needed" to know to become a useful devops engineer, so I guess it's not bad compared to that. It is a pretty intimidating list for people who got into technical writing because they like writing, rather than because they like technology. It's not that writers can't learn tech – obviously we can, because we learned those old tools. It's that there is only the beginning of a community around this new renaissance writing, and no one tool that we can teach juniors.

I fell into tech writing because I took an English literature class with a very avant-garde professor who taught us hand-coded HTML (because 1995). That was the key that opened the door for me. But what key is out there now?

I hope Corilla continues to learn and grow and thrive – they could add in many of these features or already have them.

I hope Read the Docs gets enough funding to hire someone who is a technical writer as well as a coder, because they are missing some significant use cases, but they are also an industry standard.

I hope the OpenAPI standard and Swagger make it more possible and easier to expand their powerful use of code snippets into non-API documentation, because they are coming at the problem a new way and it involves a lot of the hypertextuality I'm looking for.

I hope Confluence realizes that they are setting people up to create unusable wikis because they default to an older, siloed world.

And honestly, I hope SharePoint spontaneously combusts, because that is TERRIBLE.

Why on earth would developers WANT to write documentation?

This week I was at DevOpsDays Minneapolis, which was an amazing, warm, and inclusive event. There were at least three open spaces on documentation, communication, and related problems, and I didn’t propose any of them!

Minneapolis DevOpsDays 2017 logo

The question the devops world keeps asking, over and over again is,

How do I get developers to write anything?

There aren’t a lot of easy answers for this.

The first one is that if you care about developer-produced documentation, you should hire for it. But almost always, the people hiring are not the same as the people feeling the pain of missing documentation, so that may not happen.

The second is that you should hire someone other than the developers to write, if that’s important to you. I’m going to mention that all the times I’ve interviewed at Google, it was to be as an internal systems writer, documenting how things in the company work for other people in the company. Big, smart, companies hire writers for internal documents. But that is a tough budgetary sell.

The third, partial, answer, is that you make it significantly easier for developers to write. You give them forms and templates to fill in instead of expecting them to invent documentation theory. You specifically allocate writing time that can’t be budged for “one more tweak to the feature”.

Here’s the key problem though:

No developer has ever been fired for lack of documentation.

No developer I know of has gotten a bonus for documentation.

There is no stick. There is no carrot. There are a lot of competing demands on a developer’s time. Why on earth would they go do something that is not in their core competency just because you want it?

My instinctive response, whenever my kids want something, is to say, “It’s good to want things.” I think that is the instinctive response of most developers who are asked to do extra work. It’s not malicious, it’s just true. There is no reason they should do it – they won’t lose their jobs, the way they might if they prioritized writing over code. They won’t derive any benefit except the long-term and intangible one of being asked fewer questions. If they like being the source of knowledge, they’d be giving that away. If they like coding, you’re asking them to do something other than what they want. There is no upside.

So knowing that, how can we encourage developers to write documentation that we need?

First, reward it. Give people prizes for writing docs, or for throwing away or revising old docs. This needs to be something outside their usual compensation structure. Gift cards, time off, merit badges, whatever would be meaningful to that person. Ask them. They know what rewards them, because they use it to reward themselves.

A variety of rewards

Second, require it. I don’t mean “You have to write some docs or I will, uh, be disappointed in you.” I mean “The presence of documentation is a quality gate and you cannot release without it.” I mean “QA is using your docs as the test story.”.

Companies are never altruistic, humans are never purely motivated. We need to accept this and stop expecting people to do extra work because it’s the right thing to do, and start making the work not-extra, and the reward tangible.

BIG CONCEPTS small docs. User documentation is a failure of user interface. Faster and cheaper than making your developer write it. Elegant writing saves money. Paid by the anti-word.

Why are there no documentation bootcamps or schools?

Sarah Mei and Sarah Fox and some other people and I were talking on Twitter about code school graduates, and how they frequently do really well at mid-level positions because they have communication skills from their previous careers.

We all agree that teaching someone to write well enough to be a Professional Technical Writer is more than you could cover in a six-week intensive, probably. Depending on where they start. I could absolutely turn a motivated journalist into an entry-level tech writer in 6 weeks. They’re 3/4 of the way there already. But most people would need to learn the following from scratch (in no particular order):

  • Audience analysis
  • Task-based writing
  • Procedural writing
  • User story writing
  • Error writing
  • Bug writing
  • Conceptual writing
  • Diagrams, illustrations, and screen captures
  • Rudimentary tools use
  • Sentence construction
  • Bullet point use
  • Theory of simplified English
  • Accepting editorial feedback
  • Accepting technical feedback
  • Elements of information architecture and presentation
  • Elements of usability and accessibility
  • Style guide use

Even if you’re coming out of a liberal arts/literary background, half of that list is something you may never have heard of. You can construct a great sentence, and you know how to use a semicolon, but your experience to date has rewarded you for things that don’t matter at all in technical writing. You’ve gotten good grades for making persuasive arguments or writing pithy literature reviews, but no one in my English Lit classes ever told me that I was addressing the wrong audience or designing my headings badly.

So that’s a daunting list of things to try to cram to get people into what is, let’s face it, a less glamorous profession than Code Warrior. I happen to think it’s the best and most influential place to be in a development team, but that’s not what the pay scales say.

That said

That said, I think I could do a LOT with teaching code school students and other developers some really basic writing skills. I tried this out at a Workshop at Write the Docs this year, and it went pretty well. I’d love to present it elsewhere. I’m putting the outline below the cut, if you’re interested.

Give me your developers for a day, and I’ll teach them to write a well-constructed error code and a readable bug report and a coherent status email. Give me two days and I’ll teach them about audience and analytics and deletion.

I want to do this for code schools, so we can catch people when they are frantically learning, so they will remember to look it up later. I want to do this for open source projects, because it is so hard to find or hire a technical writer. I want to do this for conferences where junior developers are looking to elevate their game. I have been traveling around to developer conferences for several years now, trying to distill everything I know from 20 years of my career into something that has value for coders, and I am convinced it has value.

Continue reading

Creative destruction

I was making a joke about the fact that I accidentally have two profiles for Open Source Bridge – one is with pink hair, and one with brown hair. Obviously, one of them is the mirror universe me. Probably the one with brown hair. No one with pink hair could be sinister, right?

I said that my mirror universe self would obviously specialize in creative destruction, instead of speed-building documentation sets from nothing. But in truth, I do both.

Any sufficiently complicated documentation set is indistinguishable from a pile of cruft.

Creative destruction is related to deleting, but is not the same. Creative destruction is less trashing things by category, and more like letting a herd of goats loose to clear a hillside covered in Himalayan blackberry. The goats are a lot more discriminating and not as destructive as a flamethrower, but they are omnivorous enough that your wild roses may also get eaten with your blackberry brambles. Creative destruction lets you take a cleared space and plant something new in it, without having to work around what has grown up over time.

This is the part of the konmari method that I identified with most. Put everything of a category together, look at it, and decide what it is that you want to keep. It's really difficult to make distributed decisions because it requires holding all the objects of that type in working memory, and our working memory, as literate humans, is just not that good. Play to your strengths, which in this case means "make it easier to be bold about deletion/destruction".

When I walk into an organization with existing documentation, I like to do a gap analysis: What traditional docs sections might they be missing? What things do I wish I knew as someone new to the product? Does the documentation cover every step of the product, from pre-installation to removal? Can people make good business decisions based on these documents?

A documentation destructionist looks to see what we can carefully, mindfully burn to the ground. Is it old? Is it about something we don't support? Is it a path we didn't follow? Archive it, delete it, and move on. You don't want to leave old versions of anything around if it will confuse the issue when someone is trying to use your tool.

Clearing a swath of ground makes it possible to see the daylight between the trees, give your good docs a bit of breathing space, plant new things that you want for the future or the present. New ideas grow best with some sunshine and elbow room, and even if you have infinite storage space for documentation, no one has infinite attention to read or maintain it.

Happy Tuesday! What can you destroy today? What would your documentation set be better without? What is out there bothering you ever time you go past it? Go get rid of it!

The art of deleting

I have a talk where I encourage everyone to be clear on the data they collect and keep. I encourage people to automate deletion so they don’t have to do anything extra. In the original incarnation of the talk, I said that everyone should apply konmari principles to the data they keep, only instead of “sparking joy”, data we keep has to have a clear and immediate purpose. It has to spark respect and utility.

I like konmari for the idea of grouping all of a similar type of thing together and then sorting them together. I, er, don’t usually attribute emotions to my socks. I think it’s a little difficult to sort data based on the emotions it gives you. Data at the scale most organizations are working at is less in the range of “books you own” and more in the range of “bacteria in your body”.

When I was looking for another analogy, I thought back to my time on the Microsoft BitLocker team. I was with them as a writer for that first year, when we were still explaining the value proposition over and over again. The laptop, we would explain, was an easy loss to write off, as long as the data on it was secured. A couple thousand dollars worth of laptop left in the back of a taxi was so trivial compared to the cost of a data exposure or breach. It was difficult to change our paradigm at the time to “trust the cloud”, but that’s where we were headed. The data was on a corporate network, or a backup, or in the rudimentary beginnings of the cloud. It was ok if we never got that laptop back. We all had to change how we thought about losing things versus losing access versus losing data.

Here are some instructions for deleting your personal data, as inspired by one of my favorite poems.

One Art

The art of losing isn’t hard to master;
so many things seem filled with the intent
to be lost that their loss is no disaster.

Delete reminder emails, meeting notices, any ephemeral message about an event that has passed.
Delete pictures if they’re not labeled, anything of people you don’t know or care about.
Delete the easy levels, the games you don’t play, the spreadsheets for projects that are long past.

Lose something every day. Accept the fluster
of lost door keys, the hour badly spent.
The art of losing isn’t hard to master.

The hour badly spent is hard to delete, you’ve already done it. Nevertheless,
delete games you don’t enjoy playing.
Clear your media feeds and timelines of people who don’t feed your soul.
Dig down and delete those emailed fights with your ex, or your current. That was then; this is now.

Then practice losing farther, losing faster:
places, and names, and where it was you meant
to travel. None of these will bring disaster.

The opposition to this line is “Hold fast to dreams/for if dreams die/life is a broken-winged bird/that cannot fly
Delete who it was you meant to be, all the things that make you feel guilty.
Purge the Pinterest for a wedding you did another way
Dump fitness apps that you just wince at, delete false starts and walk away.

I lost my mother’s watch. And look! my last, or
next-to-last, of three loved houses went.
The art of losing isn’t hard to master.

What are you leaving for your heirs?
When we come to clean out your house, will there be boxes of clippings?
Clean and organize your bookmarks, toss all the pointers to dead sites
Ruthlessly rid yourself of mediocre selfies and unlabeled group photos and clutter.

I lost two cities, lovely ones. And, vaster,
some realms I owned, two rivers, a continent.
I miss them, but it wasn’t a disaster.

You will lose some things you meant to keep. That is the nature of things.
You will regret some deletions, and you will worry a bit.
I’m sorry, but life is full of loss, and paper fails and disks fail and the memory of humankind is frail.
It won’t be a disaster.

—Even losing you (the joking voice, a gesture
I love) I shan’t have lied. It’s evident
the art of losing’s not too hard to master
though it may look like (Write it!) like disaster.

Practice losing, practice letting go, practice only saving things for a year, or two, or ten.

The art of losing, of forgetting, is built into us, the entropy that causes things to fall apart. Fighting our nature is sometimes noble, but less than we hope. it’s sometimes useless, but less than we think.

Nothing I write will be relevant in 5 years, 10 at best. That’s always been true. Technical writing is like that, and I have to accept that I’m etching server instructions in the sand at low tide, or lose my heart when the waves come in. Almost nothing you write, or save, or store, or archive will relevant for longer than that, either. Learn to let it go, and prepare your writing stick to scratch meaning in the beach at the next low tide.

Loss is not a disaster.

Lady Conference Speaker: Slides

Once you get a conference talk selected, you’ll want to put together the talk. There are speakers who can give entire talks without any visual aids, but most of the rest of us count on having some pictures to look at and prompt us. There are entire books written on slides – I liked Presentation Zen and Slide:ology. I can’t replicate everything all the books have to say, but I can tell you what I’ve found useful.

As you make more presentations, you’ll find your own style and voice. Be sure to watch what other speakers are doing and borrow the best of what you see. Some people do very rapid slides, and will have as many as 90 slides in a 25 minute presentation. Some people like to linger on a few key images. Some people use words, and some people avoid them entirely. You’re going to find your voice, but in the meantime, here are some things that would have helped me to know when I started out.


Slide software falls into two categories: WYSIWYG (What You See Is What You Get) or visual design, and compiled, or coded design. The first category is things like PowerPoint, Keynote, and Google Slides. The second category is more like building an HTML page, where you specify elements in Markdown or another language. These are RevealJS, RemarkJS, Deckset, GitPitch, and others.

I only use the first kind of slide software, but people who use coded design tools appreciate the reliability and customizing options of writing their own slides from scratch. My preferred tool is Google Slides because I can work across all the platforms I write on, I can easily send the link to an organizer, and I don’t have to worry about compatibility. In the last year, Google Slides fixed two major problems I had – you can now present offline, and you can use a Bluetooth presentation remote to advance your slides.

You’re going to want to get a presentation remote. They can be had for about $20, but you can get a really nice one for about $50. It depends on how much you want to invest in your speaking future. Practice using it  to understand the range and direction that you can get away with. Having a remote means that you can get out from behind the podium and it gives you something to do with at least one of your hands.

I travel with and present from an iPad, which is more convenient for me than a full-sized laptop. I got an HDMI adapter, and that is a pretty standard expectation at this point. If a conference is using something other than HDMI, they’ll probably let you know. My entire presentation kit fits in the tiny hand-size case for my presentation remote (clicker).

Standard elements

I have some elements that appear on all of my slides:

  • Twitter handle
  • Talk name
  • Conference
  • Hashtag

That may be more information than most people need, but I have a reason for all of them.

I think the Twitter handle is most important. Very few people will remember the introduction slide well enough for them to accurately quote you 20 slides later when you say something they want to talk about. Putting it on all or almost all your slides means that if a picture gets taken, people can come find you if they want more information.

The talk name also helps people identify a picture of the slide, and sometimes it helps me remember which talk I’m in, since I sometimes re-use images. It’s especially helpful for times that I have changed the name of the talk.

Including the conference name is something I do because I re-work my slides for each conference. I don’t entirely rewrite them, but I do add and subtract slides to change the audience, the time, the technology emphasis. Adding the conference name also shows the conference and the attendees that I am thinking about them specifically. I also save each presentation separately, for my records on how talks historically evolve.

The hashtag is less essential, but I like to have it available so people can use it and I can unify my talks around it.

I like to put all these elements in one place so that I can change them easily – in this example, it runs along the bottom of all the slides in this presentation.

I also have fairly extensive speaker notes for all of my presentations – not because I read them, but because it makes the slides more useful whet they are distributed without video. That’s a personal preference, but remember that everything in the speaker notes will be visible if you distribute the deck, so you may want to be careful about what you say.


Don’t steal other people’s intellectual property.

Can I say it more clearly than that? Don’t use pictures without permission.

Conference talks are absolutely your intellectual property, and you’d be rightfully irritated if someone was on the conference circuit slavishly copying your talks, so don’t do it to photographers and visual artists.

Luckily, it’s actually pretty easy to be a responsible person about this when you assemble a presentation. Images come from three legitimate sources:

  • Pictures you took yourself
  • Pictures that you paid for (Getty, stock photo providers)
  • Pictures that are licensed for re-use that you handle properly

Taking your own pictures is sometimes fun, and you can do a lot of interesting things with just your camera phone. With reasonable lighting, you can get snapshots that will look OK at conference screen size.

Buying the rights to pictures is expensive, but if you work for or are representing a large organization, they may already have a license that you can use. You can also find some relatively low-cost stock photograph suppliers, but if you’re a 60-slides kind of speaker, it’s still a lot of money.

The thing I do most often is use images that are licensed for re-use, and make sure I attribute them properly, either on the slide itself or at the end of the presentation.  I mostly use the advanced Google image search. In fact, if you are in Google Slides and you say you want to insert an image and then search for it, you will get a search tuned for images that are licensed “Commercial Reuse with Modification”. Knock yourself out, just remember to attribute them properly. I have other ways to search for licensed-for-reuse images in the Resources section.

Code examples

Not all technical talks have code. I promise that’s allowed, and I have never had code in one of my talks and yet they still let me up on stage and give me a mic.

If you do use code in a talk, here are some tips you might find useful:

  • Make the font bigger. No bigger than that. Are the characters the size of your head? Maybe big enough.
  • Use syntax highlighting. It reduces cognitive load at lets people focus on what you’re trying to say.
  • If you’re talking about a particular section, make it bright and the other parts a bit greyed out. Or leave them out all together. After all, you’re not going to teach anyone to code from a stage, you’re just showing them that a thing is possible. They’ll look it up later.
  • Murphy’s Law loves a livecoder. Although some people are virtuousic typists, lots of us get clumsy when we’re nervous. Instead of livecoding, consider taking a series of screenshots and stepping through them as slides. That way the wifi is not an issue, the remote servers are not an issue, and you reduce several variables.

Videos, sound, and other weird things

It may work. From my observation, your odds are 50/50 of getting a video with sound to work with the A/V system, especially if you surprise the technician. Do you really want to unclip your mic and hold it up to your laptop speakers awkwardly? I didn’t think so.

The medium most likely to work in a presentation is an animated gif, which is nice as far as it goes, but you need to make sure it doesn’t just…run…continously….behind you, because dang, those things are hypnotic. I have made this mistake myself, but nothing is as hypnotic as the slide I saw that was just a video of a massive domino layout, with spirals and collapsing towers, and…. yeah. It was a great illustration of the point, which was about how nothing works perfectly, but then we were all watching the looping dominoes, looking for the parts where the connections failed. Great illustration, but it completely distracted the entire audience. It’s possible to set gifs to only loop once or twice, and I strongly recommend you do that for your presentations.

Other weird things that sometimes work and sometimes just make the speaker more nervous or the audience less attentive:

  • Selfies with the audience
  • Physical humor
  • Open coffee mugs, beer steins, or anything that can dump itself on your laptop
  • Asking for audience response if you haven’t prepared them.


My new favorite resource for slide templates: Slides Carnival

A good roundup of CC0 and public domain image sites: WPTavern image resources

And of course, almost everything on Wikipedia has a reuse license, and if you go to the image page, you can get different sizes and a handy little attribution slug.

Museums, such as The British Museum and libraries, such as The New York Public Library are also working very hard to digitize and publish their collections.

A large WWII cargo ship about to slide down the dock

It’s easy to find pictures you can use, such as this great launch photo of a Liberty Ship.

 To sum up

There is no one Right Way to create presentation slides. It’s very dependent on personality and topic. I once saw a great talk that had hand-drawn slides, and one that had 8-bit art. It worked for those speakers and pulled the talk together in a memorable way. But the way you do it will come to be your style, so feel free to experiment while you’re learning. Just don’t stop learning!

Remember that slides are not the whole presentation – your topic and delivery are a huge part of it, too. Provide essential information on every slide. Do your best to be ethical in your use of intellectual property. Be careful of things that may fail and fluster you on stage. Go have fun!

Post on How to nail a tech writer job inteview

I was excited to write another post for

How to nail a tech writer job interview

It’s an extension of a twitter rant I went on a bit ago, about how important it is to know certain things about interviewing as a technical writer, but no one is teaching them, from what I can tell.

  • What kind of samples should you offer?
  • What is ethical to use as a sample?
  • What is a reasonable (and unreasonable) writing test?
  • How do you talk about team projects or work for hire?

I’m already in discussions with the team about a follow-up, based on my New Sheriff In Town talk. That one will cover how to walk into a new position and figure out what to write first.

If you’re interested in more content on interviewing, Carol Smith and I are putting the finishing touches on a workshop about all the not-code parts of a technical interview. You’ll be able to read more about that in The Recompiler soon, and of course, I’ll post about it here.


Screen capture of the front page of a site using folksonomy

Human Indexing

One of the consequences of cheap storage is that we have no incentive to reduce the amount of data we keep. We believe that we’re going to be able to search for whatever we want. But we are starting to exceed our human capacity to filter the returned results of our searches. Algorithms will help us some, but they are never going to be enough as long as we keep adding data at this rate. We’re going to have to use metadata to help us find anything.


For centuries, we have organized data taxonomically. All living creatures have a place in the Linnean structure. I am a chordate, a mammal, a primate, a human. There are lots of library organization systems that organize books and information. For example, the Library of Congress system has categories for everything, and room left for categories of knowledge we haven’t found yet.

Taxonomy has several problems, though. For one thing, you can only sort something into one bin at a time. Constraining things into only one category means you may not think to look in the correct places and assume that the information doesn’t exist.

Taxonomy also replicates the power structures of the culture that makes the organizational, which can be problematic. For example, there was a time when the Library of Congress system categorized Native Americans as fauna of North America. It’s always going to be dangerous to underrepresented groups to encode knowledge as the domain of a dominant group. And even if it’s not dangerous, it’s othering and discouraging.


The solution that has been arising lately is called folksonomy. Folksonomy is assigning tags to a piece of knowledge, based on the contributions of many individuals. You’re already using folksonomy in more places than you know. For example, Amazon recommendations uses both taxonomy and folksonomy to drive recommendations, and so does Netflix. We use it as individuals many times we are trying to describe something that we don’t know the exact name of.

The value of folksonomy is that it doesn’t depend on a top-down organizational system. Instead, it builds an understanding and description of something from many different perspectives. Like the parable of the six blind men and the elephant, the elephant may seem very different to you based on your perspective. For example, records of the Salem Witch Trials might be about early American history, or about Puritans, or about the suppression of women’s expression, or about the racism involved. The records don’t change, but their category and description does, according to who you are, what your perspective is, and what you are seeking to teach or understand.

The downside of folksonomy is that humans are sometimes terrible, and if you leave any system of public contribution completely open, people will attempt to spoil it or use their voice to dominate it. Imagine if every instance of a book that contained Christianity or a reference to communion got deluged with hundreds of tags about cannibalism. It would become difficult to find anything that was actually about cannibalism, and most people looking for documents about Christianity would find their results pushed farther down.

The other problem with folksonomy is one of standardization. If people can freely enter tags, they will enter very slight variants. Not intentionally, but it’s just extremely hard for anyone to be perfectly consistent, let alone many people. If tags vary, then they are not useful for grouping things together, because “King Arthur” and “Arthur, King of the Britons” will not be returned together, even though he is the same historical person. Conversely, “Shaun” might return you an adorable claymation sheep or a zombie-fighting slacker. You’d need to disambiguate them to get clear results.

Tag-Wrangling and Transformative Works

The organization that I have seen balance these conflicting values best is called Archive of Our Own. AO3 is a non-profit organization devoted to hosting transformational works. Transformational works are anything that takes a source text and tells a new story, or alters the text to convey a new point of view. Because US copyright law is not yet very clear about transformational works, this is also united with an academic journal and a lobbying movement to promote the idea of transformational works as a legitimate artistic expression.

Warning: Although nothing in this post is explicit, the AO3 site hosts many sexually explicit works. I do not recommend visiting the site on a monitored system or if you choose to avoid sexually explicit material.

AO3 uses a combination of taxonomy and folksonomy to make hundreds of thousands of fanfic stories available to people who are looking for them. Users can search by story title or author, or they can step through to increasingly fine categories, such as Movies->Marvel->Captain America->Winter Soldier. Stories are then further sorted on pairings, if there is a romantic element.

Screen capture of the front page of a site using folksonomy

AO3 fandom sorting page

Authors can give stories tags. These tags frequently convey authorial commentary, or appeal to a specific in-group of readers. They also function in a descriptive role. For example, a story might be tagged:

  • Odin’s A+ Parenting
  • Thor/Hulk
  • Thor/Banner
  • Loki (Marvel)
  • Thor (Marvel)
  • Hulk (Marvel)
  • LGBTQ Character
  • Humor
  • Drama
  • Curtain fic
  • Longfic

These tags are a mix of authorial description and comment (Odin’s A+ Parenting, Curtain fic), and “canonical tags”, which are standardized tags which describe specific characters or media sources (Loki (Marvel)). If you just searched on “Loki”, you would get both the Marvel version and the mythology version. “Python” returns both Monty Python and snakes.

Searching “loki python” gives you a relatively small overlap.

Tagging also includes warnings about types of content that people find distressing. Being able to select tags away means that people will never be accidentally exposed to something they work to avoid. The community has agreed on a general set of things that will either be explicitly warned for or the whole story will be assigned “Author Chose Not To Use Warning Tags”, which means there could be anything in there, take your own risks.

The canonical tags are managed by a small army of volunteers known as “tag wranglers”. These amazing humans standardize common tags and character and source tags, and also group together similar tags. For example, if you click on the tag/link “Odin’s A+ Parenting”, you’ll see a Tag Page:

The tag wranglers have collated similar and related tags and grouped them together so that reader’s searches have more chance of success. For example, I’m American, so when I am sarcastically assigning an excellent grade, I say “A+”. But British fic-readers might instead say, “Odin’s A-level parenting”. They mean the same thing, but you would have to have an EXTREMELY well-trained machine learning system to link that, or you would need humans. Tag wranglers also work to maintain and standardize warning tags.

The Art of Indexing

One of the early influences on me as a technical writer, before I knew technical writing was a thing, was reading Kurt Vonnegut’s Cat’s Cradle. Mostly, it’s a story of apocalyptic greed, but there’s a throwaway passage:

It appeared that Clair Minton, in her time, had been a professional indexer. I had never heard of such a profession before. She told me that she had put her husband through college years before with her earnings as an indexer, that the earnings had been good, and that few people could index well.

The idea of that passage stuck with me, the thought that indexing is an art worth paying for. When I became a technical writer, I was pretty junior, and ended up with the boring assignment of generating indexes. FrameMaker made this easy enough by parsing out headings to create index entries, and I even had a tool that would permute the headings for me, so I would get entries for Shaving the Cat and Cat, Shaving. But I was unsatisfied with that result, because it was just about the headings, and sometimes I was missing key concepts, so I had to go through and do manual indexing anyway.

So often, when we say indexing now, what we mean is a concordance, not an index. A concordance is a straightforward listing of all the places a word or phrase appears in a document. An index is more carefully constructed, and only points to useful instances of a phrase or word, like introductions or significant mentions. A really good index will also include words that never appear in a document.

User language is not our language

For example, if I show you this picture

Windows Blue Screen of Death

you probably identify it as the Blue Screen of Death. But until about 5 years ago, you would not find that phrase if you searched the Microsoft website. If you had the problem, you would need to search on “fatal exception”. It’s deeply unsatisfying to look for help and not find it. Even now, you won’t find a Microsoft help page about it on the first page of Google results.

If we want to serve our users, we need to meet them where they are, using the language that they use. So if they call something a spinning beach ball or a blue screen of death or whatever, use it in addition to the name that you call it.

The point of indexing something is to make your product and documentation easier for people to use. If you are not using the language that they use, you’re only writing the documentation and index for people who already work for you.

You won’t be able to imagine the names that people call your product (hopefully good). You’ll find their words in the places that they are helping each other. Stack Overflow, user groups, mailing lists, the files and reports that your own support team keeps.

Once you’ve collected all the language that people are using, you need to roll it into your index. You don’t want to identify every instance of a phrase, only the ones that actually pertain to the answers people are looking for. You want people to get the answer, not a deluge of partially relevant information.

Attach the index tags to the place with the best solutions, and also the meta tags that you want to include in the index. For example, if you have a heading called Activating your Thromdimbulator, you will want the following index tags associated with it:

  • Activate thromdimulator
  • Turn on thromdimbulator
  • Activate thingy
  • Turn on thingy

That’s the human indexing effect. There is an index entry that has zero words in common with the heading but will still be exactly what someone needs if they think of their thromdimbulator as a “thingy” that they want to “turn on”.

Human indexing is hard, because it takes time and knowledge and deep product and industry knowledge. But the reward is that even in our search-oriented, automatically-generated world, an excellent index is going to set your product apart.

This post is an expansion on a lightning talk I gave at PyConAU 2016 and Confoo Montreal 2017.

Documentation Debt

It was a good question, and a lot of people jumped in to answer it. I’ve given a talk on avoiding product design debt — That’s what #7fights is. But I haven’t given one explicitly about avoiding debt in documentation.

To understand the problem of avoiding debt, you have to understand the economy you’re working in. Many of the companies I’ve worked with over the years identify documentation as something they need to do to reduce support costs and manage the problem of training internal people. It’s not exactly glamorous. Some companies do value and invest in docs, but it’s a minority. That means that documentation is constantly underfunded, and depending on how much discipline people have about retiring old software products, it means that writers are supporting an increasingly large corpus of text. I’ve worked on projects that were literally thousands of pages long.

The debt comes down to three parts:

  • Format
  • Legacy
  • Unknown unknowns


Format is whatever your tool stack is, and how the documentation is packaged and presented. If, many years ago, someone decided you were a FrameMaker shop, well, you’re probably still a FrameMaker shop, albeit one that can now output “FrameML” or DITAFrame. But the soul of the documentation, the way it is stored is in a proprietary file format that only the priestly class of technical writers can access with relatively expensive tools.

The only way to move something out of a format like that is to strip it to plain text and then re-code all the meta-text like headings and pictures. We call it a conversion project, but it’s just as much fun as changing any other codebase on the fly while trying to still deliver things. Theoretically, there are conversion tools and companies that make that less painful, but I have never seen one work properly, and I’ve been through at least 7 conversion projects. Wailing, gnashing of teeth, and a permanent repetitive stress injury were the usual style of results.

Some parts of the industry, notably Read the Docs, Corilla, Gnome/Mallard, and other open-source-y people have been trying to treat this problem by using extremely flat files, in Sphinx or Markdown. These are essentially text files with very minimal markup that can then be parsed by a variety of different engines. It’s a good theory, but it moves the locus of control from the software knowing how to wrap text and manage hyperlinks to the writer needing to know how to do so using a constrained palette of commands. Thus, the markdown technical writer needs to know their product, and the basics of technical communication, and ALSO a syntax, instead of an interface. Not really ONE syntax, but several, with nuances and subtleties between them. The writer must be a coder.

I hang out with a lot of writer/coders, but I also keep in touch with the rest of the industry, and not everyone wants to do or be that person. They want to put more of their career development into writing and understanding their product, less into the means of production.

The debt here is the very nature of everything that is around your text. The storage, the format, the numbering, the images (oh, IMAGES, the worst kind of debt), they all contribute to what you will have to pay down.


Aside from the problem of legacy formats, you have the problem of legacy information. In 5 years, almost all of what I am writing now will be wrong or outdated. That’s just the nature of things. But if we continue to support the software, then we need to continue to support the documentation that makes the software work. This means that in an older company, you may be maintaining documentation for products that are fifteen years old. And it is maintenance. You have to update them to have the new format, new branding, whatever you change for the new documents has to be projected back to the old documents. If you move from FrameMaker to MadCap Flare, you have a conversion problem again. And it’s not like you can just ignore it and republish old PDFs – there WILL be a reason you have to get into that document to fix something.

I really recommend making your company and product name a variable that you can easily and universally replace. Just trust me on that. It will happen.

So the older your company is, the more documentation debt you have, just like with a codebase that is several layers of abstraction balanced on an OS/390 system kept in the basement.

Unknown Unknowns

The worst kind of debt is the kind that you can’t accurately predict, but when I thought about just leaving my list two items long, I realized that this third type is the kind that has gotten me in the most trouble over my career. I can predict how long it will take to convert x-many pages from one format to another, and what percentage of my time we will have to spend updating dead documents with new logos. What I can’t predict are the documentation debts that come out of left field. The merger that brings you a new set of docs and tooling and writers to integrate. The drop-everything crisis response to a serious security breach. The training that no one realized needed to be written. Those things are almost impossible to reject, and yet they can scramble a tidy documentation roadmap all to hell.

Mitigating Documentation debt

Here are my suggestions for reducing your documentation debt as much as possible. None of them will completely rescue you from the problem, but they are all things I’ve either implemented or tried to talk people into for debt reduction.


Whatever tool you pick, make sure that it exports your intellectual property in a usable format — you don’t want PDFs, you want text, or XML, or in a pinch, HTML. Have you ever had that frustrating moment when someone sends you a Photoshop file and not a PNG and you can’t open it because you don’t have freakin’ expensive Photoshop?

Yeah, don’t end up feeling like that about your documentation. You must always be able to extract something that could be reduced to text by means of a simple script. Your software can store it in a weird relational database for all you care, as long as there is an exit scenario.


Rigidly use universal resources for common items. Your logo? There should only be one file location for that, and the software should retrieve it every time it builds the document. That way you don’t have to change it 4 places per document (my life this week), but rather change the file in one place, keep the name the same, and all your branding is refreshed, poof!

The same goes for templates, fonts, product names, and anything else that is going to be true across multiple documents or a large document.

If you’ve chosen software that is not sophisticated enough to handle universal variables and referenced images, do what you can to imitate them. Code it yourself or get some help, but never have local images of your logo scattered throughout your files. That’s just like burning money.


Retire documents. The same way we say, “Hey, IE 7 still technically runs and we know some of you are locked in by third-party software, but you’re out of support”, we need to give our products and our documents a lifespan. “Hey, this document is 8 years old. We are no longer updating it, but you can download the PDF and good luck to you, traveller from the past!” Probably you shouldn’t be that snarky, though. No one using super old software is happy about it, they just have constraints that we don’t in shiny new-MacbookPro-land.*


Images are the most heinous form of debt. Schematics and conceptual drawings are pretty much fine, but screenshots are a personal nightmare. They’re wrong almost instantly, expensive to translate, prone to rebranding, frequently unreadable, and everyone likes to put them in because they find it reassuring to have pictures.

My suggestion is to use as few screenshots as humanly possible, and to give them names that correspond with the page name, NOT the step they appear in. If you’re lucky you’ll be able to re-use them, but numbering with any kind of sequencing is an act of madness, because there will always be insertions and deletions.

But mostly, don’t use screenshots.

If you have to, I’m excited about this idea for automating capture with Python:


Automate as much as you can. Automate production, using the same build tools as the coders. Automate screen captures if you can. If there’s anything that can be scripted or regularized, do it. This applies to the writing itself. If you can create a template that helps people give you more usable input, it’s worth a couple days of your effort. For example, you could create templates for bug reports, for user-visible changes for release notes, for change requests, for steps.

Never accept a set of typed commands as instructions if you can distribute a script instead. Fingers are fallible.

If you’re looking for inspiration on how to automate documentation and get to the part where you’re only writing the interesting juicy bits, look to API documentation, especially OpenAPI and They are automating away all the tedious parts of the API reference doc creation and giving us more time to write about WHY someone would want to use this call over that one. THat is the ideal of automation – not doing less work, but doing less useless work.

Very few “professional-grade” writing toolsuites work on Macs. Your writer is not any happier about this than you are, but it’s the way it works. That said, if anyone wants to torture me by getting me a Macbook Air that won’t run my software, feel free.

I hope these tips for reducing your documentation debt help you make some good arguments in your organization. I’ll keep thinking on the topic and maybe someday it will be a talk.