
How to document data?

Rick Radewagen
April 13, 2022


Without documentation, systems fail and progress is lost.

So documentation is important, but how should it be done?

Let’s take a look at general principles and then dive into specific actions in the context of data and analytics.


Documenting anything

The purpose of documenting is not just to get read, but also to structure your thoughts.

Here it helps to imagine an ideal reader that acts as a mental sparring partner. This ideal reader might be you.

“Documentation is a love letter you write to your future self”

— Damian Conway

In school, we are often taught to write complicated words and long sentences to impress people with our good education. This is something you should unlearn.

The purpose of writing is to connect with people. More complicated words and longer sentences require more concentration from the reader. One person who became famous for his simple writing is Ernest Hemingway.

“Up in that room, I decided that I would write one story about each thing that I knew about.”

— Ernest Hemingway

You see how short and simple the words are. Also pay attention to the message. He was writing about the things he knew. This is good advice when you document something. Don’t overthink it and turn documentation into a research project. Try to write about a system when you know the most about it. For an engineer, that is usually right after programming.

The state of the system is in your mind. Write it down.

In many organizations documentation is not considered “real” work that contributes directly to business outcomes. Nobody gets paid to write internal love letters.

Here it is important that managers prioritize long-term goals (e.g. system stability) and good practices (e.g. collaboration) over short-term goals (just get it done).

One way to prioritize documentation is to make it part of the career ladder. To get a promotion, you don’t just need to write good code, you also need to design well-understood and stable systems. If fast-moving companies like Uber and Stripe find time for documentation, your organization should consider it as well.

“We're saying documentation is important. Not only for your team, but for your career.”

— Stephanie Blotner, Tech Lead Manager Uber

Tip #1: Use simple words and write short sentences. [1]

Tip #2: Write to structure your thoughts

Tip #3: Have an ideal reader in mind to tailor your message

Tip #4: Write about the things you know

Tip #5: Make documentation part of career progression

Documenting data

The goal of documenting data is to help users find it, understand it, and use it with confidence.

So you don’t need to document everything.

Well-modeled data is often self-explanatory.

“Good code documents itself” contains a good amount of truth. You should aim to have expressive and consistent names that are self-descriptive. A database called ‘dbo’ is not helpful, ‘sales_prod’ is better, especially if there is also a ‘sales_dev’ and a ‘finance_prod’ database.
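A naming rule is easier to keep consistent when it can be checked automatically. The sketch below assumes a hypothetical convention of `<domain>_<environment>` with only `dev` and `prod` as environments. The pattern, the allowed environments, and the function name are illustrative assumptions, not a standard.

```python
import re

# Hypothetical convention: <domain>_<environment>, e.g. "sales_prod".
ALLOWED_ENVIRONMENTS = {"dev", "prod"}
NAME_PATTERN = re.compile(r"^(?P<domain>[a-z][a-z0-9]*)_(?P<env>[a-z]+)$")


def check_database_name(name: str) -> list[str]:
    """Return a list of problems with a database name (empty if it is fine)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return [f"'{name}' does not follow <domain>_<environment>"]
    if match.group("env") not in ALLOWED_ENVIRONMENTS:
        return [f"'{name}' uses unknown environment '{match.group('env')}'"]
    return []


# Names taken from the examples in the text above.
for candidate in ["dbo", "sales_prod", "sales_dev", "finance_prod"]:
    problems = check_database_name(candidate)
    print(candidate, "->", problems or "ok")
```

Under this assumed convention, ‘dbo’ is flagged and the other three names pass. Your own convention will likely differ, so treat the pattern as a placeholder.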

So documenting starts with naming things, but it does not stop there.

Use the explicit hierarchy of the database system to build top-down documentation.

That way, users start to understand the big picture and can navigate your data landscape themselves.

Document the top 3 levels (system, database, schema) completely. Focus on documenting the top 10% of the most used tables. Establish a process that ensures all new tables/views/models are created with at least minimal documentation. (Remember: it is easiest to document during creation. [2])
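To find out where to start, you can rank tables by usage and list the most used ones that still lack a description. This is a minimal sketch: the `TableInfo` class, the `query_count` metric, and the sample tables are assumptions for illustration. In practice the numbers would come from your warehouse's query history.

```python
import math
from dataclasses import dataclass


@dataclass
class TableInfo:
    name: str
    query_count: int       # how often the table was queried (assumed metric)
    description: str = ""  # empty string means "not documented yet"


def most_used_undocumented(tables: list[TableInfo], share: float = 0.10) -> list[str]:
    """Names of tables in the top `share` by usage that lack a description."""
    if not tables:
        return []
    # Always look at one table at least, even for very small inventories.
    top_n = max(1, math.ceil(len(tables) * share))
    ranked = sorted(tables, key=lambda t: t.query_count, reverse=True)
    return [t.name for t in ranked[:top_n] if not t.description.strip()]


# Hypothetical inventory with made-up usage numbers.
tables = [
    TableInfo("orders", 900),
    TableInfo("customers", 500, "One row per customer."),
    TableInfo("tmp_export", 3),
]
print(most_used_undocumented(tables))  # → ['orders']
```

The result is a short to-do list rather than a full audit, which fits the idea of documenting the most used tables first.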

Documenting all columns is usually only worth it for data products or widely used reporting tables. But for these, you should be rigorous. If a column is not worth documenting, it should not be part of the table.
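One way to enforce this rigor is to generate the table definition from a list of documented columns and refuse to continue when a description is missing. The sketch below builds a `CREATE TABLE` statement that uses the comment clause mentioned in footnote [2]. The helper names, the table name, and the columns are hypothetical, and the quoting only doubles single quotes, so it is not a complete SQL escaping routine.

```python
def sql_literal(text: str) -> str:
    """Quote a string for SQL by doubling single quotes (simplified)."""
    return "'" + text.replace("'", "''") + "'"


def create_table_sql(table: str, table_comment: str,
                     columns: list[tuple[str, str, str]]) -> str:
    """Build a CREATE TABLE statement in which every column has a comment.

    `columns` holds (name, data_type, comment) tuples.
    """
    if not table_comment.strip():
        raise ValueError("table comment is required")
    # A column without a description is rejected instead of silently created.
    missing = [name for name, _, comment in columns if not comment.strip()]
    if missing:
        raise ValueError(f"undocumented columns: {', '.join(missing)}")
    column_lines = ",\n".join(
        f"  {name} {data_type} COMMENT {sql_literal(comment)}"
        for name, data_type, comment in columns
    )
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"COMMENT = {sql_literal(table_comment)};"
    )


# Hypothetical reporting table.
print(create_table_sql(
    "orders",
    "One row per order.",
    [
        ("order_id", "NUMBER", "Unique identifier of the order."),
        ("amount", "NUMBER(12,2)", "Order total."),
    ],
))
```

Calling the function with an empty column comment raises a `ValueError`, which makes the missing documentation visible at creation time.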

In practice, it can be hard to choose the right words. Should I refer to customer or account or company or user or site? Does everybody understand the acronyms we use in our team?

To tackle challenges like these, your documentation system should ideally offer a glossary, where you can define important terms once and reference them throughout the documentation. [3]
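If your tooling has no glossary, the idea can be approximated with a small script. The sketch below marks glossary terms with a leading `#`, expands known terms with their definition, and lists terms that nobody has defined yet. The glossary entries and function names are made up for illustration, and this is not how any particular documentation tool implements it.

```python
import re

# Hypothetical glossary: each term is defined once, in one place.
GLOSSARY = {
    "customer": "A company that has signed a contract with us.",
    "account": "A login that belongs to exactly one customer.",
}

TERM_PATTERN = re.compile(r"#(\w+)")


def expand_terms(text: str, glossary: dict[str, str]) -> str:
    """Replace each known #term with 'term (definition)'."""
    def replace(match: re.Match) -> str:
        term = match.group(1)
        definition = glossary.get(term.lower())
        if definition is None:
            return match.group(0)  # leave unknown terms untouched
        return f"{term} ({definition})"
    return TERM_PATTERN.sub(replace, text)


def undefined_terms(text: str, glossary: dict[str, str]) -> list[str]:
    """List #terms used in the text that the glossary does not define."""
    used = {term.lower() for term in TERM_PATTERN.findall(text)}
    return sorted(used - set(glossary))


print(expand_terms("One row per #customer.", GLOSSARY))
print(undefined_terms("Links each #account to a #site.", GLOSSARY))
```

The second function is useful in a review step: an undefined term is either a missing glossary entry or a word the team does not actually share.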

Tip #6: Use expressive and consistent names

Tip #7: Document top-down and most used

Tip #8: One sentence is usually enough

Tip #9: Make documentation part of the development process

Tip #10: Use #definitions in a business glossary

Outcome

Documentation helps you think clearly and share those clear thoughts with the rest of your team. If you follow these 10 tips, the results of your work will be amplified and it will be fun to work with data at your organization. 😊

Footnotes

[1] Check your writing with https://hemingwayapp.com/

[2] With Snowflake, you can use the COMMENT clause when creating a table. https://docs.snowflake.com/en/sql-reference/sql/create-table.html

[3] At Snowboard, we use the #hashtag notation, which shows the definition of a term when you hover over it.