Automating As-Built Documentation… Documentation as Code? Documentation as a Service? Documenting Infrastructure.

Documentation can be repetitive, boring, costly, incomplete and error-prone.  When the next engineer in your organisation needs to document a similar subject, they often start from scratch, discarding all prior effort while possibly blissfully unaware it even existed.  Not a wise investment of time and money.

Scope:  there are many systems that need documenting, and each has its own requirements, tools and peculiarities.  This post assumes an infrastructure slant on documentation standards/requirements.

Documentation cobwebs

What about all those once-loved documents that served a useful purpose at one stage of their lives, but over the years have fallen into disrepair and neglect?  The systems they once accurately described have evolved, adapted or disappeared… people forgot the documentation even existed as they made changes to the infrastructure and hurriedly moved on to the next urgent thing.

How about the documentation that nobody ever looks at, or even denies exists?  Documentation that sits in one of many documentation stores, with new and old versions scattered far and wide.

What about the comfortable colleague who doesn’t want to exert a little effort to learn how to do something in their job?  To them, documentation may as well not exist.

Sometimes it’s easy to wonder why we bother with good documentation when far too often it feels like a lost cause!


Why good documentation matters

  • Customers may want to see the documentation for the solution you’ve built, or manage, for them.
  • Our intrepid explorer colleague, who likes to know how things work and takes the initiative and time to learn for themselves, very much appreciates the labours of good documentation.
  • It may actually help diagnose a problem or issue faster.
  • Whilst painstakingly writing the documentation, the author may catch mistakes in the documentation or design.


Is there a better way?  Maybe.

What about:

  • Documentation re-use (gold templates)
    • Let’s not waste so much effort creating the documentation… re-use as much as possible from templates.
  • Documentation source control
    • Collaborate with others to build the awesome gold template and the scripts that auto-populate it
  • Documentation auto population
    • Much time is wasted by engineers transposing table information from an environment to a document.
    • Non-searchable copy-and-pastes of tables (as images) into the document make indexing and searching impossible
    • Post change documentation automatic update
  • A customer view vs an internal view
    • A customer may want more detail as to what each component does, whilst a colleague dealing with similar solutions all day long doesn’t care for the vendor marketing material (marchitecture) or technology descriptions.
  • Automatic/scheduled documentation updates
  • Infrastructure as Code
    • To some degree it can be self-documenting.  Even better if it contains good comments (see the sketch after this list).
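As a small illustration of the self-documenting point, here’s a hedged sketch of an Ansible play where the names and comments carry the intent (the host group, package and service names are made up for the example):

# Keep time in sync on all hypervisor hosts so logs and certificates stay valid.
# (Hypothetical example of comments-as-documentation in infrastructure code.)
- name: Ensure NTP is installed and running on hypervisors
  hosts: hypervisors
  tasks:
    - name: Install the ntp package
      package:
        name: ntp
        state: present
    - name: Start ntp now and enable it at boot
      service:
        name: ntp
        state: started
        enabled: yes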

Tool chain

So many options exist when it comes to documentation.  I looked at a few with a view to knocking something up quickly:

  • Confluence
    • Seemed like a good option, with the possibility of styling a page’s PDF output.
    • Didn’t invest the time in working out how to integrate external data sources into a page, but it seemed doable via the Dynamic Content Macro.
  • TeX/LaTeX
    • Whilst powerful and very popular for technical documentation across many technical professions, the learning curve seemed high.
  • Markdown
  • reStructuredText & Sphinx
    • This was my preferred option for a while, however I ran into issues generating PDF documents and didn’t want to spend the time troubleshooting.
    • I really liked the Sphinx HTML output generated from Python docstring comments.
  • AsciiDoc (language) & AsciiDoctor (processor)
    • Settled on AsciiDoctor as it did everything I wanted and seemed to just work, especially with asciidoctor-pdf (see the build example after this list).
  • Pandoc
    • The Swiss Army knife for converting between various document formats.
  • DocBook
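For reference, producing a PDF with AsciiDoctor is a one-liner.  Assuming the main.adoc and pdf-theme.yml from the structure in the next section:

gem install asciidoctor-pdf
asciidoctor-pdf -a pdf-theme=pdf-theme.yml main.adoc   # writes main.pdf next to the source; older releases used the pdf-style attribute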

Source controlled gold template structure

The directory structure is important to consider, as it enables collaboration, houses chapter-based text, refers to images and holds information-gathering scripts.  Below is a possible directory structure that nicely splits out static text, images and dynamic content.

├── chapters
│   └── includes
│       └── chapter_01
│           ├── host_info.csv
│           └── mem_info.csv
├── img
│   ├── includes
│   │   └── chapter_01
│   │       └── sample-diagram.png
│   ├── document-footer-image.png
│   └── document-title-image.png
├── main.adoc
├── pdf-theme.yml
├── README.md
└── scripts (maintain scripts in a separate repository and import as required)
    └── query_hosts.ps1 (example)
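To show how the pieces tie together, here’s a minimal main.adoc sketch (the chapter file name is an assumption, as the tree above only shows the includes).  The CSV include is what turns script output into a searchable table instead of a pasted image:

= Sample As-Built Document
:imagesdir: img

include::chapters/chapter_01.adoc[]

== Host information (auto-populated)
[format=csv, options="header"]
|===
include::chapters/includes/chapter_01/host_info.csv[]
|===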

Further consideration would be required for sharing content between different gold template repositories.  In fact, the above structure worked well for generating PDF documents from the source, however, when editing the AsciiDoc code in Atom or Visual Studio Code with various markup tools enabled, the HTML preview was broken due to image paths.


Great, gold templates in git.  What now?

Once you have a new project that needs documenting,  clone your gold template and create a new repository to host the documentation you’re about to create.

Modify the new repository to meet your needs, execute the helper scripts to gather the required environment data, commit and push your changes to the new repository, then build/compile your documentation.  In practice the loop looks something like the sketch below.
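Repository URLs, paths and script names here are placeholders, and the PowerShell invocation assumes PowerShell Core (pwsh) on the build host:

git clone https://example.org/git/gold-template.git customer-x-asbuilt      # hypothetical URLs/paths
cd customer-x-asbuilt
git remote set-url origin https://example.org/git/customer-x-asbuilt.git    # point at the new repository
pwsh ./scripts/query_hosts.ps1 > chapters/includes/chapter_01/host_info.csv # gather environment data
git add -A && git commit -m 'Populate as-built data'
git push origin master
asciidoctor-pdf main.adoc                                                   # build/compile the documentation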


Seems like a lot of effort.  Why not use a word processor as we normally do?

How about day two?  Your documentation and environment align at the time of creation (hopefully), but what happens weeks, months or years into their existence?  Hopefully the documentation is not gathering dust!

If it is, set up an automated scheduled task to gather the current state information for you and compile the up-to-date documentation; for example, the cron entry sketched below.
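A hedged example of such a schedule (the user, paths and wrapper script are hypothetical):

# /etc/cron.d/asbuilt-refresh: nightly rebuild at 02:30
# refresh.sh would run the gather scripts, commit the data, then run asciidoctor-pdf
30 2 * * * docs /opt/asbuilt/refresh.sh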

Perhaps your monitoring system provides you with estimated capacity-exhaustion information… maybe that’s useful in your documentation if you share it with a customer on a regular basis.  (Perhaps it belongs better in some sort of reporting document.)

What if you had an “infrastructure as code” setup and stored the code in a code repository?  How nice would it be for the documentation to update when configuration is updated in your code repository, perhaps via a webhook or CI/CD pipeline, as sketched below.
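The post doesn’t prescribe a CI system, but as a hedged sketch, a GitLab CI job along these lines would rebuild the PDF on every push to the repository:

# .gitlab-ci.yml (hypothetical pipeline rebuilding documentation on each push)
build_docs:
  image: asciidoctor/docker-asciidoctor   # community image that bundles asciidoctor-pdf
  script:
    - asciidoctor-pdf main.adoc
  artifacts:
    paths:
      - main.pdf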

Documentation as a Service

What if you don’t want all your engineers to install the documentation tool chain required to produce the final version of the documentation?

Consider a small web app which accepts the following inputs:

  • Code repo URL
  • Code repo username & password
  • Email address, or, target location (document store)

Process flow for the web app:  Submit params -> Clone Repo -> Build/Process Documentation -> Preview, email or upload the final document.  Display errors, if any.  A sketch follows.
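A hedged sketch of such a service, assuming Flask and the asciidoctor-pdf tool chain on the server (the endpoint, form fields and file names are all hypothetical):

# app.py: minimal documentation-as-a-service sketch (no auth or error hardening)
import os
import subprocess
import tempfile
from flask import Flask, request, send_file, abort

app = Flask(__name__)

@app.route('/build', methods=['POST'])
def build():
    repo_url = request.form['repo_url']    # code repo URL
    username = request.form['username']    # code repo credentials
    password = request.form['password']
    workdir = tempfile.mkdtemp(prefix='docbuild-')
    # Clone the documentation repository (credentials embedded for brevity only)
    auth_url = repo_url.replace('https://', 'https://%s:%s@' % (username, password))
    if subprocess.call(['git', 'clone', auth_url, workdir]) != 0:
        abort(400, 'Clone failed')
    # Build/process the documentation with the server-side tool chain
    if subprocess.call(['asciidoctor-pdf', 'main.adoc'], cwd=workdir) != 0:
        abort(500, 'Build failed')
    # Preview the compiled PDF; emailing or uploading would slot in here instead
    return send_file(os.path.join(workdir, 'main.pdf'))

if __name__ == '__main__':
    app.run()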


Observation

I thought there would have been more discussion around documentation as code, or automatically generating documentation.  Perhaps documenting infrastructure automatically is not as glamorous as other topics?

Maybe it’s somewhat redundant if you have self-documenting infrastructure as code?  But that probably falls short, as it only covers the “what” of a solution design and not the “why”.


Ansible auditing callback plugin – Infrastructure testing “as code”

I came across a project at work recently which was developing some code to audit the state of an environment.  At the same time, I was starting to play around with Ansible at home to define my personal systems’ desired state as code.  Being new to Ansible, I wondered what it would take to audit my home systems using it.  I thought I’d have a go at writing the same thing out of hours and comparing how the two solutions differed once complete.

From a beginner’s perspective, it seemed like it would be pretty easy.  The basic playbooks I had previously built returned OK, CHANGED and FAILED statuses.  I figured an auditing script only needed to return OKs and FAILs… at this point I hadn’t thought of the issue I was about to run into.

By default, Ansible stops processing a playbook when a task returns an error.  This allows you to fix the problem and restart where you left off.  As it’s perfectly fine for an audit script to have failed tests, the default behaviour was no good for my purpose here.


Workaround

There was a way around this.  Setting ‘ignore_errors: yes’ on a task allows an Ansible playbook to continue processing when it hits errors.  The only problem for me was that any failed tasks were then reported as OK in the end-of-playbook statistics.

The other thing throwing out the end-of-playbook statistics was that when using the shell or command modules in a task, Ansible counts the task as changed (system state was changed) and reports this in the statistics.  Using ‘changed_when: false’ helped on command/shell tasks, however, the ignore_errors issue still existed.  An audit-style task combining the two looked something like the example below.
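Here is an illustrative task putting the two settings together (the name and command are made up):

- name: Check NTP is configured (illustrative audit test)
  command: grep -q '^server' /etc/ntp.conf
  changed_when: false    # a read-only check never changes system state
  ignore_errors: yes     # keep auditing even when this test fails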

I really like Ansible… how could I work around the issue I was facing?


Develop a plugin

As Ansible was returning OK for tasks that FAILED when ignore_errors was set, the statistics were skewed at the end of the playbook.  I decided to persevere with Ansible and wondered how to work around this.  I delved into the Ansible architecture overview and developer guide.  It seemed like I’d need to store my own statistics to overcome the hurdle of the skewed ignored-failure counts.

I came across Ansible Modules and Plugins.

Modules essentially tell Ansible how to connect to your node/device and configure its desired state.

Plugins allow you to alter how Ansible works.

From the Developer Guide:

The following types of plugins are available:

  • Action plugins are front ends to modules and can execute actions on the controller before calling the modules themselves.
  • Cache plugins are used to keep a cache of ‘facts’ to avoid costly fact-gathering operations.
  • Callback plugins enable you to hook into Ansible events for display or logging purposes.
  • Connection plugins define how to communicate with inventory hosts.
  • Filters plugins allow you to manipulate data inside Ansible plays and/or templates. This is a Jinja2 feature; Ansible ships extra filter plugins.
  • Lookup plugins are used to pull data from an external source. These are implemented using a custom Jinja2 function.
  • Strategy plugins control the flow of a play and execution logic.
  • Shell plugins deal with low-level commands and formatting for the different shells Ansible can encounter on remote hosts.
  • Test plugins allow you to validate data inside Ansible plays and/or templates. This is a Jinja2 feature; Ansible ships extra test plugins.
  • Vars plugins inject additional variable data into Ansible runs that did not come from an inventory, playbook, or the command line.

It looked like developing a callback plugin was my best bet to capture OK/FAILED/SKIPPED/UNREACHABLE tasks and store statistics about them.  Callback plugins let me hook into the Ansible events I wanted to track.  For example:


def runner_on_failed(self, host, data, ignore_errors=False):
    """Routine for handling runner (task) failures"""
    ...

def runner_on_ok(self, host, data):
    """Routine for handling runner (task) successes"""
    ...

def runner_on_skipped(self, host, item=None):
    """Routine for handling skipped tasks"""
    ...

def runner_on_unreachable(self, host, data):
    """Routine for handling unreachable hosts"""
    ...

These would allow me to keep track of the number of times any of these events fired and which host they were for.
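A hedged sketch of the counting logic inside such a callback plugin (simplified; the real implementation lives in audit.py in the repository linked below):

# Simplified sketch of per-host event counting in a callback plugin
from collections import defaultdict

class CallbackModule(object):
    """Aggregate OK/FAILED/SKIPPED/UNREACHABLE counts per host."""

    def __init__(self):
        self.stats = defaultdict(lambda: {'ok': 0, 'failed': 0,
                                          'skipped': 0, 'unreachable': 0})

    def runner_on_ok(self, host, data):
        self.stats[host]['ok'] += 1

    def runner_on_failed(self, host, data, ignore_errors=False):
        # Count the failure ourselves, even when ignore_errors hides it
        # from Ansible's own end-of-playbook statistics.
        self.stats[host]['failed'] += 1

    def runner_on_skipped(self, host, item=None):
        self.stats[host]['skipped'] += 1

    def runner_on_unreachable(self, host, data):
        self.stats[host]['unreachable'] += 1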


Preparing to run the Ansible audit

To use the audit callback plugin, you’ll need:

  • Ansible 2.3.0.0 & 2.3.1.0 (tested)
    • Other versions may work
  • Python 2.7 (tested)

Then, download the callback plugin from: https://github.com/Im0/ansible-audit

Either write a sample playbook, or, download an example from here:  https://github.com/Im0/ansible-audit-playbook-example


Running the Ansible audit script

The easiest way to run an audit would be to clone the audit git repository and copy the audit.py and *.jinja files into the Ansible plugins directory.  In my case the plugins directory resides here: /usr/local/lib/python2.7/dist-packages/ansible/plugins/callback/

The plugins directories may reside in other places as well.  For example, the default configuration file for Ansible (although commented out) refers to /usr/share/ansible/plugins/

git clone https://github.com/Im0/ansible-audit.git
sudo cp ansible-audit/audit* /usr/local/lib/python2.7/dist-packages/ansible/plugins/callback/
git clone https://github.com/Im0/ansible-audit-playbook-example.git
AUDIT_NAME='My Audit' CUSTOMER="Test Customer" ANSIBLE_CALLBACK_WHITELIST=audit ansible-playbook --ask-sudo-pass ansible-audit-playbook-example/site.yml -k


Checking the output

By default the plugin creates and outputs a zip file into: /var/log/ansible/audits/

The zip file contains a JSON file and two HTML files which contain the results.


Fruit salad output


More detail output


Todo

There are a few things I want to tidy up, such as:

  • Colour of UNREACHABLE hosts in the fruit salad output needs to be red.
  • Always output the zip file’s full path on exit
  • Add some error handling around the output directory creation
  • Update instructions to include output directory requirements
  • Tweak the output filename
  • Consider sending output via email
  • Add options to select the method of delivering output (i.e. send via email, write to a zip file, or output raw JSON)
  • Add Windows examples to the example playbook