
I came across a project at work recently that was developing code to audit the state of an environment. At the same time, I was starting to play around with Ansible at home to define my personal systems’ desired state as code. Being new to Ansible, I wondered what it would take to audit my home systems with it. I thought I’d have a go at writing the same thing out of hours and compare how the two solutions differed once complete.

From a beginner’s perspective, it seemed like it would be pretty easy. The basic playbooks I had previously built returned OK, CHANGED and FAILED results. I figured an auditing script only needed to return OKs and FAILs… at this point I hadn’t thought of the issue I was about to run into.

By default, Ansible stops running a play against a host as soon as one of its tasks returns an error. This lets you fix the problem and restart where you left off. As it’s perfectly fine for an audit script to have failed tests, this default behaviour was no good for my purpose here.

Workaround

There was a way around this. Setting ‘ignore_errors: yes’ on a task allows an Ansible playbook to continue processing when that task fails. The only problem for me was that any failed tasks were then reported as OK in the end-of-playbook statistics.

The other thing throwing out the end-of-playbook statistics was that Ansible counts every shell or command module task as CHANGED (it assumes the system state was changed) and reports this in the statistics. Using ‘changed_when: false’ helped on command/shell tasks; however, the ignore_errors issue still existed.

I really like Ansible… how could I work around the issue I was facing?

Develop a plugin

As Ansible was returning OK for tasks that FAILED when ignore_errors was set, the statistics were skewed at the end of the playbook. I decided to persevere with Ansible and wondered how to work around this. I delved into the Ansible architecture overview and developer guide. It seemed like I’d need to store my own statistics to overcome the hurdle of failures being hidden by ignore_errors.

I came across Ansible Modules and Plugins.

Modules essentially tell Ansible how to connect to your node/device and configure its desired state.

Plugins allow you to alter how Ansible works.

From the Developer Guide:

The following types of plugins are available:

Action plugins are front ends to modules and can execute actions on the controller before calling the modules themselves.
Cache plugins are used to keep a cache of ‘facts’ to avoid costly fact-gathering operations.
Callback plugins enable you to hook into Ansible events for display or logging purposes.
Connection plugins define how to communicate with inventory hosts.
Filter plugins allow you to manipulate data inside Ansible plays and/or templates. This is a Jinja2 feature; Ansible ships extra filter plugins.
Lookup plugins are used to pull data from an external source. These are implemented using a custom Jinja2 function.
Strategy plugins control the flow of a play and execution logic.
Shell plugins deal with low-level commands and formatting for the different shells Ansible can encounter on remote hosts.
Test plugins allow you to validate data inside Ansible plays and/or templates. This is a Jinja2 feature; Ansible ships extra test plugins.
Vars plugins inject additional variable data into Ansible runs that did not come from an inventory, playbook, or the command line.

The Callback plugins allowed me to hook into Ansible events for the events I wanted to track. It looked like developing a Callback plugin was my best bet to capture OK/FAILED/SKIPPED/UNREACHABLE tasks and store statistics about them. For example:

def runner_on_failed(self, host, data, ignore_errors=False):
    """Routine for handling runner (task) failures"""
    ...

def runner_on_ok(self, host, data):
    """Routine for handling runner (task) successes"""
    ...

def runner_on_skipped(self, host, item=None):
    """Routine for handling skipped tasks"""
    ...

def runner_on_unreachable(self, host, data):
    """Routine for handling unreachable hosts"""
    ...

These would allow me to keep track of the number of times any of these events fired and which host they were for.
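As a rough illustration of that idea, here is a minimal, standalone sketch of per-host counters driven by those four hooks. It is not the code from the ansible-audit repository: the AuditStats class name and the summary() helper are made up for this example, and in a real plugin the methods would sit on a subclass of Ansible’s CallbackBase rather than a plain class. Note that a failure is counted as failed even when ignore_errors is set, which is the whole point for an audit.

```python
from collections import defaultdict


class AuditStats(object):
    """Per-host tallies, keyed by outcome, for the four runner events.

    Hypothetical stand-in for a callback plugin class; a real plugin
    would subclass ansible.plugins.callback.CallbackBase.
    """

    def __init__(self):
        # stats[host][outcome] -> number of times that event fired
        self.stats = defaultdict(lambda: defaultdict(int))

    def _record(self, host, outcome):
        self.stats[host][outcome] += 1

    def runner_on_failed(self, host, data, ignore_errors=False):
        # Count the failure even when the task set ignore_errors, so it
        # cannot be hidden in the audit totals.
        self._record(host, "failed")

    def runner_on_ok(self, host, data):
        self._record(host, "ok")

    def runner_on_skipped(self, host, item=None):
        self._record(host, "skipped")

    def runner_on_unreachable(self, host, data):
        self._record(host, "unreachable")

    def summary(self):
        """Return plain dicts, e.g. for dumping to JSON."""
        return {host: dict(counts) for host, counts in self.stats.items()}


# Simulate the events Ansible would fire during a run.
audit = AuditStats()
audit.runner_on_ok("web01", {})
audit.runner_on_failed("web01", {}, ignore_errors=True)
audit.runner_on_unreachable("db01", {})
print(audit.summary())
```

The host names above are invented; during a real run Ansible supplies the host and result data to each hook.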

Preparing to run the Ansible audit

To use the audit callback plugin, you’ll need:

Ansible 2.3.0.0 & 2.3.1.0 (tested)
- Other versions may work
Python 2.7 (tested)

Then, download the callback plugin from: https://github.com/Im0/ansible-audit

Either write a sample playbook or download an example from here: https://github.com/Im0/ansible-audit-playbook-example

Running the Ansible audit script

The easiest way to run an audit is to clone the audit git repository and copy audit.py and the *.jinja files into the Ansible callback plugins directory. In my case the plugins directory resides here: /usr/local/lib/python2.7/dist-packages/ansible/plugins/callback/

The plugins directories may reside in other places as well. For example, the default configuration file for Ansible (although commented out) refers to /usr/share/ansible/plugins/
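If you are unsure where the callback plugins directory lives on your system, one way to find it is to ask Python where the ansible.plugins.callback package is installed. This is only a sketch: it assumes Python 3 (for importlib.util) and that you run it with the same interpreter Ansible is installed under, and find_callback_plugin_dir is a name invented for this example.

```python
import importlib.util
import os


def find_callback_plugin_dir():
    """Return the installed ansible/plugins/callback directory, or None."""
    try:
        spec = importlib.util.find_spec("ansible.plugins.callback")
    except ImportError:
        # The parent "ansible" package is not importable at all.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


plugin_dir = find_callback_plugin_dir()
print(plugin_dir or "Ansible does not appear to be installed for this Python")
```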

git clone https://github.com/Im0/ansible-audit.git
sudo cp ansible-audit/audit* /usr/local/lib/python2.7/dist-packages/ansible/plugins/callback/
git clone https://github.com/Im0/ansible-audit-playbook-example.git
AUDIT_NAME='My Audit' CUSTOMER="Test Customer" ANSIBLE_CALLBACK_WHITELIST=audit ansible-playbook --ask-sudo-pass ansible-audit-playbook-example/site.yml -k

Checking the output

By default the plugin creates and outputs a zip file into: /var/log/ansible/audits/

The zip file contains a JSON file and two HTML files which contain the results.
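To poke at the results without unzipping by hand, something along these lines should work. It is a sketch under assumptions: read_audit_zip is a made-up helper, and the file names and JSON layout in the demo archive are placeholders, since the plugin’s actual naming and schema are not described here. The demo builds a small stand-in archive so the snippet runs on its own; point the function at a real zip under /var/log/ansible/audits/ to inspect an actual audit.

```python
import json
import os
import tempfile
import zipfile


def read_audit_zip(zip_path):
    """Return (member names, parsed JSON documents keyed by member name)."""
    parsed = {}
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        for name in names:
            if name.lower().endswith(".json"):
                parsed[name] = json.loads(archive.read(name).decode("utf-8"))
    return names, parsed


# Build a stand-in archive; the names and contents are placeholders only.
demo_zip = os.path.join(tempfile.mkdtemp(), "demo_audit.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("results.json", json.dumps({"web01": {"ok": 3, "failed": 1}}))
    archive.writestr("summary.html", "<html></html>")
    archive.writestr("detail.html", "<html></html>")

names, parsed = read_audit_zip(demo_zip)
print(names)
print(parsed)
```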

Fruit salad output

More detail output

Todo

There are a few things I want to tidy up, such as:

Colour on UNREACHABLE hosts on the fruit salad output needs to be red.
Always output the zip file full path on exit
Add some error handling around the output directory creation
Update instructions to include output directory requirements
Tweak output filename
Consider sending output via email
Add options to select the method of delivering output (i.e. send via email, to zip file or just JSON output)
Windows examples in example playbook

It appears that the Plesk backup utility (pleskbackup on Linux) writes to a temporary file before moving it to the final destination you’ve specified on the command line.

For example, when running:

/usr/local/psa/bin/pleskbackup all /mnt/nfsshare/pleskbackup.bak

Plesk writes the whole backup file to the local disk in /var/lib/psa/dumps/tmp/:

# ls -lah /var/lib/psa/dumps/tmp/
total 1.9G
drwx------ 2 psaadm psaadm 4.0K 2009-02-09 18:38 .
drwxr-xr-x 3 psaadm psaadm 4.0K 2009-02-09 10:17 ..
-rw-r--r-- 1 root root 1.9G 2009-02-09 18:44 fileTFzx5u

This is a problem when you’re low on disk space on the machine and you’re mounting an NFS drive to back up to… the local disk obviously fills quickly and the backup fails.
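One way to catch this before the backup starts is to check how much free space the temporary directory’s filesystem has. The snippet below is a small sketch of that check, assuming Python 3.3 or later (for shutil.disk_usage); has_room_for_backup is a hypothetical helper, and the expected backup size is an estimate you would have to supply yourself.

```python
import shutil


def has_room_for_backup(tmp_dir, expected_bytes):
    """True if tmp_dir's filesystem has at least expected_bytes free."""
    return shutil.disk_usage(tmp_dir).free >= expected_bytes


# Demo against the current directory; on a Plesk server you would pass
# /var/lib/psa/dumps/tmp/ and your own size estimate instead.
two_gib = 2 * 1024 ** 3
print(has_room_for_backup(".", two_gib))
```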

Note:

Restoration of the backup files created by the script below has not yet been tested
The disk may still fill up if a site is very large… you may have to symlink the psa tmp directory onto the NFS mount
Tested with psabackup from Plesk version 8.2.1
It turned out that the site used over 60% of available disk space, so it filled the disk every time. Plus, Plesk backup/restore on Linux has known issues with site backup files larger than 2GB. I will have to write a manual backup script to overcome this.

The script looked like the one below, but it will need to be updated to back up sites too large for pleskbackup to handle. This may or may not work for you depending on the Plesk version you are using… use at your own risk.

#!/bin/bash

# Back up each Plesk domain to its own file and log the result to syslog.
dopleskbu () {
    # The MySQL admin password is read from this file.
    MYSQLPASS=$(cat /etc/psa/.psa.shadow)
    for DOMAIN in $(mysql -Ns -uadmin -p"$MYSQLPASS" -Dpsa -e "select name from domains"); do
        if /usr/local/psa/bin/pleskbackup domains "$DOMAIN" --exclude=DOMAIN_IF_YOU_WANT "$FPATH.$DOMAIN.bak"; then
            logger "$0 $DOMAIN backup complete"
        else
            logger "$0 $DOMAIN backup error"
        fi
    done
}

FILENAME=pleskbackup.weekly
DIR=/mnt/backup
FPATH=$DIR/$FILENAME

# Main calls

dopleskbu
### END OF SCRIPT ###

Category: Coding

Ansible auditing callback plugin – Infrastructure testing “as code”