Ansible is known for its simplicity. To put it short: Ansible playbooks are just tasks executed on hosts and sprinkled with variables. Some of these variables change the way Ansible behaves as a whole and the rest are left for the user. They are usually organized into variables assigned to hosts and variables assigned to groups. Host variables are boring, since we can't practically reuse them. Group variables are where the fun begins. Most projects have a set of global configuration settings like the address of an LDAP server, user accounts etc. However, once groups are introduced a problem emerges: what if we want to combine variables from different groups, or from a group and a host?
Let's say that you have an Ansible role for provisioning user accounts. You're smart, so you don't import it separately for each account and instead use loops. You have a common set of project accounts, but suddenly a third-party contractor joins the project and requires access to only one host. Of course you can hardcode the additional account creation in the playbook and call it a day, but it will be a hide & seek game for future maintainers. Variables are supposed to express the project's configuration. Ad-hoc additions to roles blur the separation between reusable modules and configuration.
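As a rough illustration of the loop-based role described above, its main task could look like the sketch below. The system_accounts variable name is borrowed from later in this article; the file path and the shape of each list item (a dictionary with a name key) are assumptions made for this example.

```yaml
# roles/system_accounts/tasks/main.yml (hypothetical path)
# Assumes system_accounts is a list of dictionaries, each with a "name" key.
- name: Ensure project accounts exist
  ansible.builtin.user:
    name: "{{ item.name }}"
    state: present
  loop: "{{ system_accounts }}"
```

With a task like this, who gets an account is decided purely by the value of system_accounts, which is why the way that value is assembled matters so much.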
I've used accounts in the above example, but I hope you get the idea: we have a role for managing a certain service, a common set of resources for this service and unique additions assigned more granularly to hosts. You may say that there's nothing wrong with reapplying a single role for a host, but what if it's the authorized_keys file for SSH we're managing and we don't want any stray keys? It means that the role has exclusive control over some service and reapplying it with different variables would erase the prior configuration. That's when the first questions about merging and combining variables are asked.
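To make the stray-keys case concrete, here is a minimal sketch using the authorized_key module from the ansible.posix collection with exclusive enabled. The deploy user and the ssh_public_keys list are hypothetical names. With exclusive set, the module removes every key that is not listed, so running the task a second time with a different list wipes whatever the first run configured.

```yaml
# Assumes ssh_public_keys is a list of public key strings (hypothetical variable).
- name: Manage authorized_keys for the deploy user
  ansible.posix.authorized_key:
    user: deploy                               # hypothetical account name
    key: "{{ ssh_public_keys | join('\n') }}"  # all keys passed in a single call
    exclusive: true                            # keys not listed here are removed
    state: present
```

This is why the full list of keys has to be known before the role runs, instead of being built up by several role invocations.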
This problem is not new, so let's review some existing solutions.
Since variables have Python-compatible data types, nothing stops us from using Jinja2 filters for transformation. In Ansible we have set theory filters (such as union) for lists and the combine filter for dictionaries.
Using set theory filters and combine allows us to render configuration based on more than one variable. This way, instead of a single system_accounts variable we can have a system_accounts_host and system_accounts_global pair. They can be merged when using the role like so:
- name: Setup system accounts
  import_role:
    name: system_accounts
  vars:
    system_accounts: "{{ system_accounts_global | union(system_accounts_host) }}"
This way we have to modify the playbook in order to respect host- or group-specific variables. It'll be especially cumbersome when we have a multilevel configuration with child groups like subregions or availability zones in a datacenter.
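The same pattern applies to dictionaries through combine. The sketch below is only an illustration: system_groups_global and system_groups_host are hypothetical variables, defined inline so the play can be run as-is against localhost.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    system_groups_global:
      admins:
        gid: 2000
    system_groups_host:
      admins:
        members: [alice]
      backups: {}
  tasks:
    - name: Show the merged dictionary
      ansible.builtin.debug:
        # recursive=True merges nested dictionaries key by key; without it
        # the host-level "admins" entry would replace the global one entirely.
        msg: "{{ system_groups_global | combine(system_groups_host, recursive=True) }}"
```

Note that the names of both source variables are still hardcoded in the expression, which is exactly the maintenance cost described above.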
hash_behaviour is probably one of the lesser known Ansible configuration settings, because it has the potential to break a lot of things. The idea is that by default Ansible variables are replaced in a specified order, but you can change this behaviour to merging instead. It seems like the most obvious solution, but:
- hash_behaviour is applied globally. A lot of code relies on the default behaviour, so expect problems.
- include_vars ignores this setting anyway.
It's handy to know that such an option exists, but it's too limited to be actually useful. Puppet solved this issue better with lookup_options, which allow specifying merge behaviour per variable.
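To see what the setting changes, consider the same dictionary defined at two levels. The file names and the system_groups values below are made up for this sketch, and the two files are shown in one listing, separated by a document marker.

```yaml
# group_vars/all.yml (hypothetical)
system_groups:
  jakski: {}
---
# host_vars/prod1.yml (hypothetical)
system_groups:
  prod-app: {}

# Value of system_groups on prod1:
#   hash_behaviour = replace (default): only prod-app, the host value wins as a whole
#   hash_behaviour = merge: both jakski and prod-app, dictionaries merged recursively
```

Because the switch is global, every other dictionary variable in the project changes semantics in the same way, wanted or not.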
At least 2 plugins for handling variable merging exist:
hash_behaviour approach.
Adding a plugin also has one obvious downside: it's yet another dependency which needs to be downloaded for each installation and may become unmaintained some day. Of course it hugely depends on the project. Some plugins have been consistently maintained for many years despite a small user base.
The approach described below works similarly to ansible-merge-vars, but doesn't require any plugin.
Have you ever heard of the varnames lookup plugin? It's actually mentioned in the documentation as an alternative to hash_behaviour. It allows us to query the names of defined variables using regular expressions. It means that we no longer have to hardcode variable names like in the approach with combine/union, but we still need to make up extra names for variables. We will construct a variable's value based on a few other variables matching an expression. Jinja2 macros for doing exactly this can be written like so:
{# Union of all list variables whose names match the pattern. #}
{% macro merge_list(pattern) -%}
{% set ns = namespace(output=[]) -%}
{% for name in lookup('varnames', pattern).split(',') -%}
{% set ns.output = ns.output | union(lookup('vars', name)) -%}
{% endfor -%}
{{ ns.output }}
{% endmacro -%}
{# Combination of all dictionary variables whose names match the pattern. #}
{% macro merge_dict(pattern, recursive=True) -%}
{% set ns = namespace(output={}) -%}
{% for name in lookup('varnames', pattern).split(',') -%}
{% set ns.output = ns.output | combine(lookup('vars', name), recursive=recursive) -%}
{% endfor -%}
{{ ns.output }}
{% endmacro -%}
Of course we don't want to copy this much code every time we want to create a merged variable. Instead let's save these macros in a templates/macros.j2 file in our project. Some conventions are required to make our playbook easier to maintain. Let's say that for every variable x, partial variables shall be named according to the pattern <id>__x__m, where <id> is a group or host name and the __m suffix stands for merged. We can create a merged variable like so:
all__system_groups__m:
  jakski: {}
system_groups: >
  {% from 'templates/macros.j2' import merge_dict with context -%}
  {{ merge_dict('__system_groups__m$') }}
Now, if we wanted to add a few more groups which are specific to host prod1:
prod1__system_groups__m:
  prod-app: {}
  backups: {}
The above snippet can be placed in host_vars, but nothing stops us from using this approach in group_vars or even creating a merged variable based on partials from multiple groups.
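List variables can be handled the same way through merge_list. In this sketch system_accounts, the webservers group and the account names are hypothetical; only the __m naming convention and the macro come from the text above. The two files are shown in one listing, separated by a document marker.

```yaml
# group_vars/all.yml (hypothetical)
all__system_accounts__m:
  - alice
system_accounts: >
  {% from 'templates/macros.j2' import merge_list with context -%}
  {{ merge_list('__system_accounts__m$') }}
---
# group_vars/webservers.yml (hypothetical group)
webservers__system_accounts__m:
  - contractor
```

Hosts outside the webservers group never see webservers__system_accounts__m, so for them the pattern should match only the partial defined for all hosts.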
It's not exactly quick & easy, but this way we rely only on builtin plugins and Jinja2, so there's no problem with dropping it into an already working project and using it only for selected variables.