Jakski's blog

Combining variables in Ansible

2022-05-25

Ansible is known for it's simplicity. To put it short: Ansible playbooks are just tasks executed on hosts and sprinkled with variables. Some of these variables change the way Ansible behaves as a whole and the rest of them are left for user. They are usually organized into variables assigned to hosts and variables assigned to groups. Host variables are boring, since we can't practically reuse them. Group variables are where the fun begins. Most projects have a set of global configuration settings like address of LDAP server, user accounts etc. However when introducing groups some problem emerges: what, if we want to combine variables from different groups or group and host?

Why would you want to combine Ansible variables?

Let's say that you have Ansible role for provisioning user accounts. You're smart, so you don't import it separately for each account and instead use loops. You have a common set of project accounts, but suddenly third party contractor from outside joins project and requires access to only one host. Of course you can hardcode additional account creation in playbook and call it a day, but it will be hide & seek game for future maintainers. Variables are supposed to express project's configuration. Ad-hoc additions to roles blurs separation between reusable modules and configuration.

I've used accounts in above example, but I hope you get the idea: we have role for managing certain service, common set of resources for this service and unique additions assigned more granularly to hosts. You may say that there's nothing wrong in reapplying single role for host, but what if it's authorized_keys file for SSH we're managing and we don't want any stray keys? It means that role has exclusive control other some service and reapplying it with different variables would erase prior configuration. That's when first questions about merging and combining variables are asked.

Known solutions

This problem is not new, so let's review some existing solutions.

Using Jinja2 filters

Since variables have Python-compatible data types, nothing stops us from using Jinja2 filters for transformation. In Ansible we have:

combine - Allows to merge dictionaries/mappings including deep merge.
union and other set theory filters - Allow to concatenate lists.

Using set theory filters and combine allows to render configuration based on more than one variable. This way instead of system_accounts variable we can have system_accounts_host and system_accounts_global pair. They can be merged upon using role like so:

- name: Setup system accounts
  import_role:
    name: system_accounts
  vars:
    system_accounts: "{{ system_accounts_global | union(system_accounts_host) }}"

This way we have to modify playbook in order to respect host/group specific variables. It'll be especially cumbersome when we have multilevel configuration with child groups like subregions or availability zones in datacenter.

hash_behaviour configuration

hash_behaviour is probably one of lesser known Ansible configuration settings, because it has potential to break a lot of things. The idea is that by default Ansible variables are replaced in specified order, but you can change this behaviour to merging instead. It seems like the most obvious solution, but:

hash_behaviour is applied globally. A lot of code relies on default behaviour, so expect problems.
Some modules like include_vars ignore this setting anyway.
It was deprecated once and documentation strongly advises to avoid using it.

It's handy to know that such option exists, but it's too limited to be actually useful. Puppet solved this issue better with lookup_options which allow to specify merge behaviour per variable.

Using plugins

At least 2 plugins for handling variables merging exist:

Symaxion/ansible-common - Looks nice, but suffers similar issues to hash_behaviour approach.
leapfrogonline/ansible-merge-vars - Requires extra task for merging variables.

Adding plugin has also one obvious downside: it's yet another dependency which needs to be downloaded for each installation and may become unmaintained some day. Of course it hugely depends on project. Some plugins have been consistently maintained for long years despite small user-base.

Yet another approach

It works similarly to ansible-merge-vars, but doesn't require any plugin.

Have you ever heard of varnames lookup plugin? It's actually mentioned in documentation as an alternative to hash_behaviour. It allows us to query defined variables names using regular expressions. It means that we no longer have to hardcode variables names like in approach with combine/union, but we still need to make extra names for variables. We will construct variable's value based on a few other variables matching expression. Jinja2 macros for doing exactly this can be written like so:

{% macro merge_list(pattern) -%}
  {% set ns = namespace(output=[]) -%}
  {% for name in lookup('varnames', pattern).split(',') -%}
    {% set ns.output = ns.output | union(lookup('vars', name)) -%}
  {% endfor -%}
  {{ ns.output }}
{% endmacro -%}

{% macro merge_dict(pattern, recursive=True) -%}
  {% set ns = namespace(output={}) -%}
  {% for name in lookup('varnames', pattern).split(',') -%}
    {% set ns.output = ns.output | combine(lookup('vars', name), recursive=recursive) -%}
  {% endfor -%}
  {{ ns.output }}
{% endmacro -%}

Of course we don't want to copy this much of code every time we want to create merged variable. Instead let's save these macros in templates/macros.j2 file in our project. Some conventions are required to make our playbook easier to maintain. Let's say that for every variable x partial variables shall be named according to pattern <id>__x__m, where <id> is group or host name and __m suffix stands for merged. We can create merged variable like so:

all__system_groups__m:
  jakski: {}
system_groups: >
  {% from 'templates/macros.j2' import merge_dict with context -%}
  {{ merge_dict('__system_groups__m$') }}

Now, if we wanted to add a few more groups which are specific to host prod1:

prod1__system_groups__m:
  prod-app: {}
  backups: {}

Above snippet can be placed in host_vars, but nothing stops us from using this approach in group_vars or even creating merged variable based on partials from multiple groups.

It's not exactly quick & easy, but this way we rely only on builtin plugins and Jinja2, so there's no problem with dropping it into an already working project and use only for specified variables.