Ansible and Ad-hoc Groups: Part 1

Part one of a look into using ad-hoc groups in Ansible.

Groups within Ansible are no mystery to frequent users and are a linchpin of almost all playbooks. Groups are fairly flexible: they allow sorting hosts into meaningful categories, specifying group-specific variables, and even nesting groups. Groups are typically managed by the inventory, whether static or dynamic, and are used to select which hosts will be targeted by a given play. Let’s take a look at a basic inventory with a group:

zk01 ansible_host=10.0.0.30
zk02 ansible_host=10.0.0.31

[zookeeper]
zk01
zk02

A simple playbook, making use of our grouping above, could resemble:

- name: manage the zookeeper servers
  hosts: zookeeper
  roles:
    - zookeeper

Only the hosts in the zookeeper group will get the zookeeper role. Pretty simple stuff.
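The group-specific variables and nested groups mentioned earlier could be sketched as follows, here in Ansible's YAML inventory format rather than INI. This is only an illustration: the variable zookeeper_client_port and the parent group coordination are hypothetical names, not part of the example used in this post.

```yaml
# inventory.yml: hypothetical YAML rendering of the basic inventory
all:
  children:
    zookeeper:
      hosts:
        zk01:
          ansible_host: 10.0.0.30
        zk02:
          ansible_host: 10.0.0.31
      vars:
        # group-specific variable (hypothetical name), visible to zk01 and zk02
        zookeeper_client_port: 2181
    coordination:
      # nested group (hypothetical name): includes every host in zookeeper
      children:
        zookeeper:
```

A play with hosts: coordination would then target the same two hosts as hosts: zookeeper.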

Let’s take this example a bit further. Zookeeper has two mandatory config options that need to be set: ZOOKEEPER_MYID (node’s ID in the cluster) and ZOOKEEPER_SERVERS (a comma-separated list of server IPs to connect to). At first glance, it may be tempting to simply define these in the inventory:

zk01 ansible_host=10.0.0.30
zk02 ansible_host=10.0.0.31

[zookeeper]
zk01 ZOOKEEPER_MYID=1 ZOOKEEPER_SERVERS=10.0.0.30,10.0.0.31
zk02 ZOOKEEPER_MYID=2 ZOOKEEPER_SERVERS=10.0.0.30,10.0.0.31

This may satisfy our use case, but it doesn’t scale and gets us into a data management problem: every new node means editing the ZOOKEEPER_SERVERS value on every host line. Instead, let’s try to derive this information at run time, starting with the ZOOKEEPER_SERVERS variable:

- name: manage the zookeeper servers
  hosts: zookeeper
  pre_tasks:
    - name: create zookeeper_ips group
      add_host: >
        hostname={{hostvars[item]['ansible_default_ipv4']['address']}}
        groups=zookeeper_ips
      with_items: "{{ play_hosts | sort }}"  # quoted Jinja2 expression; bare variables are rejected by Ansible 2.x
      run_once: true

    - name: set ZOOKEEPER_SERVERS
      set_fact: ZOOKEEPER_SERVERS={{groups['zookeeper_ips']|join(',')}}
  roles:
    - zookeeper

What we’ve done here is create an ad-hoc group called zookeeper_ips and populate it with the IPs of all the hosts in the zookeeper group. A few things to note: we use play_hosts instead of groups['zookeeper'] to avoid adding failed or unreachable hosts, and we use the add_host module to build the group, reading each host’s IP from its gathered facts through hostvars. We also use the sort filter to help keep the play idempotent. Without the sort, the resulting string could change between runs for no reason other than a change in host order. That would be especially bad when a config change triggers a process restart.

We can’t use something like a with_items loop in the set_fact task, because we need a complete list before we can apply the join filter. This may also seem like excess work when we could just use {{ groups['zookeeper'] | join(',') }}. However, groups['zookeeper'] returns only the inventory hostnames of the servers. In our case, we have a requirement to use the IPs and not the hostnames, so that option is out.
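To check what the ad-hoc group and the derived variable actually contain, a debug task could be added to pre_tasks after the set_fact task. This is a minimal sketch; the task name is illustrative, and the values printed depend on the facts gathered from your hosts.

```yaml
# Optional debugging aid: place after the "set ZOOKEEPER_SERVERS" task
- name: show the ad-hoc group and the derived server list
  debug:
    msg:
      - "zookeeper_ips: {{ groups['zookeeper_ips'] }}"
      - "ZOOKEEPER_SERVERS: {{ ZOOKEEPER_SERVERS }}"
  run_once: true   # the values are identical on every host, so print once
```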

Let’s now move on to setting the ZOOKEEPER_MYID variable:

- name: manage the zookeeper servers
  hosts: zookeeper
  pre_tasks:
    - name: create zookeeper_ips group
      add_host: >
        hostname={{hostvars[item]['ansible_default_ipv4']['address']}}
        groups=zookeeper_ips
      with_items: "{{ play_hosts | sort }}"
      run_once: true

    - name: set ZOOKEEPER_SERVERS
      set_fact: ZOOKEEPER_SERVERS={{groups['zookeeper_ips']|join(',')}}

    - name: set ZOOKEEPER_MYID
      set_fact: ZOOKEEPER_MYID={{ item.0 + 1 }}
      when: item.1 == ansible_default_ipv4.address
      with_indexed_items: "{{ groups['zookeeper_ips'] }}"
  roles:
    - zookeeper

Using the same set_fact module, we can dynamically determine a unique ID for each zookeeper instance. The task loops through the zookeeper_ips group with an index: the index is stored as item.0, and the value (in this case the IP of the host) as item.1. A when conditional checks that the IP in the current iteration matches the target host’s own IP. If the IPs match, we set ZOOKEEPER_MYID to the index + 1. The conditional ensures that each host gets the ID for its own position in the list; without it, the last iteration would win and every host would end up with an ID equal to the length of the zookeeper_ips list.
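On Ansible 2.5 and later, the same indexed lookup could also be written with loop and loop_control instead of with_indexed_items. The following is a sketch of that variant, not the form used in this post; the variable name idx is an arbitrary choice.

```yaml
- name: set ZOOKEEPER_MYID (loop/index_var variant)
  set_fact:
    ZOOKEEPER_MYID: "{{ idx + 1 }}"
  loop: "{{ groups['zookeeper_ips'] }}"
  loop_control:
    index_var: idx   # zero-based position of the current item in the list
  # only the iteration holding this host's own IP sets the fact
  when: item == ansible_default_ipv4.address
```

Here item plays the role of item.1 and idx the role of item.0 in the with_indexed_items version.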

As you can see, ad-hoc groups in Ansible can help make your plays more dynamic, reducing the load placed on variable/data management. At this point, we could add another zookeeper host and the cluster would be automatically configured to support it. One caveat to this approach is that you need to ensure your roles/tasks support a dynamically adjusting cluster. Say we added another host, zk00, with IP 10.0.0.29. Because the hosts are sorted by name, the next run would give zk00 the ID of 1 and bump each of the others. This could cause problems if the cluster requires a lock on a specific ID, in which case it may be useful to either adjust the logic or set the ID in hostvars.
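If stable IDs matter more than automatic assignment, pinning the ID in hostvars could look like the sketch below. It uses the YAML inventory format and assumes zk00 is given the next free ID so that the existing nodes keep theirs.

```yaml
# inventory.yml: hypothetical inventory with pinned IDs
all:
  children:
    zookeeper:
      hosts:
        zk00:
          ansible_host: 10.0.0.29
          ZOOKEEPER_MYID: 3   # new node takes the next free ID
        zk01:
          ansible_host: 10.0.0.30
          ZOOKEEPER_MYID: 1   # unchanged
        zk02:
          ansible_host: 10.0.0.31
          ZOOKEEPER_MYID: 2   # unchanged
```

With IDs pinned this way, the derived set ZOOKEEPER_MYID task would need to be removed or skipped, since a value set with set_fact takes precedence over an inventory host variable.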

In the next post, we’ll take a look at utilizing ad-hoc groups for better cluster management.

If you have questions or comments, send us a tweet at @thisendout, and if you are interested in learning Ansible, check out our training course on Udemy, Mastering Ansible (use coupon code TEOBLOG15 for a discount)!