ELK Clusters on AWS with Ansible

In the previous post we built a virtual ELK cluster with Vagrant and Ansible, where the individual VMs comprising the cluster were carved out of a single host. While that allowed for self-contained development and testing of all the necessary artifacts, it is not a real-world scenario. The components of the ELK stack are usually on separate, possibly dedicated hosts. Fortunately, this does not put us back at square one in our effort to stand up an ELK cluster in those cases. Having written an Ansible role for each of the software components earlier, we already have an idempotent, reproducible means of delivering software to hosts. What changes when we swap the provisioner from VirtualBox to something else is the provisioning of the hosts themselves and the targeting of sub-groups among them for the different roles. Here we choose AWS as the host provisioner and devote the bulk of this post to the mechanics of building the ELK cluster on AWS with Ansible. At the end we touch upon the small modifications needed to our earlier playbook for delivering software to these hosts.

=> Download the code from GitHub to play along with the build-out.

1. The cluster

We prepare a yml file (cluster.yml) with the instance type and the number of hosts for each group in the ELK cluster, along with instance tags that let us pull out a specific group of hosts later for software delivery.
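To make that concrete, here is a minimal sketch of what such a cluster.yml could look like. The keys and values below are illustrative only; the file in the repository is the authoritative version.

    # cluster.yml (illustrative sketch)
    es-master-nodes:
      instance_type: t2.medium          # example instance type
      count: 1                          # number of hosts in this group
      instance_tags:
        Name: esMaster                  # surfaces later as the inventory group tag_Name_esMaster
    es-data-nodes:
      instance_type: t2.medium
      count: 2
      instance_tags:
        Name: esData                    # surfaces later as the inventory group tag_Name_esData
    # similar entries for the kibana and filebeat groups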

2. Provision Hardware

There are a number of ways to have hosts (“EC2 instances”) created on AWS for our needs: the AWS console UI, the AWS CLI, a variety of SDKs, Vagrant, Ansible, and so on. Here we opt for Ansible, as that will also be our means of delivering software to these hosts later via roles. So we will have two playbooks: provisionHardware.yml for building a cluster of ssh’able hosts as per the specs in cluster.yml, and provisionSoftware.yml for delivering the ELK software to those hosts. The provisionSoftware.yml playbook is essentially the same as the one we used earlier with Vagrant, save for some minor changes to accommodate the AWS vs. Vagrant differences in targeting hosts and ssh’ing to them.

Ansible has a series of excellent ec2 modules that allow us to orchestrate the cluster setup on AWS from scratch. These tasks run locally, communicate with AWS via its API, and spin up the hosts as per the specs. Having an AWS account and credentials in hand for API access is a prerequisite, of course. Here is the sequence of steps that we convert to Ansible tasks in the playbook; a sketch of what the first few look like follows the list.

  1. Set up a (non-default) VPC with an Internet Gateway, and a public subnet with routing.
  2. Set up a security group that allows all communication within the group, and allows access to ports 22 (for ssh) & 5601 (for Kibana) from outside (the Ansible host).
  3. Generate a key-pair for the SSH access needed when running the provisionSoftware.yml playbook.
  4. Provision the cluster hosts listed in cluster.yml.
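To give a flavor of what the first few of these steps look like as tasks, here is an abbreviated sketch. It is illustrative only: the names, CIDR blocks and variables (aws_region, ansible_host_ip) are assumptions, the AWS credential arguments are omitted, and the playbook in the repository remains the authoritative version.

    # Illustrative sketch of steps 1-3 (not the exact tasks from provisionHardware.yml)
    - name: Set up a non-default VPC
      ec2_vpc_net:
        name: elk-vpc                      # example name
        cidr_block: 10.0.0.0/16            # example CIDR
        region: "{{ aws_region }}"         # assumed variable
      register: vpc

    - name: Attach an Internet Gateway to the VPC
      ec2_vpc_igw:
        vpc_id: "{{ vpc.vpc.id }}"
        region: "{{ aws_region }}"
      register: igw
      # (public subnet and route table creation via ec2_vpc_subnet / ec2_vpc_route_table omitted)

    - name: Security group - everything within the group, ssh & kibana from the Ansible host
      ec2_group:
        name: elk-sg                       # example name
        description: ELK cluster security group
        vpc_id: "{{ vpc.vpc.id }}"
        region: "{{ aws_region }}"
        rules:
          - proto: all
            group_name: elk-sg             # all traffic allowed within the group
          - proto: tcp
            from_port: 22
            to_port: 22
            cidr_ip: "{{ ansible_host_ip }}/32"   # the Ansible host (cf. Line #11)
          - proto: tcp
            from_port: 5601
            to_port: 5601
            cidr_ip: "{{ ansible_host_ip }}/32"

    - name: Generate a key-pair for ssh access to the cluster hosts
      ec2_key:
        name: elk-key                      # example name
        region: "{{ aws_region }}"
      register: keypair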
The playbook is self-explanatory for the most part if you are used to writing Ansible playbooks. We need the SSH key for software provisioning later, so we also augment the group_vars/all.yml file with its location, along the lines of the sketch below.
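The variable name here is Ansible's standard connection setting; the key path itself is just an example.

    # group_vars/all.yml - addition (the path is illustrative)
    ansible_ssh_private_key_file: keys/elk-key.pem   # where provisionHardware.yml saved the generated key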
The credentials for API access are best kept encrypted, here with ansible-vault. You will need to supply your own key & secret in the file aws-secrets.yml, which before encryption might look like the sketch below.
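The variable names here are illustrative; use whatever names the playbook actually reads, and obviously your own values.

    # aws-secrets.yml - replace with your own credentials before encrypting
    aws_access_key: AKIAXXXXXXXXXXXXXXXX
    aws_secret_key: xXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX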
Encrypt it with ansible-vault encrypt aws-secrets.yml, choosing a password that you will need in order to view or use this information later. Now you can run the playbook, for example with ansible-playbook provisionHardware.yml --ask-vault-pass,
supply the vault password, and get the cluster of hosts provisioned. A few other things in the playbook that may need some explanation:

  1. Line #11: Make sure to give the IP address of your Ansible host. This host will have SSH access to the cluster hosts and access to Kibana.
  2. Line #14: The credentials for API access are read from the file aws-secrets.yml.
  3. Line #74: The Ansible host is granted access to ports 22 (ssh) & 5601 (Kibana) on all the hosts. There is no app listening on 5601 on the non-Kibana nodes of course, but perhaps there is no need to be picky about this 🙂
  4. Line #91: The generated key-pair is copied to the location referenced in group_vars/all.yml, as the provisionSoftware.yml playbook needs it.
  5. Lines #95 – #109: The cluster.yml file is read into a dictionary and iterated over to provision all the hosts. A sketch of such a loop is shown below.
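That loop might look roughly like the following. The variable and field names (elk_cluster, aws_region, keypair, security_group, subnet_id, ami) are assumptions used for illustration; the actual task is in provisionHardware.yml.

    # Illustrative only - not the exact task from the playbook
    - name: Provision the hosts listed in cluster.yml
      ec2:
        region: "{{ aws_region }}"                      # assumed variable
        image: "{{ item.value.ami }}"                   # assumed field in cluster.yml
        instance_type: "{{ item.value.instance_type }}"
        key_name: "{{ keypair }}"                       # key-pair generated earlier
        group: "{{ security_group }}"                   # security group created earlier
        vpc_subnet_id: "{{ subnet_id }}"
        assign_public_ip: yes
        wait: yes
        instance_tags: "{{ item.value.instance_tags }}"
        exact_count: "{{ item.value.count }}"           # only tops up to the requested count
        count_tag: "{{ item.value.instance_tags }}"
      with_dict: "{{ elk_cluster }}"                    # cluster.yml read into this dictionary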

Dynamic inventory

When working with a cloud provider such as AWS, hosts come and go over time. If an instance has been terminated, running the provisionHardware.yml playbook again will re-provision it, but its IP address can be different. So it is best to query AWS for the current inventory details at the time the software is being provisioned. There are scripts available that readily do this for us. Here are the few steps involved.

  1. Install the Boto module for your Python, for example: sudo pip install boto
  2. Download ec2.py and ec2.ini, and make the script executable: chmod ugo+x ec2.py
  3. Run ec2.py against your AWS account, with the credentials available to Boto (e.g. exported as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), and save its output, for example: ./ec2.py --list
The output is JSON that Ansible can readily work with as an inventory file.
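Once that inventory is in place, the tag-derived groups can be targeted directly in plays. A trivial, purely illustrative check, run against the dynamic inventory with -i ec2.py, would be:

    # Illustrative: list the private IPs that the dynamic inventory reports for the esData group
    - hosts: tag_Name_esData
      gather_facts: no
      tasks:
        - name: Show the private IP of each data node
          debug:
            msg: "{{ hostvars[inventory_hostname]['ec2_private_ip_address'] }}"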

3. Provision Software

With (a) the infrastructure in place, (b) a way to get a list of the available hosts, and (c) the Ansible roles already developed and tested, all we need to do now is run our playbook provisionSoftware.yml against this infrastructure. Before that, however, we have a bit of housekeeping to do to account for the differences between Vagrant/VirtualBox hosts and AWS hosts. The main difference is the need for an extra ‘pre_tasks’ section that runs before the roles are applied.

  1. For Ansible modules to run on a target host, that host needs to have the right Python packages. With Vagrant we had chosen a box that was already blessed with those, but the chosen AWS AMI may not have them. So, before any Ansible modules can run on the AWS hosts to apply the roles, we do a ‘raw install’ of these Python packages on those hosts. This is done by adding ‘pre_tasks’ to provisionSoftware.yml in Lines #1 – #20; the ‘pre_tasks’ run prior to any role being applied. A sketch of the idea follows this list.
  2. The way we get ‘groups’ of hosts is by using the instance_tags we specified in cluster.yml. The filenames in group_vars should be changed accordingly as well.
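Here is that sketch; the actual pre_tasks are Lines #1 – #20 of provisionSoftware.yml. The group name, role name and the apt commands (which assume a Debian/Ubuntu AMI) are illustrative.

    # Illustrative pre_tasks sketch - not the playbook's exact contents
    - hosts: tag_Name_esMaster          # example tag-derived group
      become: yes
      gather_facts: no                  # fact gathering itself needs Python on the target
      pre_tasks:
        - name: Raw-install Python so that Ansible modules can run
          raw: test -e /usr/bin/python || (apt-get update -y && apt-get install -y python-minimal)
        - name: Gather facts now that Python is available
          setup:
      roles:
        - es-master                     # illustrative role name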

As for the contents of the above files, a couple of changes are in order with respect to how the IP address of a host is obtained. For example, in the files ‘tag_Name_esData.json’ & ‘tag_Name_esMaster.json’, the masterHosts_transport setting changes from the earlier Vagrant form

    "masterHosts_transport" : "{% for host in groups['es-master-nodes'] %} {{ hostvars[host]['ansible_' + public_iface]['ipv4']['address'] }}:{{ cluster_transport_tcp_port }}{% endfor %}",

to the AWS form

    "masterHosts_transport" : "{% for host in groups['tag_Name_esMaster'] %} {{ hostvars[host]['ec2_private_ip_address'] }}:{{ cluster_transport_tcp_port }}{% endfor %}",

and the network.host setting changes from

    "network.host": ["{{ hostvars[inventory_hostname]['ansible_' + public_iface]['ipv4']['address'] }}", "_local_"],

to

    "network.host": ["{{ hostvars[inventory_hostname]['ec2_private_ip_address'] }}", "_local_"],

With similar changes made to the other yml files, we are finally ready to provision software to the cluster by running provisionSoftware.yml against the dynamic inventory, for example with ansible-playbook -i ec2.py provisionSoftware.yml.

4. Testing

With the ELK cluster up and running, we can generate some logs on the filebeat hosts and watch them flow into Kibana. For this we can simply do:

  • Find the IP address of a filebeat host and copy genLogs.pl to that host. Log into that host and run the Perl script, replacing “xxx.xxx.xxx.xxx” with the actual IP.
  • Get the IP address of the Kibana host and point a browser at it on port 5601 (i.e. http://yyy.yyy.yyy.yyy:5601, with the actual Kibana IP in place of yyy.yyy.yyy.yyy).

5. Summary

Our objective was to set up an ELK cluster on AWS. We split that into two Ansible playbooks:

  1. one for provisioning hardware as per specs, and
  2. the other for provisioning software via previously developed & tested roles

While there may be a number of other ways to skin this cat, it looks like we have achieved our objective… Agree or disagree?
