ELK Stack with Vagrant and Ansible


It has unfortunately been a while since I sat down to write for this blog. But the writing bug is persistent – once hooked, you have to write. Serious writing requires original research, data collection and analysis, and so on, and can take a good bit of time depending on the topic. I had been playing with ELK on a routine basis, so for what I thought would be a quick win, I decided to add to the earlier blog post on Building elasticsearch clusters with Vagrant. Well, it did not quite turn out that way, and I had to cover a good bit of ground and publish code to other repos in order for this post to be useful.

To recap, that post used (a) VirtualBox as the means to build the VMs for the cluster, and (b) a shell script to orchestrate the installation & configuration of an elasticsearch cluster on those VMs. In this post we will still use VirtualBox to give us the VMs, but enhance the provisioning in two ways.

  1. We will build a full ELK stack where application logs are shipped by Beats to a Logstash host for grokking and posting to an ES cluster hooked up to Kibana for querying & dashboards. Here is a schematic.
  2. The provisioning (install & config) of the software for each of E (Elasticsearch), L (Logstash), K (Kibana) and the Filebeat plugin is done via Ansible playbooks. Why? While provisioning with shell scripts is very handy, it is programmatic and can get long-winded when building complex, coupled software systems across a cluster of hosts. Ansible hides much of that and instead presents a more or less declarative way (playbooks!) of orchestrating the provisioning. While there are alternatives, Ansible has become insanely popular lately in the devops world.

=> Download the code from GitHub to play along with the build-out.

1. The Inventory

We need 7 VMs – 2 for applications with Filebeat, 1 ES master node, 2 ES data nodes, and 1 each for Logstash and Kibana. The names and IP addresses for these VMs will be needed both by Vagrant for creating them and by Ansible later for provisioning, so we prepare a single inventory file and use it with both Vagrant & Ansible. Further, this file rations the cpu/memory resources of my 8-core, 16GB laptop across these 7 VMs. The file is simply YAML that is processed in Ruby by Vagrant and in Python by Ansible. Our file looks something like the sketch below.
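The group names here are the ones the group_vars files refer to later on; the IP addresses and cpu/memory figures are placeholders for what is in the repo.

# inventory.yml (sketch; IPs and resources are illustrative)
es-master-nodes:
  hosts:
    es-master-1:  { ansible_host: 192.168.33.25, memory: 2048, cpus: 1 }
es-data-nodes:
  hosts:
    es-data-1:    { ansible_host: 192.168.33.26, memory: 3072, cpus: 2 }
    es-data-2:    { ansible_host: 192.168.33.27, memory: 3072, cpus: 2 }
logstash-nodes:
  hosts:
    logstash-1:   { ansible_host: 192.168.33.28, memory: 2048, cpus: 1 }
kibana-nodes:
  hosts:
    kibana-1:     { ansible_host: 192.168.33.29, memory: 1024, cpus: 1 }
filebeat-nodes:
  hosts:
    filebeat-1:   { ansible_host: 192.168.33.30, memory: 1024, cpus: 1 }
    filebeat-2:   { ansible_host: 192.168.33.31, memory: 1024, cpus: 1 }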

2. The Vagrantfile

The Vagrantfile below builds each of the 7 VMs as per the specs in the inventory.
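A sketch of what such a Vagrantfile can look like, reading the inventory shown earlier; the box name is an assumption, so substitute whatever the repo pins.

# Vagrantfile (sketch): one VM per host entry in the YAML inventory
require 'yaml'

inventory = YAML.load_file(File.join(File.dirname(__FILE__), 'inventory.yml'))

Vagrant.configure('2') do |config|
  inventory.each do |group, members|
    members['hosts'].each do |name, spec|
      config.vm.define name do |node|
        node.vm.box      = 'ubuntu/xenial64'   # assumed box
        node.vm.hostname = name
        node.vm.network 'private_network', ip: spec['ansible_host']
        node.vm.provider 'virtualbox' do |vb|
          vb.memory = spec['memory']
          vb.cpus   = spec['cpus']
        end
      end
    end
  end
end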

The VMs are created simply with vagrant up --no-provision, and the cluster is then provisioned with Ansible.

3. The Playbook

The main playbook is simple, delegating the provisioning of each app to roles while overriding some defaults as needed. We override the port variables in the main playbook so we can see that they match up with our schematic for the cluster. Some other variables are overridden in group_vars/* files to keep them from cluttering the main playbook. The cluster is provisioned with:

ansible-playbook -i inventory.yml elk.yml

The directory layout shows a glimpse of all that is under the hood.
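The main playbook itself might look roughly like the sketch below; the 9201/9301 pair is the one used in this post, while the logstash_beats_port name and the 5044 value are illustrative defaults.

# elk.yml (sketch of the main playbook)
- hosts: es-master-nodes:es-data-nodes
  roles:
    - elastic.elasticsearch
  vars:
    cluster_http_port: 9201
    cluster_transport_port: 9301

- hosts: logstash-nodes
  roles:
    - ashokc.logstash
  vars:
    cluster_http_port: 9201
    logstash_beats_port: 5044

- hosts: kibana-nodes
  roles:
    - ashokc.kibana
  vars:
    cluster_http_port: 9201

- hosts: filebeat-nodes
  roles:
    - ashokc.filebeat
  vars:
    logstash_beats_port: 5044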
Common variables for all the host groups are specified in group_vars/all.yml. The variable ‘public_iface‘ can vary depending on the VM provider; for Vagrant here it is “eth1”. We use it to pull out the IP address of the host from ansible_facts whenever that is needed in the playbook.
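A minimal version of that file, together with the kind of expression used to look up a host's IP from ansible_facts:

# group_vars/all.yml (sketch)
public_iface: eth1

# Looking up this host's IP on public_iface, e.g. for network.host:
# "{{ hostvars[inventory_hostname]['ansible_' ~ public_iface]['ipv4']['address'] }}"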

3.1 Elasticsearch

The provisioning of elasticsearch on the master & data nodes is delegated to the excellent role elastic.elasticsearch published by elastic.co. As the role allows for multiple instances of ES on a host, we name the instances “{{cluster_http_port}}_{{cluster_transport_port}}”, which is a unique identifier. The ES cluster itself is taken to be defined by this pair of ports, used by all the master/data members of the cluster. If we rerun the playbook with a different pair, say 9202 & 9302, we will get a second cluster ‘9202_9302’ (in addition to the ‘9201_9301’ we build here on the first run) on the same set of hosts, and all would work fine.

The master node configuration variables are in group_vars/es-master-nodes.json. The key useful thing there is where we derive the ‘discovery.zen.ping.unicast.hosts’ and ‘network.host’ settings for elasticsearch from the information in the inventory file.
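Shown here as YAML for readability, a sketch of what those derivations can look like; es_instance_name and es_config are the elastic.elasticsearch role's instance identifier and its pass-through for elasticsearch.yml settings (check the role version you use for the exact names).

# group_vars/es-master-nodes (sketch, YAML form of the JSON in the repo)
es_instance_name: "{{ cluster_http_port }}_{{ cluster_transport_port }}"
es_config:
  cluster.name: "{{ cluster_http_port }}_{{ cluster_transport_port }}"
  http.port: "{{ cluster_http_port }}"
  transport.tcp.port: "{{ cluster_transport_port }}"
  node.master: true
  node.data: false
  # Bind to the IP of public_iface on this host, pulled from ansible_facts
  network.host: "{{ hostvars[inventory_hostname]['ansible_' ~ public_iface]['ipv4']['address'] }}"
  # Master IPs derived from the inventory; the port defaults to the node's transport port
  discovery.zen.ping.unicast.hosts: "{{ groups['es-master-nodes'] | map('extract', hostvars, 'ansible_host') | join(',') }}"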

The data node configuration variables in group_vars/es-data-nodes.json are very similar, with only a couple of changes.
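Essentially the node roles are flipped, something like:

# group_vars/es-data-nodes (sketch): same as the master group, except
es_config:
  node.master: false
  node.data: true
  # http.port, transport.tcp.port, network.host and unicast hosts as in the master group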

3.2 Logstash

Logstash is provisioned with the role ashokc.logstash. The default variables for this role are overridden with group_vars/logstash-nodes.yml, which specifies the user & group that own this instance of Logstash and derives the elasticsearch URLs from the inventory file; those URLs are used when configuring the elasticsearch output sections.
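A sketch of those overrides; the variable names are illustrative rather than the role's exact ones.

# group_vars/logstash-nodes.yml (sketch; variable names are illustrative)
logstash_user: logstash
logstash_group: logstash
# Elasticsearch URLs derived from the inventory, for the elasticsearch output sections
elasticsearch_urls: >-
  {{ (groups['es-master-nodes'] + groups['es-data-nodes'])
     | map('extract', hostvars, 'ansible_host')
     | map('regex_replace', '$', ':' ~ cluster_http_port)
     | list }}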

A simple elasticsearch output config & filebeat input config are enabled with:
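Roughly along these lines; the beats port, ES host and index name below are placeholders for what the role fills in from the variables above.

# Sketch of a beats input and an elasticsearch output for Logstash
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["192.168.33.25:9201"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}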

3.3 Kibana

Kibana is provisioned with the role ashokc.kibana. The default variables for this role are again overridden with group_vars/kibana-nodes.yml. Unlike Logstash, it is quite common to run multiple Kibana servers on a single host, with each instance targeting a separate ES cluster. This role allows for that and identifies a Kibana instance by the port it runs on; the variables also specify the owner/group for the instance.
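For example, something like:

# group_vars/kibana-nodes.yml (sketch; variable names are illustrative)
kibana_user: kibana
kibana_group: kibana
# Each Kibana instance on a host is identified by the port it runs on
kibana_server_port: 5601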

The template file for ‘kibana.yml‘ below picks up the correct elasticsearch cluster url.
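The relevant lines of such a kibana.yml.j2 might look like this; elasticsearch.url is the pre-7.x Kibana setting, and the variable names are the ones assumed above.

# kibana.yml.j2 (sketch): point this Kibana instance at the right ES cluster
server.port: {{ kibana_server_port }}
server.host: "{{ hostvars[inventory_hostname]['ansible_' ~ public_iface]['ipv4']['address'] }}"
elasticsearch.url: "http://{{ hostvars[groups['es-master-nodes'][0]]['ansible_host'] }}:{{ cluster_http_port }}"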

3.4 Filebeat

Filebeat is provisioned with the role ashokc.filebeat. The default variables are overridden with group_vars/filebeat-nodes.yml, which figures out the Logstash connection to use.
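For example (the variable name is illustrative):

# group_vars/filebeat-nodes.yml (sketch)
# Logstash connection derived from the inventory
logstash_host: "{{ hostvars[groups['logstash-nodes'][0]]['ansible_host'] }}"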

The template for the sample filebeat.yml configures the output to point at our Logstash host at the right port.
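A sketch of that template, assuming the logstash_host and logstash_beats_port variables from above:

# filebeat.yml.j2 (sketch): watch the app log and ship it to Logstash
filebeat.prospectors:
  - type: log
    paths:
      - /tmp/custom.log

output.logstash:
  hosts: ["{{ logstash_host }}:{{ logstash_beats_port }}"]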

4. Logs

The last step is to run an application on the filebeat nodes and watch the logs flow into Kibana. Our application is simply a Perl script that writes the log file /tmp/custom.log. We log in to each of the filebeat hosts and run the following Perl script.
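The original listing is not reproduced here; a stand-in that appends timestamped entries to /tmp/custom.log could look like this:

#!/usr/bin/perl
# Stand-in log generator (sketch): appends timestamped lines to /tmp/custom.log
use strict;
use warnings;
use IO::Handle;
use POSIX qw(strftime);

my @levels = qw(INFO WARN ERROR DEBUG);
open(my $fh, '>>', '/tmp/custom.log') or die "Cannot open /tmp/custom.log: $!";
$fh->autoflush(1);    # flush each line so filebeat ships it promptly

for my $i (1 .. 1000) {
    my $ts    = strftime('%Y-%m-%dT%H:%M:%S', localtime);
    my $level = $levels[ rand @levels ];
    print $fh "$ts $level Sample log message number $i\n";
    sleep 1;
}
close($fh);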

The corresponding sample logstash config file for processing this log would be placed at roles/ashokc.logstash/files/custom-filter.conf.
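A filter along these lines would parse the stand-in log format shown above; the grok pattern is tied to that format and is only illustrative.

# custom-filter.conf (sketch): pull out the timestamp, level and message
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  date {
    match => [ "timestamp", "ISO8601" ]
  }
}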

Conclusion

By placing appropriate filter files for Logstash at roles/ashokc.logstash/files and a prospector config file for Filebeat at roles/ashokc.filebeat/templates/filebeat.yml.j2, one can use this ELK stack to analyze application logs. A variety of extensions are possible, for example enabling X-Pack login/security, supporting other distributions & versions in the ‘ashokc’ roles, automated testing, etc. But then there is always more to be done, isn’t there?
