It has unfortunately been a while since I sat down to write for this blog. But the writing bug is persistent – once hooked, you have to write. Serious writing requires original research, data collection/analysis, etc., and so can take a good bit of time depending on the topic. I had been playing with ELK on a routine basis, so for what I thought would be a quick win, I decided to add to the earlier blog post on Building elasticsearch clusters with Vagrant. Well, it did not quite turn out that way, and I had to cover a good bit of ground and publish code to other repos in order for this post to be useful.
To recap, that post used (a) VirtualBox as the means to build the VMs for the cluster, and (b) a shell script to orchestrate the installation & configuration of an elasticsearch cluster on those VMs. In this post we will still use VirtualBox to give us the VMs, but enhance the provisioning in two ways.
- We will build a full ELK stack where application logs are shipped by Beats to a Logstash host for grokking and posting to an ES cluster hooked to Kibana for querying & dashboards. Here is a schematic.
- The provisioning (install & config) of the software for each of E (Elasticsearch), L (Logstash), K (Kibana) and the Filebeat plugin is done via Ansible playbooks. Why? While provisioning with shell scripts is very handy, it is programmatic and can get long-winded when building complex, coupled software systems across a cluster of hosts. Ansible hides much of that and instead presents a more or less declarative way (playbooks!) of orchestrating the provisioning. While there are alternatives, Ansible has become insanely popular lately in the devops world.
=> Download from github to play along with the build out.
1. The Inventory
We need 7 VMs – 2 for applications with Filebeat, 1 ES master node, 2 ES data nodes, and 1 each for Logstash and Kibana. The names and ip addresses for these VMs will be needed both by Vagrant for creating them and by Ansible later for provisioning. So we prepare a single inventory file and use it with both Vagrant & Ansible. Further, this file rations the cpu/memory resources of my 8-core, 16GB memory laptop across these 7 VMs. The file is simply YAML that is processed in Ruby by Vagrant & in Python by Ansible. Our file looks like:
es-master-nodes:
  hosts:
    es-master-1:                     # hostname
      ansible_host: 192.168.33.25    # ip address
      ansible_user: vagrant
      memory: 2048                   # ram to be assigned in MB
      ansible_ssh_private_key_file: .vagrant/machines/es-master-1/virtualbox/private_key

es-data-nodes:
  hosts:
    es-data-1:
      ansible_host: 192.168.33.26
      ansible_user: vagrant
      memory: 2048
      ansible_ssh_private_key_file: .vagrant/machines/es-data-1/virtualbox/private_key
    es-data-2:
      ansible_host: 192.168.33.27
      ansible_user: vagrant
      memory: 2048
      ansible_ssh_private_key_file: .vagrant/machines/es-data-2/virtualbox/private_key

kibana-nodes:
  hosts:
    kibana-1:
      ansible_host: 192.168.33.28
      ansible_user: vagrant
      memory: 512
      ansible_ssh_private_key_file: .vagrant/machines/kibana-1/virtualbox/private_key

logstash-nodes:
  hosts:
    logstash-1:
      ansible_host: 192.168.33.29
      ansible_user: vagrant
      memory: 1536
      ansible_ssh_private_key_file: .vagrant/machines/logstash-1/virtualbox/private_key

filebeat-nodes:
  hosts:
    filebeat-1:
      ansible_host: 192.168.33.30
      ansible_user: vagrant
      memory: 512
      ansible_ssh_private_key_file: .vagrant/machines/filebeat-1/virtualbox/private_key
    filebeat-2:
      ansible_host: 192.168.33.31
      ansible_user: vagrant
      memory: 512
      ansible_ssh_private_key_file: .vagrant/machines/filebeat-2/virtualbox/private_key
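As a quick optional sanity check that Ansible reads this file the same way Vagrant does, you can dump the parsed inventory. This command is not part of the repo, just a convenience, and assumes your Ansible version ships the ansible-inventory utility:

# Show the groups, hosts and host variables as Ansible sees them
ansible-inventory -i inventory.yml --list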
2. The Vagrantfile
The Vagrantfile below builds each of the 7 VMs as per the specs in the inventory.
require 'rbconfig'
require 'yaml'

DEFAULT_BASE_BOX = "bento/ubuntu-16.04"
cpuCap = 10 # Limit to 10% of the cpu

inventory = YAML.load_file("inventory.yml") # Get the names & ip addresses for the guest hosts

VAGRANTFILE_API_VERSION = '2'
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vbguest.auto_update = false
  inventory.each do |group, groupHosts|
    next if (group == "justLocal")
    groupHosts['hosts'].each do |hostName, hostInfo|
      config.vm.define hostName do |node|
        node.vm.box = hostInfo['box'] ||= DEFAULT_BASE_BOX
        node.vm.hostname = hostName                                     # Set the hostname
        node.vm.network :private_network, ip: hostInfo['ansible_host']  # Set the IP address
        ram = hostInfo['memory']                                        # Set the memory
        node.vm.provider :virtualbox do |vb|
          vb.name = hostName
          vb.customize ["modifyvm", :id, "--cpuexecutioncap", cpuCap, "--memory", ram.to_s]
        end
      end
    end
  end
end
The VMs are created simply with vagrant up --no-provision, and the cluster is then provisioned with Ansible.
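In case it helps, the full sequence I run from the repo root looks like the following sketch; the ansible ping at the end is just an optional connectivity check, not something the repo requires.

# Build the 7 VMs defined in inventory.yml, without provisioning them yet
vagrant up --no-provision

# Confirm they all came up
vagrant status

# Optional: verify Ansible can reach every VM over SSH using the same inventory
ansible all -i inventory.yml -m ping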
3. The Playbook
The main playbook is simple, delegating the specific app provisioning to roles while overriding some defaults as needed. We override the port variables in the main playbook so we can see that they match up with our schematic for the cluster. Some other variables are overridden in group_vars/* files to keep them from cluttering the main playbook. The cluster is provisioned with
ansible-playbook -i inventory.yml elk.yml
- hosts: es-master-nodes
  become: true
  roles:
    - { role: elastic.elasticsearch, cluster_http_port: 9201, cluster_transport_tcp_port: 9301 }

- hosts: es-data-nodes
  become: true
  roles:
    - { role: elastic.elasticsearch, cluster_http_port: 9201, cluster_transport_tcp_port: 9301 }

- hosts: kibana-nodes
  become: true
  roles:
    - { role: ashokc.kibana, kibana_server_port: 5601, cluster_http_port: 9201 }

- hosts: logstash-nodes
  become: true
  roles:
    - { role: ashokc.logstash, cluster_http_port: 9201, filebeat_2_logstash_port: 5044 }

- hosts: filebeat-nodes
  become: true
  roles:
    - { role: ashokc.filebeat, filebeat_2_logstash_port: 5044 }
The overall layout of the playbook repository is:
.
├── elk.yml
├── group_vars
│   ├── all.yml
│   ├── es-data-nodes.json
│   ├── es-master-nodes.json
│   ├── filebeat-nodes.yml
│   ├── kibana-nodes.yml
│   └── logstash-nodes.yml
├── inventory.yml
├── roles
│   ├── ashokc.filebeat
│   ├── ashokc.kibana
│   ├── ashokc.logstash
│   └── elastic.elasticsearch
└── Vagrantfile
Variables common to all the groups are set in group_vars/all.yml:
public_iface: eth1 # For Vagrant Provider
elk_version: 5.6.1
es_major_version: 5.x
es_apt_key: https://artifacts.elastic.co/GPG-KEY-elasticsearch
es_version: "{{ elk_version }}"
es_apt_url: deb https://artifacts.elastic.co/packages/{{ es_major_version }}/apt stable main
3.1 Elasticsearch
The provisioning of elasticsearch on the master & data nodes is delegated to the excellent role elastic.elasticsearch published by elastic.co. As the role allows for multiple instances of ES on a host, we name each instance as '{{cluster_http_port}}_{{cluster_transport_tcp_port}}', which serves as a unique identifier. The ES cluster itself is taken to be defined by this pair of ports, used by all the master/data members of the cluster. If we rerun the playbook with a separate pair, say 9202 & 9302, we will get a second cluster '9202_9302' (in addition to the '9201_9301' we build here on the first run) on the same set of hosts, and all would work fine.
The master node configuration variables are in group_vars/es-master-nodes.json. The key useful thing here is that the 'discovery.zen.ping.unicast.hosts' and 'network.host' settings for elasticsearch are derived from the information in the inventory file, via the 'masterHosts_transport' variable and the host's interface address.
{ "es_java_install" : true, "es_api_port": "{{cluster_http_port}}", "es_instance_name" : "{{cluster_http_port}}_{{cluster_transport_tcp_port}}", "masterHosts_transport" : "{% for host in groups['es-master-nodes'] %} {{hostvars[host]['ansible_'+public_iface]['ipv4']['address'] }}:{{cluster_trans port_tcp_port}}{%endfor %}", "es_config": { "cluster.name": "{{es_instance_name}}", "http.port": "{{cluster_http_port}}", "transport.tcp.port": "{{cluster_transport_tcp_port}}", "node.master": true, "node.data": false, "network.host": ["{{ hostvars[inventory_hostname]['ansible_' + public_iface]['ipv4']['address'] }}","_local_" ], "discovery.zen.ping.unicast.hosts" : "{{ masterHosts_transport.split() }}" } } |
The data node configuration variables in group_vars/es-data-nodes.json are very similar. The only changes are the addition of 'es_data_dirs' and the flipped 'node.master'/'node.data' flags.
{ "es_data_dirs" : "/opt/elasticsearch", "es_java_install" : true, "es_api_port": "{{cluster_http_port}}", "es_instance_name" : "{{cluster_http_port}}_{{cluster_transport_tcp_port}}", "masterHosts_transport" : "{% for host in groups['es-master-nodes'] %} {{hostvars[host]['ansible_'+public_iface]['ipv4']['address'] }}:{{cluster_trans port_tcp_port}}{%endfor %}", "es_config": { "cluster.name": "{{es_instance_name}}", "http.port": "{{cluster_http_port}}", "transport.tcp.port": "{{cluster_transport_tcp_port}}", "node.master": false, "node.data": true, "network.host": ["{{ hostvars[inventory_hostname]['ansible_' + public_iface]['ipv4']['address'] }}","_local_" ], "discovery.zen.ping.unicast.hosts" : "{{ masterHosts_transport.split() }}" } } |
3.2 Logstash
Logstash is provisioned with the role ashokc.logstash. The default variables for this role are overridden with group_vars/logstash-nodes.yml. The 'logstash_user' & 'logstash_group' variables specify the user & group that own this instance of logstash. The 'esMasterHosts' variable derives the elasticsearch urls from the inventory file; it is used for configuring the elasticsearch output section.
es_java_install: True
update_java: False
logstash_version: "{{ elk_version }}"
logstash_user: logstashUser
logstash_group: logstashGroup
logstash_enabled_on_boot: yes
logstash_install_plugins:
  - logstash-input-beats
esMasterHosts: "{% for host in groups['es-master-nodes'] %} http://{{hostvars[host]['ansible_'+public_iface]['ipv4']['address'] }}:{{cluster_http_port}} {% endfor %}"
logstash_es_urls : "{{ esMasterHosts.split() }}"
A simple elasticsearch output config & filebeat input config are enabled with:
output {
  elasticsearch {
    hosts => {{ logstash_es_urls | to_json }}
  }
}
input {
  beats {
    port => {{filebeat_2_logstash_port}}
  }
}
3.3 Kibana
Kibana is provisioned with the role ashokc.kibana. The default variables for this role are again overridden with group_vars/kibana-nodes.yml. Unlike logstash, it is quite common to run multiple Kibana servers on a single host, with each instance targeting a separate ES cluster. This role allows for that and identifies a Kibana instance by the port it runs on (the 'kibana_instance' variable). The 'kibana_user' & 'kibana_group' variables specify the owner/group for the instance.
kibana_version: "{{ elk_version }}"
kibana_user: kibanaUser
kibana_group: kibanaGroup
kibana_enabled_on_boot: yes
kibana_server_host: 0.0.0.0
kibana_elasticsearch_url : http://{{hostvars[groups['es-master-nodes'][0]]['ansible_'+public_iface]['ipv4']['address'] }}:{{cluster_http_port}}
kibana_instance: "{{kibana_server_port}}"
The template file for ‘kibana.yml‘ below picks up the correct elasticsearch cluster url.
server.port: {{ kibana_server_port }}
server.host: {{ kibana_server_host }}
elasticsearch.url: {{ kibana_elasticsearch_url }}
pid.file: {{ kibana_pid_file }}
logging.dest: {{ kibana_log_file }}
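After the Kibana play runs, the server should answer on its configured port from the laptop. The IP below is kibana-1's address from the inventory, and the status endpoint is what Kibana 5.x exposes; adjust if your version differs.

# Kibana status API; reports the overall state and the plugins loaded
curl http://192.168.33.28:5601/api/status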
3.4 Filebeat
Filebeat is provisioned with the role ashokc.filebeat. The default variables are overridden with group_vars/filebeat-nodes.yml. The 'logstashHostsList' variable figures out the logstash connections to use.
filebeat_version: "{{ elk_version }}"
filebeat_enabled_on_boot: yes
filebeat_user: filebeatUser
filebeat_group: filebeatGroup
logstashHostsList: "{% for host in groups['logstash-nodes'] %} {{hostvars[host]['ansible_'+public_iface]['ipv4']['address'] }}:{{filebeat_2_logstash_port}}{% endfor %}"
filebeat_logstash_hosts: "{{ logstashHostsList.split() }}"
The 'output.logstash' section at the end of the template for the sample filebeat.yml configures the output to our logstash host at the right port.
filebeat.prospectors:
  - type: log
    enabled: true
    paths:
      - /tmp/custom.log
    fields:
      log_type: custom
      type: {{ansible_hostname}}
      from: beats
    multiline.pattern: '^\s[+]{2}\scontinuing .*'
    multiline.match: after

output.logstash:
  hosts: {{ filebeat_logstash_hosts | to_nice_yaml }}
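To confirm the beat actually came up on the application VMs, something like the following should work, assuming the role installs filebeat as a systemd service under its default name.

# Check the filebeat service on the first application VM
vagrant ssh filebeat-1 -c 'sudo systemctl status filebeat'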
4. Logs
The last step is to run an application on the filebeat nodes and watch the logs flow into Kibana. Our application is simply a Perl script that writes the log file /tmp/custom.log. We log in to each of the filebeat hosts and run the following Perl script.
#!/usr/bin/perl -w
use strict ;
no warnings 'once';

my @codes = qw (fatal error warning info debug trace) ;
open(my $fh, ">>", "/tmp/custom.log") ;
$fh->autoflush(1);
my $now = time();
for my $i (1 .. 100) {
  my $message0 = "Type: CustomLog: This is a generic message # $i for testing ELK" ;
  my $nDays = int(rand(5)) ;
  my $nHrs = int(rand(24)) ;
  my $nMins = int(rand(60)) ;
  my $nSecs = int(rand(60)) ;
  my $timeValue = $now - $nDays * 86400 - $nHrs * 3600 - $nMins * 60 - $nSecs ;
  my $now1 = localtime($timeValue) ;
  my $nMulti = int(rand(10)) ;
  my $message = "$now1 $nDays:$nHrs:$nMins:$nSecs $nMulti:$codes[int(rand($#codes))] $message0" ;
  if ($nMulti > 0) {
    for my $line (1 .. $nMulti) {
      $message = $message . "\n ++ continuing the previous line for this log error..."
    }
  }
  print $fh "$message\n" ;
}
close $fh ;
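With the script run on both filebeat hosts, documents should start showing up in elasticsearch. A quick way to check from the laptop, assuming the elasticsearch output plugin's default 'logstash-*' index naming:

# Indices created by logstash, and a document count across them
curl 'http://192.168.33.25:9201/_cat/indices/logstash-*?v'
curl 'http://192.168.33.25:9201/logstash-*/_count?pretty'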
The corresponding sample logstash config file for processing this log would be placed at roles/ashokc.logstash/files/custom-filter.conf
filter {
  if [fields][log_type] == "custom" {
    grok {
      match => [ "message", "(?<matched-timestamp>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+\d{4})\s+(?<nDays>\d{1,3}):(?<nHrs>\d{1,2}):(?<nMins>\d{1,2}):(?<nSecs>\d{1,2})\s+(?<nLines>\d{1,2}):(?<code>\w+) Type: (?<given-type>\w+):[^#]+# (?<messageId>\d+)\s+%{GREEDYDATA}" ]
      add_tag => ["grokked"]
      add_field => { "foo_%{nDays}" => "Hello world, from %{nHrs}" }
    }
    mutate {
      gsub => ["message", "ELK", "BULK"]
    }
    date {
      match => [ "timestamp" , "EEE MMM d H:m:s Y", "EEE MMM d H:m:s Y" ]
      add_tag => ["dated"]
    }
  }
}
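If you drop in a new or modified filter file later, it is worth validating the pipeline config on the logstash VM before restarting the service. A sketch, assuming the standard deb install locations for logstash 5.x:

# Parse and validate the pipeline configuration without starting logstash
vagrant ssh logstash-1 -c 'sudo /usr/share/logstash/bin/logstash --config.test_and_exit --path.settings /etc/logstash'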
Conclusion
By placing appropriate filter files for logstash at roles/ashokc.logstash/files and a prospector config file for filebeat at roles/ashokc.filebeat/templates/filebeat.yml.j2, one can use this ELK stack to analyze application logs. A variety of extensions are possible, for example enabling X-Pack login/security, supporting other distributions & versions in the 'ashokc' roles, automated testing, etc. But then there is always more to be done, isn't there?