ELK Clusters on AWS with Ansible

In the previous post we built a virtual ELK cluster with Vagrant and Ansible, where the individual VMs comprising the cluster were carved out of a single host. While that allowed for self-contained development and testing of all the necessary artifacts, it is not a real-world scenario. The components of the ELK stack are usually on separate, possibly dedicated hosts. Fortunately, this does not put us back at square one in our effort to stand up an ELK cluster in those cases. Having written an Ansible role for each of the software components earlier, we already have an idempotent, reproducible means of delivering software to hosts. What changes when we swap the provisioner from VirtualBox to something else is the provisioning of the hosts themselves and the targeting of sub-groups among them for the different roles. Here we choose AWS as the host provisioner and devote the bulk of this post to the mechanics of building the ELK cluster on AWS with Ansible. At the end we touch upon the small modifications needed to our earlier playbook for delivering software to these hosts.

=> Download the code from GitHub to play along with the build-out.

1. The cluster

We prepare a yml file (cluster.yml) with the instance type and the number of hosts for each group in the ELK cluster, along with instance tags that let us pull out a specific group of hosts later for software delivery.
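To make that concrete, here is a minimal sketch of what such a cluster.yml could look like. The keys and values below are illustrative only; the file in the repository is the authoritative version.

    # cluster.yml (illustrative sketch)
    es-master-nodes:
      instance_type: t2.medium          # example instance type
      count: 1                          # number of hosts in this group
      instance_tags:
        Name: esMaster                  # surfaces later as the inventory group tag_Name_esMaster
    es-data-nodes:
      instance_type: t2.medium
      count: 2
      instance_tags:
        Name: esData                    # surfaces later as the inventory group tag_Name_esData
    # similar entries for the kibana and filebeat groups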

2. Provision Hardware

There are a number of ways to have hosts (“EC2 instances”) created on AWS for our needs: the AWS console UI, the AWS CLI, a variety of SDKs, Vagrant, Ansible, and so on. Here we opt for Ansible, as that will also be our means of delivering software to these hosts later via roles. So we will have two playbooks: provisionHardware.yml for building a cluster of ssh’able hosts as per the specs in cluster.yml, and provisionSoftware.yml for delivering the ELK software to those hosts. The provisionSoftware.yml playbook is essentially the same as the one we used earlier with Vagrant, save for some minor changes to accommodate the AWS vs. Vagrant differences in targeting hosts and ssh’ing to them.

Ansible has a series of excellent ec2 modules that allow us to orchestrate the cluster setup on AWS from scratch. These tasks run locally, communicate with AWS via its API, and spin up the hosts as per the specs. Having an AWS account and credentials in hand for API access is a prerequisite, of course. Here is the sequence of steps that we convert to Ansible tasks in the playbook; a sketch of what the first few look like follows the list.

  1. Set up a (non-default) VPC with an Internet Gateway, and a public subnet with routing.
  2. Set up a security group that allows all communication within the group, and allows access to ports 22 (for ssh) & 5601 (for Kibana) from outside (the Ansible host).
  3. Generate a key-pair for the SSH access needed when running the provisionSoftware.yml playbook.
  4. Provision the cluster hosts listed in cluster.yml.
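To give a flavor of what the first few of these steps look like as tasks, here is an abbreviated sketch. It is illustrative only: the names, CIDR blocks and variables (aws_region, ansible_host_ip) are assumptions, the AWS credential arguments are omitted, and the playbook in the repository remains the authoritative version.

    # Illustrative sketch of steps 1-3 (not the exact tasks from provisionHardware.yml)
    - name: Set up a non-default VPC
      ec2_vpc_net:
        name: elk-vpc                      # example name
        cidr_block: 10.0.0.0/16            # example CIDR
        region: "{{ aws_region }}"         # assumed variable
      register: vpc

    - name: Attach an Internet Gateway to the VPC
      ec2_vpc_igw:
        vpc_id: "{{ vpc.vpc.id }}"
        region: "{{ aws_region }}"
      register: igw
      # (public subnet and route table creation via ec2_vpc_subnet / ec2_vpc_route_table omitted)

    - name: Security group - everything within the group, ssh & kibana from the Ansible host
      ec2_group:
        name: elk-sg                       # example name
        description: ELK cluster security group
        vpc_id: "{{ vpc.vpc.id }}"
        region: "{{ aws_region }}"
        rules:
          - proto: all
            group_name: elk-sg             # all traffic allowed within the group
          - proto: tcp
            from_port: 22
            to_port: 22
            cidr_ip: "{{ ansible_host_ip }}/32"   # the Ansible host (cf. Line #11)
          - proto: tcp
            from_port: 5601
            to_port: 5601
            cidr_ip: "{{ ansible_host_ip }}/32"

    - name: Generate a key-pair for ssh access to the cluster hosts
      ec2_key:
        name: elk-key                      # example name
        region: "{{ aws_region }}"
      register: keypair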
The playbook is self-explanatory for the most part if you are used to writing Ansible playbooks. We need the SSH key for software provisioning later, so we also augment the group_vars/all.yml file with its location, along the lines of the sketch below.
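The variable name here is Ansible's standard connection setting; the key path itself is just an example.

    # group_vars/all.yml - addition (the path is illustrative)
    ansible_ssh_private_key_file: keys/elk-key.pem   # where provisionHardware.yml saved the generated key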
The credentials for API access are best kept encrypted, here with ansible-vault. You will need to supply your own key & secret in the file aws-secrets.yml, which before encryption might look like the sketch below.
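The variable names here are illustrative; use whatever names the playbook actually reads, and obviously your own values.

    # aws-secrets.yml - replace with your own credentials before encrypting
    aws_access_key: AKIAXXXXXXXXXXXXXXXX
    aws_secret_key: xXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX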
Encrypt it with ansible-vault encrypt aws-secrets.yml, choosing a password that you will need in order to view or use this information later. Now you can run the playbook, for example with ansible-playbook provisionHardware.yml --ask-vault-pass,
supply the vault password, and get the cluster of hosts provisioned. A few other things in the playbook that may need some explanation:

  1. Line #11: Make sure to give the IP address of your Ansible host. This host will have SSH access to the cluster hosts and access to Kibana.
  2. Line #14: The credentials for API access are read from the file aws-secrets.yml.
  3. Line #74: The Ansible host is granted access to ports 22 (ssh) & 5601 (Kibana) on all the hosts. There is no app listening on 5601 on the non-Kibana nodes of course, but perhaps there is no need to be picky about this 🙂
  4. Line #91: The generated key-pair is copied to the location referenced in group_vars/all.yml, as the provisionSoftware.yml playbook needs it.
  5. Lines #95 – #109: The cluster.yml file is read into a dictionary and iterated over to provision all the hosts. A sketch of such a loop is shown below.
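That loop might look roughly like the following. The variable and field names (elk_cluster, aws_region, keypair, security_group, subnet_id, ami) are assumptions used for illustration; the actual task is in provisionHardware.yml.

    # Illustrative only - not the exact task from the playbook
    - name: Provision the hosts listed in cluster.yml
      ec2:
        region: "{{ aws_region }}"                      # assumed variable
        image: "{{ item.value.ami }}"                   # assumed field in cluster.yml
        instance_type: "{{ item.value.instance_type }}"
        key_name: "{{ keypair }}"                       # key-pair generated earlier
        group: "{{ security_group }}"                   # security group created earlier
        vpc_subnet_id: "{{ subnet_id }}"
        assign_public_ip: yes
        wait: yes
        instance_tags: "{{ item.value.instance_tags }}"
        exact_count: "{{ item.value.count }}"           # only tops up to the requested count
        count_tag: "{{ item.value.instance_tags }}"
      with_dict: "{{ elk_cluster }}"                    # cluster.yml read into this dictionary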

Dynamic inventory

When working with a cloud provider such as AWS, hosts come and go over time. If an instance has been terminated, running the provisionHardware.yml playbook again will re-provision it, but its IP address can be different. So it is best to query AWS for the current inventory details at the time the software is being provisioned. There are scripts available that readily do this for us. Here are the few steps involved.

  1. Install the Boto module for your Python, for example: sudo pip install boto
  2. Download ec2.py and ec2.ini, and make the script executable: chmod ugo+x ec2.py
  3. Run ec2.py against your AWS account, with the credentials available to Boto (e.g. exported as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), and save its output, for example: ./ec2.py --list
The output is JSON that Ansible can readily work with as an inventory file.
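Once that inventory is in place, the tag-derived groups can be targeted directly in plays. A trivial, purely illustrative check, run against the dynamic inventory with -i ec2.py, would be:

    # Illustrative: list the private IPs that the dynamic inventory reports for the esData group
    - hosts: tag_Name_esData
      gather_facts: no
      tasks:
        - name: Show the private IP of each data node
          debug:
            msg: "{{ hostvars[inventory_hostname]['ec2_private_ip_address'] }}"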

3. Provision Software

With (a) the infrastructure in place, (b) a way to get a list of the available hosts, and (c) the Ansible roles already developed and tested, all we need to do now is run our playbook provisionSoftware.yml against this infrastructure. Before that, however, we have a bit of housekeeping to do to account for the differences between Vagrant/VirtualBox hosts and AWS hosts. The main difference is the need for an extra ‘pre_tasks’ section that runs before the roles are applied.

  1. For Ansible modules to run on a target host, that host needs to have the right Python packages. With Vagrant we had chosen a box that was already blessed with those, but the chosen AWS AMI may not have them. So, before any Ansible modules can run on the AWS hosts to apply the roles, we do a ‘raw install’ of these Python packages on those hosts. This is done by adding ‘pre_tasks’ to provisionSoftware.yml in Lines #1 – #20; the ‘pre_tasks’ run prior to any role being applied. A sketch of the idea follows this list.
  2. The way we get ‘groups’ of hosts is by using the instance_tags we specified in cluster.yml. The filenames in group_vars should be changed accordingly as well.
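Here is that sketch; the actual pre_tasks are Lines #1 – #20 of provisionSoftware.yml. The group name, role name and the apt commands (which assume a Debian/Ubuntu AMI) are illustrative.

    # Illustrative pre_tasks sketch - not the playbook's exact contents
    - hosts: tag_Name_esMaster          # example tag-derived group
      become: yes
      gather_facts: no                  # fact gathering itself needs Python on the target
      pre_tasks:
        - name: Raw-install Python so that Ansible modules can run
          raw: test -e /usr/bin/python || (apt-get update -y && apt-get install -y python-minimal)
        - name: Gather facts now that Python is available
          setup:
      roles:
        - es-master                     # illustrative role name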

As for the contents of the above files, a couple of changes are in order with respect to how the IP address of a host is obtained. For example, in the files ‘tag_Name_esData.json’ & ‘tag_Name_esMaster.json’, the masterHosts_transport setting changes from the earlier Vagrant form

    "masterHosts_transport" : "{% for host in groups['es-master-nodes'] %} {{ hostvars[host]['ansible_' + public_iface]['ipv4']['address'] }}:{{ cluster_transport_tcp_port }}{% endfor %}",

to the AWS form

    "masterHosts_transport" : "{% for host in groups['tag_Name_esMaster'] %} {{ hostvars[host]['ec2_private_ip_address'] }}:{{ cluster_transport_tcp_port }}{% endfor %}",

and the network.host setting changes from

    "network.host": ["{{ hostvars[inventory_hostname]['ansible_' + public_iface]['ipv4']['address'] }}", "_local_"],

to

    "network.host": ["{{ hostvars[inventory_hostname]['ec2_private_ip_address'] }}", "_local_"],

With similar changes made to the other yml files, we are finally ready to provision software to the cluster by running provisionSoftware.yml against the dynamic inventory, for example with ansible-playbook -i ec2.py provisionSoftware.yml.

4. Testing

With the ELK cluster up and running, we can generate some logs on the filebeat hosts and watch them flow into Kibana. For this we can simply do:

  • Find the IP address of a filebeat host and copy genLogs.pl to that host. Log into that host and run the Perl script, replacing “xxx.xxx.xxx.xxx” with the actual IP.
  • Get the IP address of the Kibana host and point a browser at it on port 5601 (i.e. http://yyy.yyy.yyy.yyy:5601, with the actual Kibana IP in place of yyy.yyy.yyy.yyy).

5. Summary

Our objective was to set up an ELK cluster on AWS. We split that into two Ansible playbooks:

  1. one for provisioning hardware as per specs, and
  2. the other for provisioning software via previously developed & tested roles

While there may be a number of other ways to skin this cat, it looks like we have achieved our objective… Agree or disagree?
