2014-02-20

Aggregating, searching and centralising logs with Elasticsearch, Logstash, Kibana and Lumberjack




Setting up an ELK stack with lumberjack (logstash-forwarder) for log processing on EC2/CentOS.

The logs at the company where I work have grown to the point where the usual Unix tools such as grep, cat and tail are grossly inefficient. After researching various log transports and indexing tools, we decided to go for Elasticsearch, Logstash and Kibana (ELK).

The three tools of ELK are maintained by the same group of people and work well together as a result.

I used the helpful blog post Centralizing Logs with Lumberjack, Logstash, and Elasticsearch by Brian Altenhofel to get up and running, but that post is out of date. Here's how I got a stack up and running on an Amazon EC2 instance running CentOS.


Java

Both Elasticsearch and Logstash use the JVM, so you'll need to install Java if it isn't already installed on your system.

Use yum search to find a package named something like java-1.7.0-openjdk.x86_64 : OpenJDK Runtime Environment and choose a version of Java to install. At the time of writing, the latest version was Java 7: java-1.7.0-openjdk.

sudo yum search java
sudo yum install java-1.7.0-openjdk

Elasticsearch

ES is a highly available, scalable, RESTful search service. Built on top of Apache Lucene, it supports clustering and full-text search, and works with JSON documents.

Get the latest RPM from the ELK downloads page, or the DEB for Debian and Debian derivatives such as Ubuntu. One could download the gzipped tarball and set up an init.d config, but I won't go into that here.

cd /tmp
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.noarch.rpm
sha1sum elasticsearch-1.2.1.noarch.rpm
sudo yum install elasticsearch-1.2.1.noarch.rpm

Compare the output of sha1sum to the SHA1 hash on the download page and make sure that they match before installing Elasticsearch.
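Rather than eyeballing the two hashes, sha1sum -c can do the comparison and set the exit status accordingly. A minimal sketch using a throwaway file (for the real check, substitute the RPM filename and the hash from the download page):

```shell
# Create a throwaway file and verify it the same way you would verify the RPM.
printf 'demo' > /tmp/checksum-demo.txt
# Normally "expected" would be the SHA1 copied from the download page.
expected=$(sha1sum /tmp/checksum-demo.txt | awk '{print $1}')
# "sha1sum -c" reads "<hash>  <filename>" lines and prints OK or FAILED per file.
echo "$expected  /tmp/checksum-demo.txt" | sha1sum -c -
```

Because sha1sum -c exits non-zero on a mismatch, the check can gate the install step in a script (sha1sum -c checksums && sudo yum install ...).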

Debian users should use dpkg -i elasticsearch-1.2.1.deb instead of yum.

Altenhofel warns about the volume of requests your ES server is likely to receive; on services where you are charged per I/O operation, this translates into big bills.

There are a handful of settings you may want to change in /etc/elasticsearch/elasticsearch.yml, such as listing your cluster machines explicitly if you can't use multicast discovery, and making sure that the data directory is correct.
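For reference, a sketch of the kind of overrides meant here; the setting names on the left are real elasticsearch.yml (1.x) settings, but the values are placeholders:

```yaml
# /etc/elasticsearch/elasticsearch.yml -- example overrides (placeholder values)
cluster.name: my-logging-cluster
path.data: /var/data/elasticsearch
# Multicast discovery does not work on EC2, so list the cluster nodes explicitly:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.11", "10.0.0.12"]
```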

The service may already be running. If not, start it.

sudo service elasticsearch start

Logstash

Logstash is used for collecting, parsing, manipulating and storing logs. We use it to turn our text logs into JSON for Elasticsearch and better indexing. We also use it to add, remove and manipulate the log contents.

We use Lumberjack for log transport (instead of logstash, which can also act as a transport) because of its reduced footprint and its ability to encrypt logs in transit. To this end, an SSL certificate must be generated on the logstash server.

sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 365

rsa:4096 is the length of the key in bits, and -days 365 is the number of days for which the certificate is valid (365 in this instance). Omit -nodes if you want to protect the private key with a passphrase. The public key, /etc/ssl/logstash.pub, is the one that will be distributed to all of our lumberjack machines.

You'll be asked for information to be put into the certificate. This can be automated with configuration files for when you turn this into a headless install procedure.
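Those prompts can be pre-filled on the command line with -subj, which is handy for a headless install. A sketch with placeholder values (2048 bits and /tmp paths here to keep the example quick and harmless; use rsa:4096 and the /etc/ssl paths as above for real):

```shell
# Generate key and certificate without any interactive prompts.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/C=US/ST=State/L=City/O=ExampleOrg/CN=logstash.example.com" \
    -keyout /tmp/logstash.key -out /tmp/logstash.pub
# Inspect the subject that was baked into the certificate.
openssl x509 -in /tmp/logstash.pub -noout -subject
```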

Get the latest RPM from the ELK downloads page, or the DEB for Debian and Debian derivatives such as Ubuntu. One could instead download the flat JAR and set up an init.d config, as Altenhofel describes in his blog post; note that the commands differ on RHEL/CentOS, e.g. Debian's start-stop-daemon roughly corresponds to Red Hat's daemon.

wget https://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-1.4.1-1_bd507eb.noarch.rpm
sha1sum logstash-1.4.1-1_bd507eb.noarch.rpm
sudo yum install logstash-1.4.1-1_bd507eb.noarch.rpm

If you require community-contributed filters such as the grep filter, you will also need the logstash-contrib package in addition to the core one.

wget http://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-contrib-1.4.1-1_6e42745.noarch.rpm
sha1sum logstash-contrib-1.4.1-1_6e42745.noarch.rpm
sudo yum install logstash-contrib-1.4.1-1_6e42745.noarch.rpm

Compare the output of sha1sum for both downloads to the SHA1 hashes on the download page and make sure that they match before installing Logstash.

Create and edit /etc/logstash/conf.d/logstash.conf to setup the logstash server. Here is an example configuration:

input {
        lumberjack {
                port => 5000
                ssl_certificate => "/etc/ssl/logstash.pub"
                ssl_key => "/etc/ssl/logstash.key"
                ssl_key_passphrase => "YourSSLKeyPassphrase,IfYouMadeOne"
        }
}

filter {
        if [type] == "apache-access-log" {
                grok {
                        match => { "message" => "%{COMBINEDAPACHELOG}" }
                }
                date {
                        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
                }
                geoip {
                        source => "clientip"
                }
                useragent {
                        source => "agent"
                }
        }
        if [type] == "apache-error-log" {
                drop { }
        }
        if [type] == "wowza-access-log" {
                grok {
                        patterns_dir => "/etc/logstash/patterns"
                        match => [ "message", "%{WOWZAACCESSLOG}" ]
                        add_field => [ "datetime", "%{date} %{time} %{tz}" ]
                }
                date {
                        match => [ "datetime", "yyyy-MM-dd HH:mm:ss Z" ]
                }
        }
}

output {
        elasticsearch_http {
                host => "localhost"
        }
}

I've created my own grok pattern for turning the text input of our Wowza access logs into JSON. There's info on how the grok filter works on the Logstash website. Alternatively, see my post on organising wowza logs with logstash and grok for this pattern.
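For orientation only, here is a hypothetical sketch of what such a pattern file might look like. This is not the actual pattern (see the companion post for that), but whatever you use must capture date, time and tz fields, since the add_field line above assembles them into datetime:

```
# /etc/logstash/patterns/wowza -- hypothetical sketch, not the real pattern.
# Wowza access logs are tab-separated; \t matches the separators.
WOWZA_DATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY}
WOWZA_TZ %{WORD}
WOWZAACCESSLOG %{WOWZA_DATE:date}\t%{TIME:time}\t%{WOWZA_TZ:tz}\t%{GREEDYDATA:logdata}
```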

In this example, apache-error-log types are dropped and not sent to the output.

Logstash can now be started. Keep in mind that it may take a few minutes for Logstash to start up. That's just how it is. To start logstash, edit /etc/sysconfig/logstash, change START=false to START=true, and then start the service. Logstash can also sanity-check a configuration file before you start the service, with /opt/logstash/bin/logstash agent --configtest -f /etc/logstash/conf.d/logstash.conf.

sudo service logstash start

Lumberjack — logstash-forwarder

Lumberjack has been renamed to logstash-forwarder.

Logstash-forwarder does not use the JVM; dropping it is one of the ways logstash-forwarder reduces its footprint compared to Logstash. It also has its own transport protocol designed for security, low latency and reliability.

Go

The installer for logstash-forwarder requires that the Go programming language binaries be available.

Download the Linux Go tarball from the Go download page and untar it.

wget http://go.googlecode.com/files/go1.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.2.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
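That export only affects the current shell. To make it persist for every login shell, the same line can go into a profile.d snippet (a sketch; assumes the /usr/local/go install location used above):

```shell
# Could be saved as /etc/profile.d/golang.sh so login shells pick it up.
export PATH=$PATH:/usr/local/go/bin
```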

Logstash-forwarder

Logstash-forwarder is installed from its Git repository. You'll need to have git installed.

sudo yum install git
cd /usr/local/
sudo git clone https://github.com/elasticsearch/logstash-forwarder.git
cd logstash-forwarder/
sudo /usr/local/go/bin/go build

Starting the service as a daemon requires adding a file to /etc/init.d/.

#!/bin/bash
#
# lumberjack    Start up the lumberjack (logstash-forwarder) daemon
#
# chkconfig: 2345 55 25
# description: Lumberjack (logstash-forwarder) ships system logs off to logstash with encryption.
#
# processname: lumberjack
# pidfile: /var/run/lumberjack.pid

### BEGIN INIT INFO
# Provides: lumberjack
# Required-Start: $network $named $local_fs
# Required-Stop: $network $named
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start logstash-forwarder
# Description:       Lumberjack (logstash-forwarder) ships system logs off to logstash with encryption
### END INIT INFO

# source function library
. /etc/rc.d/init.d/functions

PROG_DIR='/usr/local/logstash-forwarder'
PROG="$PROG_DIR/logstash-forwarder"
NAME='lumberjack'
CONFIG="/etc/default/$NAME.json"
LOCKFILE="/var/lock/subsys/$NAME"
PIDFILE="/var/run/$NAME.pid"
OUTPUT_LOGFILE="/var/log/$NAME/output.log"

if [ ! -x $PROG ]
then
 echo "$NAME: $PROG does not exist. " && failure
 exit 5
fi

start() {
 status_quiet
 STATUS=$?
 if [ $STATUS -eq 0 ]
 then
  PID=$(cat "$PIDFILE")
  echo -n "$NAME is already running ($PID). " && failure
  echo
  return 1
 fi
 if [ ! -f $CONFIG ]
 then
  echo -n "Config file $CONFIG does not exist. " && failure
  exit 6
 fi
 echo -n "Starting $NAME: "
 OUTPUT_DIR=$(dirname $OUTPUT_LOGFILE)
 [ -d "$OUTPUT_DIR" ] || mkdir "$OUTPUT_DIR"
 nohup "$PROG" -config="$CONFIG" >"$OUTPUT_LOGFILE" 2>&1 &
 RETVAL=$?
 PID=$!
 if [ $RETVAL -eq 0 ]
 then
  COUNTER=1
  while :
  do
   sleep 1
   grep -q 'Connected to' "$OUTPUT_LOGFILE" && break
   if grep -q 'Failed unmarshalling json' "$OUTPUT_LOGFILE"
   then
    failure
    echo
    echo 'Bad config file.'
    echo "Check the log file $OUTPUT_LOGFILE"
    kill "$PID"
    return 1
   fi
   if [ $COUNTER -gt 29 ]
   then
    failure
    echo
    echo "Could not connect to logstash server after $COUNTER seconds"
    echo "Check the log file $OUTPUT_LOGFILE"
    kill "$PID"
    return 1
   else
    COUNTER=$((COUNTER + 1))
   fi
  done
  if touch "$LOCKFILE"
  then
   success
  else
   failure
  fi
  echo
  echo "$PID" > "$PIDFILE"
  return 0
 else
  failure
  return 1
 fi
}

stop() {
 status_quiet
 STATUS=$?
 if [ ! $STATUS -eq 0 ]
 then
  echo -n "$NAME is not running. " && warning
  echo
  return 2
 fi
 PID=$(cat "$PIDFILE")
 echo -n "Stopping $NAME ($PID): "
 kill "$PID"
 RETVAL=$?
 if [ $RETVAL -eq 0 ]
 then
  rm -f "$LOCKFILE"
  rm -f "$PIDFILE"
  success
  echo
  return 0
 else
  failure
  echo
  return 1
 fi
}

status() {
 if [ ! -s "$PIDFILE" ]
 then
  echo "$NAME is not running."
  return 1
 fi
 PID=$(cat "$PIDFILE")
 if ps -p "$PID" > /dev/null
 then
  echo "$NAME is running ($PID)."
  return 0
 else
  echo "PID file is present, but $NAME is not running."
  return 2
 fi
}

status_quiet() {
 status >/dev/null 2>&1
 return $?
}

case "$1" in
 start)
  start
  RETVAL=$?
  ;;
 stop)
  stop
  RETVAL=$?
  ;;
 restart)
  stop
  start
  RETVAL=$?
  ;;
 status)
  status
  RETVAL=$?
  ;;
 *)
  echo "Usage: $0 {start|stop|status|restart}"
  RETVAL=2
esac
exit $RETVAL

Saving this to /etc/init.d/lumberjack will give you a service named lumberjack.

A configuration file needs to be created at /etc/default/lumberjack.json. It could live anywhere, but this is the path specified in the init.d script.

{
        "network": {
                "servers": [
                        "127.0.0.1:5000"
                ],
                "ssl ca": "/etc/ssl/logstash.pub",
                "timeout": 10
        },
        "files": [
                {
                        "paths": [
                                "/var/log/httpd/access_log*"
                        ],
                        "fields": {
                                "type": "apache-access-log"
                        }
                }
        ]
}

Change 127.0.0.1 to the IP address of the logstash server.

The type field allows you to differentiate the sources in logstash and is used for the conditional statements in the logstash config file.
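For example, to ship the Apache error log as well (which the logstash configuration above would currently drop), another entry could be appended to the "files" array; the path is an assumption for a stock CentOS Apache:

```json
{
        "paths": [
                "/var/log/httpd/error_log*"
        ],
        "fields": {
                "type": "apache-error-log"
        }
}
```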

The service file(s) in init.d must be executable in order to be usable:

sudo chmod u+x /etc/init.d/lumberjack

Start the service with:

sudo service lumberjack start

Kibana

Kibana is a graphical front-end for Elasticsearch. It can show you your data in various formats.

Kibana is, as are the other ELK tools, available from the ELK downloads page. It is available as a tarball or ZIP file.

cd /tmp/
wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.0.tar.gz
tar -xzf kibana-3.1.0.tar.gz

Installation is as simple as editing /tmp/kibana-3.1.0/config.js and copying the whole directory to your web server's html folder.

/** @scratch /configuration/config.js/1
 * == Configuration
 * config.js is where you will find the core Kibana configuration. This file contains parameter that
 * must be set before kibana is run for the first time.
 */
define(['settings'],
function (Settings) {
 

  /** @scratch /configuration/config.js/2
   * === Parameters
   */
  return new Settings({

    /** @scratch /configuration/config.js/5
     * ==== elasticsearch
     *
     * The URL to your elasticsearch server. You almost certainly don't
     * want +http://localhost:9200+ here. Even if Kibana and Elasticsearch are on
     * the same host. By default this will attempt to reach ES at the same host you have
     * kibana installed on. You probably want to set it to the FQDN of your
     * elasticsearch host
     */
    elasticsearch: "http://127.0.0.1:9200",

    /** @scratch /configuration/config.js/5
     * ==== default_route
     *
     * This is the default landing page when you don't specify a dashboard to load. You can specify
     * files, scripts or saved dashboards here. For example, if you had saved a dashboard called
     * `WebLogs' to elasticsearch you might use:
     *
     * +default_route: '/dashboard/elasticsearch/WebLogs',+
     */
    default_route     : '/dashboard/file/default.json',

    /** @scratch /configuration/config.js/5
     * ==== kibana-int
     *
     * The default ES index to use for storing Kibana specific object
     * such as stored dashboards
     */
    kibana_index: "kibana-int",

    /** @scratch /configuration/config.js/5
     * ==== panel_name
     *
     * An array of panel modules available. Panels will only be loaded when they are defined in the
     * dashboard, but this list is used in the "add panel" interface.
     */
    panel_names: [
      'histogram',
      'map',
      'pie',
      'table',
      'filtering',
      'timepicker',
      'text',
      'hits',
      'column',
      'trends',
      'bettermap',
      'query',
      'terms',
      'stats',
      'sparklines'
    ]
  });
});

Change the elasticsearch address to the address of the Elasticsearch server, not localhost or 127.0.0.1: Kibana runs in the visitor's browser, so localhost would point at the visitor's own machine rather than at your ES server.

Now copy the whole folder into your web root.

sudo cp -R kibana-3.1.0/ /var/www/html/
sudo mv /var/www/html/kibana-3.1.0/ /var/www/html/kibana
sudo chown -R apache:apache /var/www/html/kibana

Don't forget to change the permissions of this kibana folder so that Apache (or others) can read it.


Firewall

In order to access Kibana and to facilitate log transfer, one must open some ports in the firewall.

Port        Protocol  Reason
9300–9400   TCP       Elasticsearch node communication. Other ES nodes use this port to send and receive data. ES will try to use 9300 unless it is busy.
9200–9300   TCP       Elasticsearch HTTP traffic. Kibana uses this port to communicate with ES. ES will try to use 9200 unless it is busy.
5000        TCP       We configured logstash to listen on this port for logs from lumberjack. Add any other ports you need.
80          TCP       HTTP, to access Kibana.
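If the machine's firewall is managed with iptables (as on a stock CentOS 6 install), the rules live in /etc/sysconfig/iptables. A sketch of the relevant lines, with no source restrictions (tighten these for production):

```
# Fragment of /etc/sysconfig/iptables -- insert before the final REJECT rule.
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 5000 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 9200:9300 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 9300:9400 -j ACCEPT
```

On EC2 the instance's security group must allow these ports as well.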
