Setting up an ELK stack with lumberjack (logstash-forwarder) for log processing on EC2/CentOS.
The logs at the company where I work have grown to the point where the usual unix tools such as grep, cat and tail are grossly inefficient. After researching various log transports and indexing tools, we decided to go with Elasticsearch, Logstash and Kibana (ELK).
The three tools of ELK are maintained by the same group of people and work well together as a result.
I used the helpful blog post Centralizing Logs with Lumberjack, Logstash, and Elasticsearch by Brian Altenhofel to get up and running, but that post is out of date. Here's how I got a stack up and running on an Amazon EC2 instance running CentOS.
Java
Both Elasticsearch and Logstash use the JVM, so you'll need to install Java if it isn't already installed on your system.
Use `yum search` to find a package named something like `java-1.7.0-openjdk.x86_64 : OpenJDK Runtime Environment` and choose a version of Java to install. At the time this was written, Java 7 (java-1.7.0-openjdk) was the latest version.

```shell
sudo yum search java
sudo yum install java-1.7.0-openjdk
```
Elasticsearch
ES is a highly available, scalable RESTful search service. Built on top of Apache Lucene, it has support for clusters and full text search and works with JSON documents.
Get the latest RPM from the ELK downloads page, or the DEB for Debian and Debian derivatives such as Ubuntu. One could download the gzipped tarball and set up an init.d config, but I'm not going into that here.
```shell
cd /tmp
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.noarch.rpm
sha1sum elasticsearch-1.2.1.noarch.rpm
sudo yum install elasticsearch-1.2.1.noarch.rpm
```
Compare the output of `sha1sum` to the SHA1 hash on the download page and make sure that they match before installing Elasticsearch. Debian users: use `dpkg -i elasticsearch-1.2.1.deb` instead of yum.
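Eyeballing hashes is error-prone, so the comparison can be scripted. A sketch, using a locally generated file and hash as stand-ins for the real RPM and the hash published on the download page:

```shell
# Stand-in for the downloaded RPM; in practice this is the real file.
printf 'stand-in for elasticsearch rpm' > /tmp/es.rpm

# EXPECTED would normally be copied from the download page (computed
# locally here purely so the sketch is self-contained).
EXPECTED=$(sha1sum /tmp/es.rpm | awk '{print $1}')

# Recompute the hash of the file on disk and compare before installing.
ACTUAL=$(sha1sum /tmp/es.rpm | awk '{print $1}')
if [ "$EXPECTED" = "$ACTUAL" ]; then
    echo "checksum OK" > /tmp/es.rpm.check
else
    echo "checksum MISMATCH" > /tmp/es.rpm.check
fi
cat /tmp/es.rpm.check
```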
Altenhofel warns about the volume of requests which your ES server is likely to receive. This translates into big bills for services where you are charged per I/O op.
There's a handful of settings one may want to change in /etc/elasticsearch/elasticsearch.yml, such as listing your cluster machines if you can't use multicast and making sure that the data directory is correct.
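As a sketch, the relevant lines in /etc/elasticsearch/elasticsearch.yml might look like this for the ES 1.x series (the cluster name and host IPs are hypothetical; the defaults are fine for a single-node setup):

```yaml
# A name shared by all nodes that should join the same cluster (hypothetical)
cluster.name: my-log-cluster

# If multicast discovery is unavailable (as on EC2), list the nodes explicitly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.11", "10.0.0.12"]

# Make sure the data directory points at a disk with room for your indices
path.data: /var/lib/elasticsearch
```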
The service may already be running. If not, start it.
```shell
sudo service elasticsearch start
```
Logstash
Logstash is used for collecting, parsing, manipulating and storing logs. We use it to turn our text logs into JSON for Elasticsearch and better indexing. We also use it to add, remove and manipulate the log contents.
We use Lumberjack for log transport (instead of logstash, which can also be a transport) for its reduced footprint and its ability to encrypt the logs for transport. To this end, one must generate an SSL certificate on the logstash server.
```shell
sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 365
```
`rsa:4096` is the length of the key in bits and `-days 365` is the number of days for which the certificate is valid (365 in this instance). Omit `-nodes` if you want to protect the private key with a passphrase. The public key, /etc/ssl/logstash.pub, is the key which will be put on all of our lumberjack machines.
You'll be asked for information to be put into the certificate. This can be automated with configuration files for when you turn this into a headless install procedure.
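For a headless install, the prompts can also be skipped entirely with `-subj`. A sketch with hypothetical subject values, written to /tmp so as not to clobber the real key:

```shell
# Generate a self-signed cert non-interactively; -subj supplies the fields
# that openssl would otherwise prompt for (all values here are hypothetical).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout /tmp/logstash-test.key -out /tmp/logstash-test.pub \
    -subj '/C=US/ST=SomeState/L=SomeCity/O=ExampleCo/CN=logstash.example.com'

# Confirm the subject made it into the certificate
openssl x509 -in /tmp/logstash-test.pub -noout -subject
```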
Get the latest RPM from the ELK downloads page, or the DEB for Debian and Debian derivatives such as Ubuntu. One could download the flat JAR and set up an init.d config, as Altenhofel describes in his blog, but the commands are different in RHEL/CentOS; e.g. Debian's start-stop-daemon is similar to Red Hat's daemon.
```shell
wget https://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-1.4.1-1_bd507eb.noarch.rpm
sha1sum logstash-1.4.1-1_bd507eb.noarch.rpm
sudo yum install logstash-1.4.1-1_bd507eb.noarch.rpm
```
If you require some community-contributed filters such as the grep filter, you will also need the logstash contrib download in addition to the core download.
```shell
wget http://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-contrib-1.4.1-1_6e42745.noarch.rpm
sha1sum logstash-contrib-1.4.1-1_6e42745.noarch.rpm
sudo yum install logstash-contrib-1.4.1-1_6e42745.noarch.rpm
```
Compare the output of `sha1sum` for both downloads to the SHA1 hashes on the download page and make sure that they match before installing Logstash.
Create and edit /etc/logstash/conf.d/logstash.conf to set up the logstash server. Here is an example configuration:
```
input {
  lumberjack {
    port => 5000
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
    ssl_key_passphrase => "YourSSLKeyPassphrase,IfYouMadeOne"
  }
}

filter {
  if [type] == "apache-access-log" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
    }
  }
  if [type] == "apache-error-log" {
    drop { }
  }
  if [type] == "wowza-access-log" {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => [ "message", "%{WOWZAACCESSLOG}" ]
      add_field => [ "datetime", "%{date} %{time} %{tz}" ]
    }
    date {
      match => [ "datetime", "yyyy-MM-dd HH:mm:ss Z" ]
    }
  }
}

output {
  elasticsearch_http {
    host => "localhost"
  }
}
```
I've created my own grok pattern for turning the text input of our Wowza access logs into JSON. There's info on how the grok filter works on the Logstash website. Alternatively, see my post on organising wowza logs with logstash and grok for this pattern.
In this example, apache-error-log types are dropped and not sent to the output.
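To get a feel for what %{COMBINEDAPACHELOG} extracts, here's a rough shell sketch pulling two of the same fields (clientip and the response code) out of a sample combined-format line. The real grok filter names and types every field; this is only an approximation:

```shell
# A sample Apache combined-format log line
line='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'

# clientip is the first whitespace-separated field
echo "$line" | awk '{print $1}' > /tmp/clientip.txt

# the response code is the first token after the request's closing quote
echo "$line" | awk -F'" ' '{split($2, a, " "); print a[1]}' > /tmp/response.txt

cat /tmp/clientip.txt /tmp/response.txt
```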
One can start logstash now. Keep in mind that it may take a few minutes for Logstash to start up; that's just how it is. To start logstash, edit /etc/sysconfig/logstash, change START=false to START=true, and then start the service.
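That edit is easy to script. A sketch against a temp copy (the real file is /etc/sysconfig/logstash):

```shell
# Stand-in for /etc/sysconfig/logstash
printf 'START=false\n' > /tmp/logstash.sysconfig

# Flip the flag in place
sed -i 's/^START=false/START=true/' /tmp/logstash.sysconfig

cat /tmp/logstash.sysconfig
```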
```shell
sudo service logstash start
```
Lumberjack — logstash-forwarder
Lumberjack has been renamed to logstash-forwarder.
Logstash-forwarder does not use the JVM, which is one of the ways it reduces its footprint compared to Logstash. It also has its own transport mechanism that provides security, low latency and reliability.
Go
The installer for logstash-forwarder requires the binaries for the Go programming language to be available.
Download the linux Go tarball from the Go download page and untar it.
```shell
wget http://go.googlecode.com/files/go1.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.2.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
```
Logstash-forwarder
Logstash-forwarder is installed from its Git repository. You'll need to have git installed.
```shell
sudo yum install git
cd /usr/local/
sudo git clone https://github.com/elasticsearch/logstash-forwarder.git
cd logstash-forwarder/
sudo /usr/local/go/bin/go build
```
Starting the service as a daemon requires adding a file to /etc/init.d/.
```sh
#!/bin/sh
#
# lumberjack    Ship system logs off to logstash
#
# chkconfig: 2345 55 25
# description: Lumberjack (logstash-forwarder) ships system logs off to logstash with encryption.
#
# processname: lumberjack
# pidfile: /var/run/lumberjack.pid

### BEGIN INIT INFO
# Provides: lumberjack
# Required-Start: $network $named $local_fs
# Required-Stop: $network $named
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start logstash-forwarder
# Description: Lumberjack (logstash-forwarder) ships system logs off to logstash with encryption
### END INIT INFO

# source function library
. /etc/rc.d/init.d/functions

PROG_DIR='/usr/local/logstash-forwarder'
PROG="$PROG_DIR/logstash-forwarder"
NAME='lumberjack'
CONFIG="/etc/default/$NAME.json"
LOCKFILE="/var/lock/subsys/$NAME"
PIDFILE="/var/run/$NAME.pid"
OUTPUT_LOGFILE="/var/log/$NAME/output.log"

if [ ! -x $PROG ]
then
    echo "$NAME: $PROG does not exist. " && failure
    exit 5
fi

start() {
    status_quiet
    STATUS=$?
    if [ $STATUS -eq 0 ]
    then
        PID=$(cat "$PIDFILE")
        echo -n "$NAME is already running ($PID). " && failure
        echo
        return 1
    fi

    if [ ! -f $CONFIG ]
    then
        echo -n "Config file $CONFIG does not exist. " && failure
        exit 6
    fi

    echo -n "Starting $NAME: "
    OUTPUT_DIR=$(dirname $OUTPUT_LOGFILE)
    [ -d "$OUTPUT_DIR" ] || mkdir "$OUTPUT_DIR"
    nohup "$PROG" -config="$CONFIG" >"$OUTPUT_LOGFILE" 2>&1 &
    RETVAL=$?
    PID=$!

    if [ $RETVAL -eq 0 ]
    then
        COUNTER=1
        while :
        do
            sleep 1
            grep -q 'Connected to' "$OUTPUT_LOGFILE" && break
            if grep -q 'Failed unmarshalling json' "$OUTPUT_LOGFILE"
            then
                failure
                echo
                echo 'Bad config file.'
                echo "Check the log file $OUTPUT_LOGFILE"
                kill "$PID"
                return 1
            fi
            if [ $COUNTER -gt 29 ]
            then
                failure
                echo
                echo "Could not connect to logstash server after $COUNTER seconds"
                echo "Check the log file $OUTPUT_LOGFILE"
                kill "$PID"
                return 1
            else
                COUNTER=$((COUNTER + 1))
            fi
        done

        if touch "$LOCKFILE"
        then
            success
        else
            failure
        fi
        echo
        echo "$PID" > "$PIDFILE"
        return 0
    else
        failure
        return 1
    fi
}

stop() {
    status_quiet
    STATUS=$?
    if [ ! $STATUS -eq 0 ]
    then
        echo -n "$NAME is not running. " && warning
        echo
        return 2
    fi

    PID=$(cat "$PIDFILE")
    echo -n "Stopping $NAME ($PID): "
    kill "$PID"
    RETVAL=$?
    if [ $RETVAL -eq 0 ]
    then
        rm -f "$LOCKFILE"
        rm -f "$PIDFILE"
        success
        echo
        return 0
    else
        failure
        echo
        return 1
    fi
}

status() {
    if [ ! -s "$PIDFILE" ]
    then
        echo "$NAME is not running."
        return 1
    fi

    PID=$(cat "$PIDFILE")
    if ps -p "$PID" > /dev/null
    then
        echo "$NAME is running ($PID)."
        return 0
    else
        echo "PID file is present, but $NAME is not running."
        return 2
    fi
}

status_quiet() {
    status >/dev/null 2>&1
    return $?
}

case "$1" in
    start)
        start
        RETVAL=$?
        ;;
    stop)
        stop
        RETVAL=$?
        ;;
    restart)
        stop
        start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|status|restart}"
        RETVAL=2
esac

exit $RETVAL
```
Saving this to /etc/init.d/lumberjack will give you a service named lumberjack.
There is a configuration file which needs to be created at /etc/default/lumberjack.json. It could be anywhere, but this is where it was specified in the init.d file.
```json
{
  "network": {
    "servers": [ "127.0.0.1:5000" ],
    "ssl ca": "/etc/ssl/logstash.pub",
    "timeout": 10
  },
  "files": [
    {
      "paths": [ "/var/log/httpd/access_log*" ],
      "fields": { "type": "apache-access-log" }
    }
  ]
}
```
Change 127.0.0.1 to the IP address of the logstash server. The type field allows you to differentiate the sources in logstash and is used for the conditional statements in the logstash config file.
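Since the init script above bails out when it sees "Failed unmarshalling json" in the log, it's worth validating the file before restarting the service. A sketch using a trimmed-down copy of the config (the real file is /etc/default/lumberjack.json):

```shell
# A trimmed stand-in for /etc/default/lumberjack.json
cat > /tmp/lumberjack-test.json <<'EOF'
{
  "network": { "servers": [ "127.0.0.1:5000" ], "timeout": 10 },
  "files": [ { "paths": [ "/var/log/httpd/access_log*" ], "fields": { "type": "apache-access-log" } } ]
}
EOF

# python's json.tool exits non-zero on malformed JSON
if python3 -m json.tool < /tmp/lumberjack-test.json > /dev/null 2>&1; then
    echo "valid JSON" > /tmp/lumberjack-test.result
else
    echo "invalid JSON" > /tmp/lumberjack-test.result
fi
cat /tmp/lumberjack-test.result
```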
The service file(s) in init.d must be executable in order to be usable:
```shell
sudo chmod u+x /etc/init.d/lumberjack
```
Start the service with:
```shell
sudo service lumberjack start
```
Kibana
Kibana is a graphical front-end for Elasticsearch. It can show you your data in various formats.
Kibana is, as are the other ELK tools, available from the ELK downloads page. It is available as a tarball or ZIP file.
```shell
cd /tmp/
wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.0.tar.gz
tar -xzf kibana-3.1.0.tar.gz
```
Installation is as simple as changing /tmp/kibana-3.1.0/config.js and copying the whole directory to your webserver's html folder.
```javascript
/** @scratch /configuration/config.js/1
 * == Configuration
 * config.js is where you will find the core Kibana configuration. This file contains parameter that
 * must be set before kibana is run for the first time.
 */
define(['settings'],
function (Settings) {

  /** @scratch /configuration/config.js/2
   * === Parameters
   */
  return new Settings({

    /** @scratch /configuration/config.js/5
     * ==== elasticsearch
     *
     * The URL to your elasticsearch server. You almost certainly don't
     * want +http://localhost:9200+ here. Even if Kibana and Elasticsearch are on
     * the same host. By default this will attempt to reach ES at the same host you have
     * kibana installed on. You probably want to set it to the FQDN of your
     * elasticsearch host
     */
    elasticsearch: "http://127.0.0.1:9200",

    /** @scratch /configuration/config.js/5
     * ==== default_route
     *
     * This is the default landing page when you don't specify a dashboard to load. You can specify
     * files, scripts or saved dashboards here. For example, if you had saved a dashboard called
     * `WebLogs' to elasticsearch you might use:
     *
     * +default_route: '/dashboard/elasticsearch/WebLogs',+
     */
    default_route: '/dashboard/file/default.json',

    /** @scratch /configuration/config.js/5
     * ==== kibana-int
     *
     * The default ES index to use for storing Kibana specific object
     * such as stored dashboards
     */
    kibana_index: "kibana-int",

    /** @scratch /configuration/config.js/5
     * ==== panel_name
     *
     * An array of panel modules available. Panels will only be loaded when they are defined in the
     * dashboard, but this list is used in the "add panel" interface.
     */
    panel_names: [
      'histogram',
      'map',
      'pie',
      'table',
      'filtering',
      'timepicker',
      'text',
      'hits',
      'column',
      'trends',
      'bettermap',
      'query',
      'terms',
      'stats',
      'sparklines'
    ]
  });
});
```
Change the elasticsearch IP address to the IP address of the elasticsearch server, not localhost or 127.0.0.1.
Now copy the whole folder into your web root.
```shell
sudo cp -R kibana-3.1.0/ /var/www/html/
sudo mv /var/www/html/kibana-3.1.0/ /var/www/html/kibana
sudo chown -R apache:apache /var/www/html/kibana
```
Don't forget to change the permissions of this kibana folder so that Apache (or others) can read it.
Firewall
In order to access Kibana and to facilitate log transfer, one must open some ports in the firewall.
| Port | Protocol | Reason |
|---|---|---|
| 9300–9400 | TCP | Elasticsearch node communication. Other ES nodes use this port range to send and receive data. ES will try to use 9300 unless it is busy. |
| 9200–9300 | TCP | Elasticsearch HTTP traffic. Kibana uses this port range to communicate with ES. ES will try to use 9200 unless it is busy. |
| 5000 | TCP | We configured logstash to listen on this port for logs from lumberjack. Add any ports which you need. |
| 80 | TCP | HTTP, to access Kibana. |
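On CentOS 6 these rules can go in /etc/sysconfig/iptables; a sketch of the corresponding entries (on EC2 you will also, or instead, need to open the same ports in the instance's security group):

```
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9200:9400 -j ACCEPT
```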