Setting up an ELK stack with lumberjack (logstash-forwarder) for log processing on EC2/CentOS.
The logs at the company for which I work have grown to the point where the normal Unix tools such as grep, cat and tail are grossly inefficient. After researching various log transports and indexing tools, we decided to go for Elasticsearch, Logstash and Kibana (ELK).
The three tools of ELK are maintained by the same group of people and work well together as a result.
I used the helpful blog post Centralizing Logs with Lumberjack, Logstash, and Elasticsearch by Brian Altenhofel to get up and running, but that post is out of date. Here's how I got a stack up and running on an Amazon EC2 instance running CentOS.
Java
Both Elasticsearch and Logstash use the JVM, so you'll need to install Java if it isn't already installed on your system.
Use yum search to find a package with a name like java-1.7.0-openjdk.x86_64 : OpenJDK Runtime Environment and pick a version of Java to install. At the time this was written, Java 7 was the latest version: java-1.7.0-openjdk.
sudo yum search java
sudo yum install java-1.7.0-openjdk
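A quick check that the JVM is in place once the install finishes:
java -version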
Elasticsearch
ES is a highly available, scalable RESTful search service. Built on top of Apache Lucene, it has support for clusters and full text search and works with JSON documents.
Get the latest RPM from the ELK downloads page, or the DEB for Debian and Debian derivatives such as Ubuntu. One could download the gzipped tarball and set up an init.d config, but I'm not going into that here.
cd /tmp
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.noarch.rpm
sha1sum elasticsearch-1.2.1.noarch.rpm
sudo yum install elasticsearch-1.2.1.noarch.rpm
Compare the output of sha1sum
to the SHA1 hash on the download page and make sure that they match before installing Elasticsearch.
Debian users use dpkg -i elasticsearch-1.2.1.deb instead of yum.
Altenhofel warns about the volume of requests which your ES server is likely to receive. This translates into big bills for services where you are charged per I/O op.
There's a handful of settings one may want to change in /etc/elasticsearch/elasticsearch.yml
, such as listing your cluster machines if you can't use multicast and making sure that the data directory is correct.
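A minimal sketch of the kind of changes involved, assuming a small cluster on EC2 (where multicast discovery isn't available) and a dedicated data volume; the node addresses and path below are placeholders:

# /etc/elasticsearch/elasticsearch.yml (excerpt)
cluster.name: my-elk-cluster

# EC2 does not support multicast, so disable it and list the cluster nodes explicitly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.11", "10.0.0.12"]

# Point the data directory at the volume you actually want the indices to live on
path.data: /data/elasticsearch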
The service may already be running. If not, start it.
sudo service elasticsearch start
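A quick way to confirm that Elasticsearch is answering (assuming the default HTTP port of 9200):

curl http://localhost:9200

It should return a small JSON document containing the node name and version.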
Logstash
Logstash is used for collecting, parsing, manipulating and storing logs. We use it to turn our text logs into JSON for Elasticsearch and better indexing. We also use it to add, remove and manipulate the log contents.
We use Lumberjack for log transport (instead of logstash, which can also be a transport) for its reduced footprint and its ability to encrypt the logs for transport. To this end, one must generate an SSL certificate on the logstash server.
sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 365
rsa:4096
is the length of the key in bits and -days 365
is the number of days for which the certificate is valid (365 in this instance). Omit -nodes
if you want to protect the private key with a passphrase. The public key, /etc/ssl/logstash.pub, is the one which will be copied to all of our lumberjack machines.
You'll be asked for information to be put into the certificate. This can be automated with configuration files for when you turn this into a headless install procedure.
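One way to skip the interactive prompts, which is handy for a headless install, is to pass the subject on the command line instead of using a full OpenSSL config file. The values below are placeholders:

sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 365 -subj "/C=US/ST=State/L=City/O=Example Org/CN=logstash.example.com"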
Get the latest RPM from the ELK downloads page, or the DEB for Debian and Debian derivatives such as Ubuntu. One could download the flat JAR and set up an init.d config, as Altenhofel describes in his blog, but the commands are different in RHEL/CentOS; e.g. Debian's start-stop-daemon is similar to Red Hat's daemon.
wget https://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-1.4.1-1_bd507eb.noarch.rpm
sha1sum logstash-1.4.1-1_bd507eb.noarch.rpm
sudo yum install logstash-1.4.1-1_bd507eb.noarch.rpm
If you require some community-contributed filters such as the grep filter, you will also need the logstash contrib download in addition to the core download.
wget http://download.elasticsearch.org/logstash/logstash/packages/centos/logstash-contrib-1.4.1-1_6e42745.noarch.rpm
sha1sum logstash-contrib-1.4.1-1_6e42745.noarch.rpm
sudo yum install logstash-contrib-1.4.1-1_6e42745.noarch.rpm
Compare the output of sha1sum
for both downloads to the SHA1 hashes on the download page and make sure that they match before installing Logstash.
Create and edit /etc/logstash/conf.d/logstash.conf
to setup the logstash server. Here is an example configuration:
input {
  lumberjack {
    port => 5000
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
    ssl_key_passphrase => "YourSSLKeyPassphrase,IfYouMadeOne"
  }
}

filter {
  if [type] == "apache-access-log" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
    }
  }

  if [type] == "apache-error-log" {
    drop { }
  }

  if [type] == "wowza-access-log" {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => [ "message", "%{WOWZAACCESSLOG}" ]
      add_field => [ "datetime", "%{date} %{time} %{tz}" ]
    }
    date {
      match => [ "datetime", "yyyy-MM-dd HH:mm:ss Z" ]
    }
  }
}

output {
  elasticsearch_http {
    host => "localhost"
  }
}
I've created my own grok pattern for turning the text input of our Wowza access logs into JSON. There's info on how the grok filter works on the Logstash website. Alternatively, see my post on organising wowza logs with logstash and grok for this pattern.
In this example, apache-error-log types are dropped and not sent to the output.
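The WOWZAACCESSLOG pattern itself lives in the patterns_dir referenced above. Purely as an illustrative sketch (the real pattern depends on which fields your Wowza logging configuration emits, so see the post linked above), a file such as /etc/logstash/patterns/wowza could start along these lines:

# Illustrative placeholder only: capture the date, time and timezone columns, keep the rest for later
WOWZAACCESSLOG (?<date>%{YEAR}-%{MONTHNUM}-%{MONTHDAY})\t%{TIME:time}\t%{WORD:tz}\t%{GREEDYDATA:entry}

The date, time and tz captures are what feed the add_field and the date filter in the config above.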
Logstash can be started now. Keep in mind that it may take a few minutes for Logstash to start up; that's just how it is. To start logstash, one must edit /etc/sysconfig/logstash
and change START=false
to START=true
and then start the service.
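A quick way to syntax-check the config and then flip that flag, assuming the RPM installed Logstash under /opt/logstash and that the packaged /etc/sysconfig/logstash really does ship with START=false:

sudo /opt/logstash/bin/logstash agent -f /etc/logstash/conf.d/logstash.conf --configtest
sudo sed -i 's/^START=false/START=true/' /etc/sysconfig/logstash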
sudo service logstash start
Lumberjack — logstash-forwarder
Lumberjack has been renamed to logstash-forwarder.
Logstash-forwarder does not use the JVM; dropping it is one of the ways logstash-forwarder reduces its footprint compared to Logstash. It also has its own transport mechanism to provide security, low latency and reliability.
Go
The installer for logstash-forwarder requires the Go programming language binaries to be available.
Download the linux Go tarball from the Go download page and untar it.
wget http://go.googlecode.com/files/go1.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.2.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
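With /usr/local/go/bin on the PATH, a quick sanity check:

go version

It should print something like go version go1.2 linux/amd64.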
Logstash-forwarder
Logstash-forwarder is installed from its Git repository. You'll need to have git installed.
sudo yum install git
cd /usr/local/
sudo git clone https://github.com/elasticsearch/logstash-forwarder.git
cd logstash-forwarder/
sudo /usr/local/go/bin/go build
Starting the service as a daemon requires adding a file to /etc/init.d/.
#!/bin/bash
#
# lumberjack    Start up the lumberjack (logstash-forwarder) daemon
#
# chkconfig: 2345 55 25
# description: Lumberjack (logstash-forwarder) ships system logs off to logstash with encryption.
#
# processname: lumberjack
# pidfile: /var/run/lumberjack.pid

### BEGIN INIT INFO
# Provides: lumberjack
# Required-Start: $network $named $local_fs
# Required-Stop: $network $named
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start logstash-forwarder
# Description: Lumberjack (logstash-forwarder) ships system logs off to logstash with encryption
### END INIT INFO

# source function library
. /etc/rc.d/init.d/functions

PROG_DIR='/usr/local/logstash-forwarder'
PROG="$PROG_DIR/logstash-forwarder"
NAME='lumberjack'
CONFIG="/etc/default/$NAME.json"
LOCKFILE="/var/lock/subsys/$NAME"
PIDFILE="/var/run/$NAME.pid"
OUTPUT_LOGFILE="/var/log/$NAME/output.log"

if [ ! -x $PROG ]
then
    echo "$NAME: $PROG does not exist. " && failure
    exit 5
fi

start() {
    status_quiet
    STATUS=$?
    if [ $STATUS -eq 0 ]
    then
        PID=$(cat "$PIDFILE")
        echo -n "$NAME is already running ($PID). " && failure
        echo
        return 1
    fi

    if [ ! -f $CONFIG ]
    then
        echo -n "Config file $CONFIG does not exist. " && failure
        exit 6
    fi

    echo -n "Starting $NAME: "

    OUTPUT_DIR=$(dirname $OUTPUT_LOGFILE)
    [ -d "$OUTPUT_DIR" ] || mkdir "$OUTPUT_DIR"

    nohup "$PROG" -config="$CONFIG" >"$OUTPUT_LOGFILE" 2>&1 &
    RETVAL=$?
    PID=$!

    if [ $RETVAL -eq 0 ]
    then
        COUNTER=1
        while :
        do
            sleep 1
            grep -q 'Connected to' "$OUTPUT_LOGFILE" && break

            if grep -q 'Failed unmarshalling json' "$OUTPUT_LOGFILE"
            then
                failure
                echo
                echo 'Bad config file.'
                echo "Check the log file $OUTPUT_LOGFILE"
                kill "$PID"
                return 1
            fi

            if [ $COUNTER -gt 29 ]
            then
                failure
                echo
                echo "Could not connect to logstash server after $COUNTER seconds"
                echo "Check the log file $OUTPUT_LOGFILE"
                kill "$PID"
                return 1
            else
                COUNTER=$((COUNTER + 1))
            fi
        done

        if touch "$LOCKFILE"
        then
            success
        else
            failure
        fi
        echo
        echo "$PID" > "$PIDFILE"
        return 0
    else
        failure
        return 1
    fi
}

stop() {
    status_quiet
    STATUS=$?
    if [ ! $STATUS -eq 0 ]
    then
        echo -n "$NAME is not running. " && warning
        echo
        return 2
    fi

    PID=$(cat "$PIDFILE")
    echo -n "Stopping $NAME ($PID): "
    kill "$PID"
    RETVAL=$?

    if [ $RETVAL -eq 0 ]
    then
        rm -f "$LOCKFILE"
        rm -f "$PIDFILE"
        success
        echo
        return 0
    else
        failure
        echo
        return 1
    fi
}

status() {
    if [ ! -s "$PIDFILE" ]
    then
        echo "$NAME is not running."
        return 1
    fi

    PID=$(cat "$PIDFILE")
    if ps -p "$PID" > /dev/null
    then
        echo "$NAME is running ($PID)."
        return 0
    else
        echo "PID file is present, but $NAME is not running."
        return 2
    fi
}

status_quiet() {
    status >/dev/null 2>&1
    return $?
}

case "$1" in
    start)
        start
        RETVAL=$?
        ;;
    stop)
        stop
        RETVAL=$?
        ;;
    restart)
        stop
        start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|status|restart}"
        RETVAL=2
esac

exit $RETVAL
Saving this to /etc/init.d/lumberjack
will give you a service named lumberjack.
There is a configuration file which needs to be created at /etc/default/lumberjack.json. It could be anywhere, but this is where it was specified in the init.d file.
{ "network": { "servers": [ "127.0.0.1:5000" ], "ssl ca": "/etc/ssl/logstash.pub", "timeout": 10 }, "files": [ { "paths": [ "/var/log/httpd/access_log*" ], "fields": { "type": "apache-access-log" } } ] }
Change 127.0.0.1
to the IP address of the logstash server.
The type
field allows you to differentiate the sources in logstash and is used for the conditional statements in the logstash config file.
The service file(s) in init.d must be executable in order to be usable:
sudo chmod u+x /etc/init.d/lumberjack
Start the service with:
sudo service lumberjack start
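The init script above also answers status and stop, so you can check whether the forwarder connected:

sudo service lumberjack status

If the start hangs for around 30 seconds and then fails, check /var/log/lumberjack/output.log; the script waits for a "Connected to" line in that log before declaring success.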
Kibana
Kibana is a graphical front-end for Elasticsearch. It can show you your data in various formats.
Kibana, like the other ELK tools, is available from the ELK downloads page as a tarball or ZIP file.
cd /tmp/
wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.0.tar.gz
tar -xzf kibana-3.1.0.tar.gz
Installation is as simple as changing /tmp/kibana-3.1.0/config.js
and copying the whole directory to your webserver's html folder.
/** @scratch /configuration/config.js/1
 * == Configuration
 * config.js is where you will find the core Kibana configuration. This file contains parameter that
 * must be set before kibana is run for the first time.
 */
define(['settings'],
function (Settings) {

  /** @scratch /configuration/config.js/2
   * === Parameters
   */
  return new Settings({

    /** @scratch /configuration/config.js/5
     * ==== elasticsearch
     *
     * The URL to your elasticsearch server. You almost certainly don't
     * want +http://localhost:9200+ here. Even if Kibana and Elasticsearch are on
     * the same host. By default this will attempt to reach ES at the same host you have
     * kibana installed on. You probably want to set it to the FQDN of your
     * elasticsearch host
     */
    elasticsearch: "http://127.0.0.1:9200",

    /** @scratch /configuration/config.js/5
     * ==== default_route
     *
     * This is the default landing page when you don't specify a dashboard to load. You can specify
     * files, scripts or saved dashboards here. For example, if you had saved a dashboard called
     * `WebLogs' to elasticsearch you might use:
     *
     * +default_route: '/dashboard/elasticsearch/WebLogs',+
     */
    default_route : '/dashboard/file/default.json',

    /** @scratch /configuration/config.js/5
     * ==== kibana-int
     *
     * The default ES index to use for storing Kibana specific object
     * such as stored dashboards
     */
    kibana_index: "kibana-int",

    /** @scratch /configuration/config.js/5
     * ==== panel_name
     *
     * An array of panel modules available. Panels will only be loaded when they are defined in the
     * dashboard, but this list is used in the "add panel" interface.
     */
    panel_names: [
      'histogram',
      'map',
      'pie',
      'table',
      'filtering',
      'timepicker',
      'text',
      'hits',
      'column',
      'trends',
      'bettermap',
      'query',
      'terms',
      'stats',
      'sparklines'
    ]
  });
});
Change the elasticsearch URL to the address of the Elasticsearch server, not localhost or 127.0.0.1. Kibana 3 runs in the browser, so this address has to be reachable from the client machine, not just from the web server hosting Kibana.
Now copy the whole folder into your web root.
sudo cp -R kibana-3.1.0/ /var/www/html/
sudo mv /var/www/html/kibana-3.1.0/ /var/www/html/kibana
sudo chown -R apache:apache /var/www/html/kibana
Don't forget to change the permissions of this kibana folder so that Apache (or others) can read it.
Firewall
In order to access Kibana and to facilitate log transfer, one must open some ports in the firewall.
Port | Protocol | Reason |
---|---|---|
9300 – 9400 | TCP | Elasticsearch node communication. Other ES nodes use this port to send and receive data. ES will try to use 9300 unless it is busy. |
9200 – 9300 | TCP | Elasticsearch HTTP traffic. Kibana uses this port to communicate with ES. ES will try to use 9200 unless it is busy. |
5000 | TCP | We configured logstash to listen on this port for logs from lumberjack. Add any ports which you need. |
80 | TCP | HTTP to access Kibana |
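How you open these depends on your setup. On EC2 the instance's security group is usually the place, but if you are also running iptables on the host, something along these lines (a sketch assuming the default INPUT chain) will do the job; in practice you would restrict the source addresses rather than opening the ports to the world:

sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 5000 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 9200:9300 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 9300:9400 -j ACCEPT
sudo service iptables save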