2014-06-11

Organising Wowza Logs with Logstash and Grok


Pre-processing Wowza Streaming Engine (or Media Server) logs with Grok patterns syntax for better parsing with logstash.

>>>Click here to jump to the pattern to Copy & Paste.<<<

In a previous blog post I described setting up Logstash with Elasticsearch, Kibana and logstash-forwarder (lumberjack). In my logstash config file, I referenced a grok pattern for parsing a Wowza access log. I will describe that pattern, here.

This was created for Wowza Media Server 3.6 and works with Wowza Streaming Engine 4.0.

Grok

Using the grok filter in Logstash allows for patterns to be captured from a logstash field. These captured patterns can then become fully-fledged fields. Instead of a field called message containing the whole Wowza log entry, it can be expanded so that each part of the message becomes a field.

A Wowza log entry looks like this:

2014-06-11\t06:53:34\tUTC\tcreate\tstream\tINFO\t200\t-\t-\t_defaultVHost_\tvod\t_definst_\t0.0\tstream.example.com\t1935\trtsp://stream.example.com:1935/vod/_definst_/mp4:FootballTV/450kbps/BPLClipOfTheWeekArsenalVsFulham_450kbps.mp4?session_token=xxxxx&username=steven\t219.92.86.40\trtsp\t-\tSonyC6603 Build/10.5.A.0.230 stagefright/1.2 (Linux;Android 4.4.2)\t1335194632\t0\t0\t3\t0\t0\t0\tamazons3/my-bucket/BPLClipOfTheWeekArsenalVsFulham_450kbps.mp4\tsession_token=xxxxx&username=steven\t-\t-\t-\t-\trtsp://stream.example.com:1935/vod/_definst_/mp4:FootballTV/450kbps/BPLClipOfTheWeekArsenalVsFulham_450kbps.mp4?session_token=xxxxx&username=steven\trtsp://stream.example.com:1935/vod/_definst_/mp4:FootballTV/450kbps/BPLClipOfTheWeekArsenalVsFulham_450kbps.mp4\tsession_token=xxxxx&username=steven\trtsp://stream.example.com:1935/vod/_definst_/mp4:FootballTV/450kbps/BPLClipOfTheWeekArsenalVsFulham_450kbps.mp4\tsession_token=xxxxx&username=steven

It's not very useful like this. I can't easily search through it. I certainly can't see information at a glance. Each of these tab-delimited fields means something. These "columns" have names which are specified in the user manual.

With grok, we can create new fields using the names in the Wowza manual and the values in the message field. Instead of a single message field, we can have:

date
2014-06-11
time
06:53:34

tz



UTC

xEvent



create

xCategory



stream

xSeverity



INFO

xStatus



200

xComment



-

xApp



vod

xAppinst



_definst_

sUri



rtsp://stream.example.com:1935/vod/_definst_/mp4:FootballTV/450kbps/BPLClipOfTheWeekArsenalVsFulham_450kbps.mp4?session_token=xxxxx&username=steven

...and so on. This is much better.

Basic Patterns

Creating grok patterns is described in the logstash filters documentation.

Date

The default grok date formats do not include the YYYY-MM-DD format which Wowza uses. We can use the default year, month number and day number.

YMDDATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY}

This gives a syntax called YMDDATE.


Fields

The fields in the log are separated by tab characters by default. Anything after the date and time AND starts with a tab character is a new field.

WOWZENTRY [^\t]+

This becomes a problem when the xComment field can be written to in your own Wowza Java modules through WMSLogger.info() and similar methods. As that field can contain user data (things not written by Wowza), then it is not unreasonable to expect tab characters in that field.

By that reasoning, I added another, special capture after the xComment capture:

(?<xCommentMalformed>\t.*)?

This prevents tab characters in xComment from causing the grok pattern to break and start the middle of the xComment as other fields. It adds the offending part of the comment to a new field called xCommentMalformed. The field can have any unique name.


Specific fields

There are fields in the log which could be captured by the grok syntax defaults, such as IP, PATH and NONNEGINT. These cannot be used because Wowza will replace the values in these fields with a hyphen (-) if they were not populated by the message.

To address this issue, I created some specific implementations of these syntax labels.

WOWZIP (?:%{IP}|-)
WOWZNONNEGINT (?:%{NONNEGINT}|-)
WOWZPATH (?:%{PATH}|-)
WOWZNUMBER (?:%{NUMBER}|-)


The Full Pattern

The part you probably just wanted to copy and paste.

This creates a syntax named WOWZAACCESSLOG.

This pattern is for Wowza Streaming Engine 4.1.

YMDDATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY}
WOWZENTRY [^\t]+
WOWZIP (?:%{IP}|-)
WOWZNONNEGINT (?:%{NONNEGINT}|-)
WOWZNUMBER (?:%{NUMBER}|-)
WOWZPATH (?:%{PATH}|-)
WOWZAACCESSLOG %{YMDDATE:date}\t%{TIME:time}\t%{TZ:tz}\t%{WOWZENTRY:xEvent}\t%{WORD:xCategory}\t%{LOGLEVEL:xSeverity}\t%{WOWZNONNEGINT:xStatus}\t%{WOWZENTRY:xCtx}\t%{WOWZENTRY:xComment}(?<xcommentmalformed>\t.*)?\t%{WOWZENTRY:xVhost}\t%{WOWZENTRY:xApp}\t%{WOWZENTRY:xAppinst}\t%{NUMBER:xDuration}\t%{WOWZENTRY:sIp}\t%{WOWZNONNEGINT:sPort}\t%{WOWZENTRY:sUri}\t%{WOWZIP:cIp}\t%{WOWZENTRY:cProto}\t%{WOWZENTRY:cReferrer}\t%{WOWZENTRY:cUserAgent}\t%{WOWZENTRY:cClientId}\t%{WOWZNONNEGINT:csBytes}\t%{WOWZNONNEGINT:scBytes}\t%{WOWZENTRY:xStreamId}\t%{WOWZNONNEGINT:xSpos}\t%{WOWZNONNEGINT:csStreamBytes}\t%{WOWZNONNEGINT:scStreamBytes}\t%{WOWZENTRY:xSname}\t%{WOWZENTRY:xSnameQuery}\t%{WOWZPATH:xFileName}\t%{WOWZENTRY:xFileExt}\t%{WOWZNONNEGINT:xFileSize}\t%{WOWZNUMBER:xFileLength}\t%{WOWZENTRY:xSuri}\t%{WOWZENTRY:xSuriStem}\t%{WOWZENTRY:xSuriQuery}\t%{WOWZENTRY:csUriStem}\t%{WOWZENTRY:csUriQuery}

This pattern is for older versions of Wowza Streaming Engine and Wowza Media Server. If a pattern gives you _grokparsefailure tags or other errors, try the other.

YMDDATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY}
WOWZENTRY [^\t]+
WOWZIP (?:%{IP}|-)
WOWZNONNEGINT (?:%{NONNEGINT}|-)
WOWZPATH (?:%{PATH}|-)
WOWZAACCESSLOG %{YMDDATE:date}\t%{TIME:time}\t%{TZ:tz}\t%{WOWZENTRY:xEvent}\t%{WORD:xCategory}\t%{LOGLEVEL:xSeverity}\t%{WOWZNONNEGINT:xStatus}\t%{WOWZENTRY:xCtx}\t%{WOWZENTRY:xComment}(?<xCommentMalformed>\t.*)?\t%{WOWZENTRY:xVhost}\t%{WOWZENTRY:xApp}\t%{WOWZENTRY:xAppinst}\t%{NUMBER:xDuration}\t%{WOWZENTRY:sIp}\t%{WOWZNONNEGINT:sPort}\t%{WOWZENTRY:sUri}\t%{WOWZIP:cIp}\t%{WOWZENTRY:cProto}\t%{WOWZENTRY:cReferrer}\t%{WOWZENTRY:cUserAgent}\t%{WOWZENTRY:cClientId}\t%{WOWZNONNEGINT:csBytes}\t%{WOWZNONNEGINT:scBytes}\t%{WOWZENTRY:xStreamId}\t%{WOWZNONNEGINT:xSpos}\t%{WOWZNONNEGINT:csStreamBytes}\t%{WOWZNONNEGINT:scStreamBytes}\t%{WOWZENTRY:xSname}\t%{WOWZENTRY:xSnameQuery}\t%{WOWZPATH:xFileName}\t%{WOWZENTRY:xFileExt}\t%{WOWZNONNEGINT:xFileSize}\t%{WOWZNONNEGINT:xFileLength}\t%{WOWZENTRY:xSuri}\t%{WOWZENTRY:xSuriStem}\t%{WOWZENTRY:xSuriQuery}\t%{WOWZENTRY:csUriStem}\t%{WOWZENTRY:csUriQuery}

The difference is that x-file-length can be a decimal in newer logs, whereas my old pattern only looks for positive integers.

It's worth noting that these patterns work if your logging fields are in this order:

  1. date
  2. time
  3. tz
  4. x-event
  5. x-category
  6. x-severity
  7. x-statusx-ctx
  8. x-comment
  9. x-vhost
  10. x-app
  11. x-appinst
  12. x-duration
  13. s-ip
  14. s-port
  15. s-uri
  16. c-ip
  17. c-proto
  18. c-referrer
  19. c-user-agent
  20. c-client-id
  21. cs-bytes
  22. sc-bytes
  23. x-stream-id
  24. x-spos
  25. cs-stream-bytes
  26. sc-stream-bytes
  27. x-sname
  28. x-sname-query
  29. x-file-name
  30. x-file-ext
  31. x-file-size
  32. x-file-length
  33. x-suri
  34. x-suri-stem
  35. x-suri-query
  36. cs-uri-stem
  37. cs-uri-query

Using The Pattern In Logstash

I recommend reading a previous blog post for setting up logstash if you haven't already.

Save these patterns to a file. It can be any file as long as you remember the path and that logstash has permission to read it. Try /etc/logstash/patterns/wowza.

Edit your logstash configuration file /etc/logstash/conf.d/logstash.conf. Add a new grok filter:

filter {
    ... other filters ...
    grok {
        patterns_dir => "/etc/logstash/patterns"
        match => [ "message", "%{WOWZAACCESSLOG}" ]
    }
}

Telling logstash the time

Logstash will use the current time to stamp the log entry. In order to tell logstash to use the time in the log message instead, it needs to be told where to find it.

The required date fields are date, time and tz as defined in the WOWZACCESSLOG grok pattern.

Create a new field containing a timestamp (named datetime) using grok's add_field function and then use the date logstash filter to capture it:

grok {
    patterns_dir => "/etc/logstash/patterns"
    match => [ "message", "%{WOWZAACCESSLOG}" ]
    add_field => [ "datetime", "%{date} %{time} %{tz}" ]
}
date {
    match => [ "datetime", "yyyy-MM-dd HH:mm:ss Z" ]
}

Conclusion

This post has covered the creation of custom grok syntax to parse a Wowza access log entry for use with logstash. It will separate the log parts into individual fields. This should make your Wowza logs considerably more useful and search-able, particularly when used with Elasticsearch.

When used in conjuction with my post covering the setting up of Elasticsearch, Logstash, Kibana and logstash-forwarder, these posts should give enough information to allow you to organise your Wowza logs.

Happy Wowza debugging!

P.S. On the subject of debugging, check out my post for debugging your Wowza modules code by stepping through the Java code in NetBeans (or any JIT debugger) remotely.

1 comment: