nginx, Logstash and vhost-combined log format

The Apache HTTP server ships with a split-logfile utility which parses Combined Log Format entries prefixed with the virtual host. Here are some notes about this format and how to produce and parse it with nginx and Logstash.

Apache

This is the format expected by split-logfile:

www.gabriel.urdhr.fr ::1 - - [08/Jan/2015:23:51:34 +0100] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.3.0"

It can be configured in Apache with:

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined_vhost

# For reference, these are the definitions of the standard log formats:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

split-logfile reads these entries on its standard input and writes a separate log file for each virtual host:

/usr/sbin/split-logfile < access.log
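
As a rough illustration of the idea (this is not the Perl script shipped with Apache), here is a minimal Python sketch which groups log lines by their first field. It assumes the virtual host is the first space-separated token; the name split_by_vhost is made up for this example, and the sketch drops the virtual host prefix so that each group holds plain Combined Log Format entries.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field (the virtual host).

    Hypothetical helper for illustration only: the vhost prefix is
    removed from each line and the host name is lowercased.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        vhost, _, rest = line.partition(" ")
        groups[vhost.lower()].append(rest)
    return dict(groups)
```

Each list could then be appended to a file named after its key, which is roughly what the per-virtual-host log files amount to.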

Parsing with logstash or grok

Logstash (or any grok-based software) can be taught to parse this format by adding the following pattern to patterns/grok-patterns:

COMBINED_VHOST %{HOSTNAME:vhost} %{COMBINEDAPACHELOG}

which builds on the predefined patterns:

COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

Used in a configuration file such as:

input {
  file {
    path => ['/var/log/nginx/access.log']
    start_position => "beginning"
  }
}

filter {
  mutate {
    replace => {
      "type" => "access"
    }
  }
  grok {
    match => {
      "message" => "%{COMBINED_VHOST}"
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
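
To check what such a pattern is expected to capture without running Logstash, here is a rough Python approximation of COMBINED_VHOST. This is a simplified sketch, not a translation of the grok definitions: the sub-patterns are looser, the whole request line is captured as a single field (request_line, a name chosen for this example), quoted fields are captured without their surrounding quotes, and escaped quotes inside a field are not handled.

```python
import re

# Rough Python approximation of the COMBINED_VHOST grok pattern.
# Field names mostly mirror the grok captures; sub-patterns are simplified.
COMBINED_VHOST = re.compile(
    r'(?P<vhost>\S+) (?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_line>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of captured fields, or None if the line does not match."""
    match = COMBINED_VHOST.match(line)
    return match.groupdict() if match else None
```

Applied to the sample entry from the Apache section, parse_line should give www.gabriel.urdhr.fr as vhost, ::1 as clientip and 304 as response.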

nginx

nginx can be configured to generate a similar type of log with:

log_format combined_vhost '$server_name $remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

# For reference:
log_format common   '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent';
# This one is predefined:
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

Logging the requested virtual host

These configurations log the configured virtual host, not the requested virtual host (the content of the Host HTTP header). To log the content of the Host HTTP header instead, replace %v with %{Host}i in Apache, or $server_name with $http_host in nginx.

As this header can contain a space, the field should be quoted. split-logfile will then not work properly, and the logstash/grok pattern has to be adapted accordingly.
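
As an illustration of the kind of adaptation needed, here is a minimal Python sketch which separates a quoted host field from the rest of the entry. It assumes the Host header is logged as the first field, wrapped in double quotes; this layout, the pattern and the function name are assumptions made for the example, and escaped quotes inside the header value are not handled.

```python
import re

# Assumed layout: the Host header comes first, wrapped in double quotes,
# followed by the usual Combined Log Format fields (kept as one string here).
QUOTED_HOST_LINE = re.compile(r'"(?P<host>[^"]*)" (?P<rest>.*)')


def split_quoted_host(line):
    """Split a line into (host, rest); return None if there is no quoted host."""
    match = QUOTED_HOST_LINE.match(line)
    if match is None:
        return None
    return match.group("host"), match.group("rest")
```

A host value containing a space stays in a single field, whereas splitting on the first whitespace would cut it in two.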

Appendix: other web logging stuff