nginx, Logstash and vhost-combined log format
The Apache HTTP server ships with a split-logfile utility which parses Combined Log Format entries prefixed with the virtual host. Here are some notes about this format and how to produce and consume it with Apache, nginx and Logstash.
Apache
This is the format expected by split-logfile:
www.gabriel.urdhr.fr ::1 - - [08/Jan/2015:23:51:34 +0100] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.3.0"
It can be configured in Apache with:
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined_vhost
# For reference, these are the definitions of the standard log formats:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
The split-logfile utility reads this format on its standard input and generates a separate log file for each virtual host:
/usr/sbin/split-logfile < access.log
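To make the splitting step concrete, here is a minimal Python sketch of the same idea. It is not the actual script shipped with Apache: the function name `split_by_vhost` and the `out_dir` parameter are hypothetical, and the sketch assumes that each entry goes to a file named after the lowercased virtual host with a `.log` suffix, with the virtual host prefix removed.

```python
import os


def split_by_vhost(lines, out_dir="."):
    """Append each entry to <out_dir>/<vhost>.log, without its vhost prefix.

    Hypothetical helper illustrating the idea behind split-logfile.
    Returns the sorted list of virtual hosts seen.
    """
    handles = {}
    try:
        for line in lines:
            # The virtual host is the first space-separated field.
            vhost, sep, rest = line.partition(" ")
            if not sep:
                continue  # no space at all: not a vhost-prefixed entry
            # Assumption: fall back to a default name for empty or unsafe values.
            vhost = vhost.lower() or "access"
            if "/" in vhost or "\\" in vhost:
                vhost = "access"
            if vhost not in handles:
                path = os.path.join(out_dir, vhost + ".log")
                handles[vhost] = open(path, "a", encoding="utf-8")
            handles[vhost].write(rest)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling `split_by_vhost(sys.stdin)` (after `import sys`) would process a log piped on standard input, much like the command above.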
Parsing with logstash or grok
Logstash (or any grok-based software) can be taught to process this in patterns/grok-patterns with:
COMBINED_VHOST %{HOSTNAME:vhost} %{COMBINEDAPACHELOG}
which extends the predefined formats:
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
Used in a configuration file such as:
input {
  file {
    path => ['/var/log/nginx/access.log']
    start_position => "beginning"
  }
}
filter {
  mutate {
    replace => {
      "type" => "access"
    }
  }
  grok {
    match => {
      "message" => "%{COMBINED_VHOST}"
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
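To see which fields the COMBINED_VHOST pattern is meant to extract without running Logstash, the following Python sketch approximates it with a plain regular expression. This is an approximation, not a translation of grok: the sub-patterns are looser than HOSTNAME, IPORHOST or HTTPDATE, and unlike grok's QS the referrer and agent are captured without their surrounding quotes. The names `COMBINED_VHOST_RE` and `parse_line` are hypothetical.

```python
import re

# Rough regex counterpart of "%{HOSTNAME:vhost} %{COMBINEDAPACHELOG}".
# Group names follow the grok field names used above.
COMBINED_VHOST_RE = re.compile(
    r'(?P<vhost>\S+) '
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>[^"]*))" '
    r'(?P<response>\d+) (?:(?P<bytes>\d+)|-) '
    # Quoted strings; a backslash escapes the following character.
    r'"(?P<referrer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)

SAMPLE = (
    'www.gabriel.urdhr.fr ::1 - - [08/Jan/2015:23:51:34 +0100] '
    '"GET / HTTP/1.1" 304 0 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 '
    'Firefox/31.0 Iceweasel/31.3.0"'
)


def parse_line(line):
    """Return a dict of fields for a vhost-combined entry, or None."""
    match = COMBINED_VHOST_RE.fullmatch(line.rstrip("\n"))
    return match.groupdict() if match else None


fields = parse_line(SAMPLE)
print(fields["vhost"], fields["verb"], fields["request"], fields["response"])
```

Fields that do not take part in the match (for example `rawrequest` for a well-formed request line) are set to `None` in the returned dict.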
nginx
nginx can be configured to generate a similar type of log with:
log_format combined_vhost '$server_name $remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent"';
# For reference, the equivalent of the Apache "common" format:
log_format common '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent';
# This one is predefined by nginx (shown for reference only, do not redeclare it):
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
Logging the requested virtual host
These configurations log the configured virtual host, not the requested virtual host (the content of the Host HTTP header). If you want to log the requested host instead, you can use:
\"%{Host}i\" in Apache;
"$host" in nginx (this falls back to the server name when the request carries no host; "$http_host" gives the raw Host header).
As the header is supplied by the client and can contain a space, the value should be quoted. With a quoted first field, split-logfile won't work well and the Logstash/grok pattern will have to be adapted.
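As an illustration of the kind of adaptation needed, here is a minimal Python sketch that separates a quoted host field from the rest of the entry. It assumes that a double quote inside the logged value is written with a backslash escape (such as `\"` or `\x22`), so a backslash always protects the character that follows it. The names `QUOTED_HOST_PREFIX` and `split_quoted_host` are hypothetical.

```python
import re

# A quoted first field followed by the usual combined entry.
# Inside the quotes, a backslash escapes the following character.
QUOTED_HOST_PREFIX = re.compile(r'"(?P<host>(?:[^"\\]|\\.)*)" (?P<rest>.*)')


def split_quoted_host(line):
    """Return (host, rest_of_entry), or None if the line has no quoted prefix."""
    match = QUOTED_HOST_PREFIX.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("host"), match.group("rest")
```

The host is returned exactly as logged, escapes included, so it should still be validated before being used as a file name or an index key.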