{"version": "https://jsonfeed.org/version/1", "title": "/dev/posts/ - Tag index - journald", "home_page_url": "https://www.gabriel.urdhr.fr", "feed_url": "/tags/journald/feed.json", "items": [{"id": "http://www.gabriel.urdhr.fr/2023/09/17/process-json-log/", "title": "Analysing structured log files with simple tools", "url": "https://www.gabriel.urdhr.fr/2023/09/17/process-json-log/", "date_published": "2023-09-17T00:00:00+02:00", "date_modified": "2023-09-17T00:00:00+02:00", "tags": ["computer", "log", "journald"], "content_html": "
Some tools and other notes\nfor when you just want to analyze your structured\nlog files locally using simple tools,\nwith a focus on newline-delimited JSON (NDJSON) /\nJSON lines /\nJSON Text Sequences.
\nMany line-oriented tools work OK if your log files are NDJSON:
\ngrep
, can be used for filtering (but tools such as jq
might be more convenient and more useful for JSON logs);head -n $count
;tail -n $count
;tail -f
and (usually better) tail -F
for following files;less
;less -r
(preserve colors);shuf -n $count
, to take random samples.jq
is very useful for processing JSON data:
-C
) JSON entries by default;jq 'select(.level_value > 30000)'
), transformations, etc.;jq '. * {ts: .timestamp | fromdateiso8601}'
);--seq
).You can format JSON log entries for display with jq filters such as this one\n(using ANSI escape codes for colors):
\n(if .level == \"ERROR\" then \"\\u001b[1;31m\" elif .level == \"WARN\" then \"\\u001b[1;33m\" else \"\" end)\n+ .timestamp\n+ \" \"\n+ .level\n+ \" \"\n+ \"[\"\n+ .thread\n+ \"] \"\n+ .message\n+ ( if .level == \"ERROR\" or .level == \"WARN\" then \"\\u001b[1;0m\" else \"\" end)\n+ \"\\n\"\n+ (if .exception then .exception else \"\" end)\n
\nWhich can be used with something like:
\ntail -F app.json | jq -j -f log.jq | less -r\n
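The same kind of formatting can also be done in plain Python. Here is a rough, stdlib-only sketch equivalent to the jq filter above (it assumes the same hypothetical field names: level, timestamp, thread, message, exception):

```python
#!/usr/bin/env python3
"""Colorize NDJSON log entries, mimicking the jq filter above (stdlib only)."""
import json
import sys

COLORS = {"ERROR": "\x1b[1;31m", "WARN": "\x1b[1;33m"}
RESET = "\x1b[0m"

def format_entry(entry: dict) -> str:
    """Format one parsed log entry as a colored text line."""
    color = COLORS.get(entry.get("level", ""), "")
    reset = RESET if color else ""
    line = "%s%s %s [%s] %s%s" % (
        color,
        entry.get("timestamp", ""),
        entry.get("level", ""),
        entry.get("thread", ""),
        entry.get("message", ""),
        reset,
    )
    if entry.get("exception"):
        line += "\n" + entry["exception"]
    return line

if __name__ == "__main__" and not sys.stdin.isatty():
    # Reads NDJSON on stdin, writes colored lines to stdout.
    for raw in sys.stdin:
        raw = raw.strip()
        if raw:
            print(format_entry(json.loads(raw)))
```

It can be used in the same pipeline as the jq version (tail -F app.json | ./colorize.py | less -r).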
\nVisiData is a Text User Interface (TUI)\nexplorer for tabular data.\nIt can take a lot of different formats as input\nand works quite well with NDJSON logs.\nIt can be used for filtering data,\nsorting data,\ntyping columns,\nsummarizing data,\ngenerating descriptive statistics and pivot tables,\ngenerating simple (text-oriented) plots,\netc.
\npandas is a Python DataFrame library:
\nWe can even plot directly on the terminal\nusing some additional plugins,\nusing either the iTerm image protocol,\nthe kitty image protocol\nor the Sixel image protocol\n(if supported by the terminal).
\nHighlights:
\npandas.read_json()
for parsing JSON;data.set_index(\"timestamp\")
;data.sample()
for taking random samples;data.tz_convert(\"Europe/Paris\")
for changing the timezone;data[\"field\"].resample(\"1H\").mean()
, aggregation;data[data[\"field\"] >= 100]
for filtering;data.plot()
;data.to_json()
;data.head(10)
;data.tail(10)
.Example: taking random samples
\nimport pandas as pd\ndata = pd.read_json('logs.json', lines=True).set_index(\"timestamp\")\ndata[[\"path\"]].sample(30)\n
\n path\ntimestamp \n2023-02-10 18:38:16+00:00 /fonts/SourceCodePro-Semibold.woff\n2023-02-10 10:38:44+00:00 /img/oauth2-bearer-token.svg\n2023-02-10 20:31:45+00:00 /fonts/SourceCodePro-Regular.woff\n2023-02-10 11:18:33+00:00 /fonts/SourceSerifPro-Bold.woff\n2023-02-10 03:56:44+00:00 /feed.atom\n2023-02-10 13:58:45+00:00 /css/main.css\n2023-02-10 04:56:04+00:00 /fonts/SourceSerifPro-Semibold.woff\n2023-02-10 09:16:45+00:00 /feed.atom\n2023-02-10 11:48:36+00:00 /2021/06/02/dns-rebinding-explained/\n2023-02-10 10:37:17+00:00 /2014/05/23/flamegraph/\n2023-02-10 20:12:15+00:00 /2021/05/08/tuntap/\n2023-02-10 13:29:16+00:00 /page/3/\n2023-02-09 23:10:55+00:00 /css/main.css\n2023-02-10 14:28:13+00:00 /tags/perf/\n2023-02-10 08:44:46+00:00 /2015/05/29/core-file/\n2023-02-10 13:36:00+00:00 /2016/10/18/terminal-sharing/\n2023-02-10 14:15:26+00:00 /2019/feed.json\n2023-02-10 15:28:01+00:00 /js/main.js\n2023-02-10 20:31:03+00:00 /img/android-emulator-proxy.png\n2023-02-10 14:45:01+00:00 /2022/06/07/impact-of-the-different-wifi-secur...\n2023-02-10 13:42:43+00:00 /tags/wifi/\n2023-02-10 13:42:51+00:00 /tags/dns-rebinding/page/2/\n2023-02-10 10:31:32+00:00 /img/wpa2-eap.svg\n2023-02-10 01:50:34+00:00 /css/main.css\n2023-02-10 15:28:06+00:00 /fonts/SourceCodePro-Regular.woff\n2023-02-10 22:11:38+00:00 /2021/05/08/tuntap/\n2023-02-10 06:27:46+00:00 /fonts/SourceSerifPro-Regular.woff\n2023-02-10 08:00:55+00:00 /css/main.css\n2023-02-10 08:44:45+00:00 /js/main.js\n2023-02-10 14:29:41+00:00 /tags/perf/feed.json\n
\nExample: aggregation
\ndata.resample(\"1H\").size()\n
\ntimestamp\n2023-02-09 23:00:00+00:00 90\n2023-02-10 00:00:00+00:00 84\n2023-02-10 01:00:00+00:00 70\n2023-02-10 02:00:00+00:00 117\n2023-02-10 03:00:00+00:00 97\n2023-02-10 04:00:00+00:00 108\n2023-02-10 05:00:00+00:00 89\n2023-02-10 06:00:00+00:00 52\n2023-02-10 07:00:00+00:00 93\n2023-02-10 08:00:00+00:00 211\n2023-02-10 09:00:00+00:00 85\n2023-02-10 10:00:00+00:00 137\n2023-02-10 11:00:00+00:00 92\n2023-02-10 12:00:00+00:00 49\n2023-02-10 13:00:00+00:00 373\n2023-02-10 14:00:00+00:00 335\n2023-02-10 15:00:00+00:00 107\n2023-02-10 16:00:00+00:00 105\n2023-02-10 17:00:00+00:00 91\n2023-02-10 18:00:00+00:00 60\n2023-02-10 19:00:00+00:00 67\n2023-02-10 20:00:00+00:00 104\n2023-02-10 21:00:00+00:00 98\n2023-02-10 22:00:00+00:00 76\nFreq: H, dtype: int64\n
\nExample: plotting
\nimport matplotlib.pyplot as plt\n\ndata.resample(\"1H\").size().plot()\nplt.show()\n
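The filtering and timezone-conversion highlights above can be sketched as follows (the NDJSON sample and its fields are made up for illustration):

```python
import io
import pandas as pd

# A tiny inline NDJSON sample (hypothetical fields, for illustration only).
ndjson = """\
{"timestamp": "2023-02-10T08:00:00Z", "path": "/css/main.css", "status": 200}
{"timestamp": "2023-02-10T09:00:00Z", "path": "/missing", "status": 404}
{"timestamp": "2023-02-10T10:00:00Z", "path": "/feed.atom", "status": 304}
"""

data = pd.read_json(io.StringIO(ndjson), lines=True).set_index("timestamp")

# Filtering: keep only error responses.
errors = data[data["status"] >= 400]

# Timezone conversion: the index is parsed as UTC and can be converted.
local = data.tz_convert("Europe/Paris")

# Aggregation: hourly request counts.
hourly = data.resample("1H").size()
```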
\n\npolars is another (Python and Rust) DataFrame library\nwhich may be useful in order to achieve greater performance.\nIts syntax is quite a bit more verbose, however,\nso pandas is probably more convenient for one-shot ad hoc analyses.
\nHighlights:
\npolars.read_ndjson(\"logs.json\")
;data.group_by_dynamic(\"timestamp\", every=\"1h\", period=\"1h\").agg(...)
;data.sample(10)
, data.head(10)
, data.tail(10)
, etc.Example: aggregation
\nimport polars as pl\n\npl.Config.set_tbl_rows(-1)\n\ndata = pl.read_ndjson(\"logs.json\")\ndata = data.with_columns(pl.col(\"timestamp\").str.to_datetime())\nz = data.sort(\"timestamp\").group_by_dynamic(\"timestamp\", every=\"1h\", period=\"1h\", closed=\"left\").agg(pl.count())\nz\n
\nshape: (24, 2)\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 timestamp \u2506 count \u2502\n\u2502 --- \u2506 --- \u2502\n\u2502 datetime[\u03bcs, UTC] \u2506 u32 \u2502\n\u255e\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u256a\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2561\n\u2502 2023-02-09 23:00:00 UTC \u2506 90 \u2502\n\u2502 2023-02-10 00:00:00 UTC \u2506 84 \u2502\n\u2502 2023-02-10 01:00:00 UTC \u2506 70 \u2502\n\u2502 2023-02-10 02:00:00 UTC \u2506 117 \u2502\n\u2502 2023-02-10 03:00:00 UTC \u2506 97 \u2502\n\u2502 2023-02-10 04:00:00 UTC \u2506 108 \u2502\n\u2502 2023-02-10 05:00:00 UTC \u2506 89 \u2502\n\u2502 2023-02-10 06:00:00 UTC \u2506 52 \u2502\n\u2502 2023-02-10 07:00:00 UTC \u2506 93 \u2502\n\u2502 2023-02-10 08:00:00 UTC \u2506 211 \u2502\n\u2502 2023-02-10 09:00:00 UTC \u2506 85 \u2502\n\u2502 2023-02-10 10:00:00 UTC \u2506 137 \u2502\n\u2502 2023-02-10 11:00:00 UTC \u2506 92 \u2502\n\u2502 2023-02-10 12:00:00 UTC \u2506 49 \u2502\n\u2502 2023-02-10 13:00:00 UTC \u2506 373 \u2502\n\u2502 2023-02-10 14:00:00 UTC \u2506 335 \u2502\n\u2502 2023-02-10 15:00:00 UTC \u2506 107 \u2502\n\u2502 2023-02-10 16:00:00 UTC \u2506 105 \u2502\n\u2502 2023-02-10 17:00:00 UTC \u2506 91 \u2502\n\u2502 2023-02-10 18:00:00 UTC \u2506 60 \u2502\n\u2502 2023-02-10 19:00:00 UTC \u2506 67 \u2502\n\u2502 2023-02-10 20:00:00 UTC \u2506 104 \u2502\n\u2502 2023-02-10 21:00:00 UTC \u2506 98 \u2502\n\u2502 2023-02-10 22:00:00 UTC \u2506 76 \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n
\nExample: plotting
\nimport matplotlib.pyplot as plt\nimport matplotlib.dates as mdates\n\nfig = plt.figure()\nax = plt.plot(z[\"timestamp\"], z[\"count\"])[0].axes\nax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))\nfig.autofmt_xdate()\nplt.show()\n
\n\nDuckDB is an in-memory database for data analysis (OLAP)\nwith a SQL interface.\nIt can use a lot of different formats and works quite well with NDJSON logs.\nSee for example Shredding Deeply Nested JSON, One Vector at a Time.
\nPRQL (Pipelined Relational Query Language)\ncan be used with DuckDB through an extension.\nIt might be more convenient than SQL for complex analyses.\nI did not try this.
\nHarlequin is a TUI IDE for DuckDB\n(based on the awesome Textual library).
\nExample: taking random samples
\nCREATE TABLE logs AS SELECT * FROM read_ndjson_auto(\"logs.json\");\nselect timestamp, path from logs USING SAMPLE 25;\n
\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 timestamp \u2502 path \u2502\n\u2502 timestamp \u2502 varchar \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 2023-02-10 20:57:47 \u2502 /feed.atom \u2502\n\u2502 2023-02-10 00:12:45 \u2502 /js/main.js \u2502\n\u2502 2023-02-10 09:40:19 \u2502 /fonts/SourceCodePro-Regular.woff \u2502\n\u2502 2023-02-10 03:40:17 \u2502 /2022/03/15/dns-rebinding-readymedia/ \u2502\n\u2502 2023-02-10 10:31:22 \u2502 /fonts/SourceSerifPro-Semibold.woff \u2502\n\u2502 2023-02-10 04:23:31 \u2502 /img/tls-1.3-summary.svg \u2502\n\u2502 2023-02-10 01:13:35 \u2502 / \u2502\n\u2502 2023-02-10 13:58:45 \u2502 /2022/08/28/trying-to-run-stable-diffusion-on-amd-ryzen-5-5600g/ \u2502\n\u2502 2023-02-10 22:30:30 \u2502 /fonts/SourceSerifPro-Semibold.woff \u2502\n\u2502 2023-02-09 23:30:25 \u2502 /feed.atom \u2502\n\u2502 2023-02-10 18:38:16 \u2502 /fonts/SourceCodePro-Regular.woff \u2502\n\u2502 2023-02-10 13:10:43 \u2502 /fonts/SourceSerifPro-Bold.woff 
\u2502\n\u2502 2023-02-10 07:30:24 \u2502 /feed.atom \u2502\n\u2502 2023-02-10 20:12:16 \u2502 /favicon.ico \u2502\n\u2502 2023-02-10 03:32:25 \u2502 /feed.atom \u2502\n\u2502 2023-02-10 04:23:20 \u2502 /2022/02/07/selenium-standalone-server-csrf-dns-rebinding-rce/ \u2502\n\u2502 2023-02-10 22:27:45 \u2502 /tags/hack/feed.atom \u2502\n\u2502 2023-02-10 14:36:58 \u2502 /fonts/SourceSerifPro-Regular.woff \u2502\n\u2502 2023-02-10 13:40:00 \u2502 /tags/dns-rebinding/ \u2502\n\u2502 2023-02-10 05:49:12 \u2502 /js/main.js \u2502\n\u2502 2023-02-10 10:16:02 \u2502 /js/main.js \u2502\n\u2502 2023-02-10 19:46:34 \u2502 /js/main.js \u2502\n\u2502 2023-02-10 14:17:20 \u2502 /tags/wifi/feed.atom \u2502\n\u2502 2023-02-10 16:23:03 \u2502 /fonts/SourceSerifPro-Semibold.woff \u2502\n\u2502 2023-02-10 13:28:16 \u2502 /2015/11/25/rr-use-after-free/ \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 25 rows 2 columns \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n
\nExample: aggregation
\nselect time_bucket(INTERVAL 1 HOUR, timestamp), count(*) from logs group by all;\n
\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502 time_bucket(to_hours(CAST(1 AS BIGINT)), \"timestamp\") \u2502 count_star() \u2502\n\u2502 timestamp \u2502 int64 \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 2023-02-09 23:00:00 \u2502 90 \u2502\n\u2502 2023-02-10 00:00:00 \u2502 84 \u2502\n\u2502 2023-02-10 01:00:00 \u2502 70 \u2502\n\u2502 2023-02-10 02:00:00 \u2502 117 \u2502\n\u2502 2023-02-10 03:00:00 \u2502 97 \u2502\n\u2502 2023-02-10 04:00:00 \u2502 108 \u2502\n\u2502 2023-02-10 05:00:00 \u2502 89 \u2502\n\u2502 2023-02-10 06:00:00 \u2502 52 \u2502\n\u2502 2023-02-10 07:00:00 \u2502 93 \u2502\n\u2502 2023-02-10 08:00:00 \u2502 211 \u2502\n\u2502 2023-02-10 09:00:00 \u2502 85 \u2502\n\u2502 2023-02-10 10:00:00 \u2502 137 \u2502\n\u2502 2023-02-10 11:00:00 \u2502 92 \u2502\n\u2502 2023-02-10 12:00:00 \u2502 49 \u2502\n\u2502 2023-02-10 13:00:00 \u2502 373 \u2502\n\u2502 2023-02-10 14:00:00 \u2502 335 \u2502\n\u2502 2023-02-10 15:00:00 \u2502 107 \u2502\n\u2502 2023-02-10 16:00:00 \u2502 105 \u2502\n\u2502 2023-02-10 17:00:00 \u2502 91 \u2502\n\u2502 2023-02-10 18:00:00 \u2502 60 \u2502\n\u2502 2023-02-10 19:00:00 \u2502 67 \u2502\n\u2502 2023-02-10 20:00:00 \u2502 104 
\u2502\n\u2502 2023-02-10 21:00:00 \u2502 98 \u2502\n\u2502 2023-02-10 22:00:00 \u2502 76 \u2502\n\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 24 rows 2 columns \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n
\nlnav provides navigation for many log file formats,\nwith features such as automatic log format detection, filtering and SQL queries.
\njl provides pretty printing of several JSON log formats.
\njournalctl
\ncan export journald\nlogs to NDJSON form with journalctl -o json
.\nSee systemd.journal-fields\nfor a description of typical fields in journald logs.
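Each line of journalctl -o json output is a JSON object whose __REALTIME_TIMESTAMP field is a decimal string of microseconds since the Unix epoch. A minimal stdlib sketch for turning such lines into friendlier records (the sample entry is abridged and its values are made up):

```python
import json
from datetime import datetime, timezone

def parse_journald_line(line: str) -> dict:
    """Parse one line of `journalctl -o json` output into a small record.

    __REALTIME_TIMESTAMP is a decimal string counting microseconds
    since the Unix epoch."""
    entry = json.loads(line)
    ts = datetime.fromtimestamp(
        int(entry["__REALTIME_TIMESTAMP"]) / 1e6, tz=timezone.utc
    )
    return {
        "timestamp": ts,
        "priority": int(entry.get("PRIORITY", 6)),
        "unit": entry.get("_SYSTEMD_UNIT"),
        "message": entry.get("MESSAGE"),
    }

# Example journald entry (abridged, hypothetical values):
sample = ('{"__REALTIME_TIMESTAMP": "1694952000000000", "PRIORITY": "6",'
          ' "_SYSTEMD_UNIT": "ssh.service", "MESSAGE": "Server listening"}')
record = parse_journald_line(sample)
```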
We can convert\nCombined (access) Log Format\nto NDJSON with a script such as:
\n#!/usr/bin/python3\n\nimport click\nfrom typing import List, Optional, Any, Callable\nimport re\nimport sys\nfrom io import TextIOBase\nfrom re import Pattern\nfrom json import dumps\nfrom datetime import datetime, date\nfrom ipaddress import ip_address, IPv4Address, IPv6Address\nfrom dateutil.tz import gettz\nfrom datetime import timezone\n\n\ndef json_mapper(obj):\n if isinstance(obj, (datetime, date)):\n res = obj.isoformat()\n if obj.tzinfo is timezone.utc:\n res = res.replace(\"+00:00\", \"Z\")\n return res\n if isinstance(obj, (IPv4Address, IPv6Address)):\n return str(obj)\n raise TypeError(\"No serialization for %s\" % type(obj))\n\n\ndef parse_int(value: str) -> Optional[int]:\n if value == \"-\":\n return None\n return int(value)\n\n\ntz = timezone.utc\n\n\ndef parse_date(value: str) -> datetime:\n return datetime.strptime(value, \"%d/%b/%Y:%H:%M:%S %z\").astimezone(tz)\n\n\nclass Field:\n name: str\n regex: str\n converter: Callable[[str], Any] | None\n\n def __init__(\n self, name: str, regex: str, converter: Callable[[str], Any] | None = None\n ) -> None:\n self.name = name\n self.regex = regex\n self.converter = converter\n\n\nclass Format:\n _tokens: List[Field | str]\n _fields: List[Field]\n _pattern = Pattern\n\n def __init__(self, tokens: List[Field | str]) -> None:\n self._tokens = tokens\n\n regex_string = (\n \"^\"\n + \"\".join(\n (\"(\" + token.regex + \")\") if isinstance(token, Field) else token\n for token in tokens\n )\n + \"$\"\n )\n self._pattern = re.compile(regex_string)\n\n self._fields = [token for token in tokens if isinstance(token, Field)]\n\n def parse(self, data: str) -> Optional[dict]:\n match = self._pattern.match(data)\n if match is None:\n return None\n groups = match.groups()\n return {\n field.name: field.converter(groups[i])\n if field.converter is not None\n else groups[i]\n for i, field in enumerate(self._fields)\n }\n\n\nCOMBINED_FORMAT = Format(\n [\n Field(\"remote_addr\", \"[^ ]+\", converter=ip_address),\n \" [^ 
]+ \",\n Field(\"remote_user\", \"[^ ]+\"),\n \" \\[\",\n Field(\"timestamp\", \"[^]]+\", converter=parse_date),\n '\\] \"',\n Field(\"method\", \"[^ ]+\"),\n \" \",\n Field(\"path\", '[^ \"]+'),\n \" \",\n Field(\"protocol\", '[^ \"]+'),\n '\" ',\n Field(\"status\", \"[-0-9]+\", converter=parse_int),\n \" \",\n Field(\"body_bytes_sent\", \"[-0-9]+\", converter=parse_int),\n ' \"',\n Field(\"http_referer\", '[^\"]+'),\n '\" \"',\n Field(\"http_user_agent\", '[^\"]+'),\n '\"',\n ]\n)\n\n\n@click.command()\n@click.argument(\"input\", type=click.File(\"rt\"))\n@click.option(\"--debug\", type=bool)\ndef main(input, debug):\n parse = COMBINED_FORMAT.parse\n for line in input:\n line = line.rstrip(\"\\n\")\n data = parse(line)\n if data is None:\n if debug:\n sys.stderr.write(line + \"\\n\")\n continue\n print(dumps(data, default=json_mapper))\n\n\nif __name__ == \"__main__\":\n main()\n
\nWhich generates something like:
\n{\"remote_addr\": \"a.b.c.d\", \"remote_user\": \"-\", \"timestamp\": \"2023-02-09T23:00:29Z\", \"method\": \"GET\", \"path\": \"/feed.atom\", \"protocol\": \"HTTP/2.0\", \"status\": 304, \"body_bytes_sent\": 0, \"http_referer\": \"-\", \"http_user_agent\": \"Tiny Tiny RSS/21.06 (Unsupported) (http://tt-rss.org/)\"}\n{\"remote_addr\": \"a.b.c.d\", \"remote_user\": \"-\", \"timestamp\": \"2023-02-09T23:02:31Z\", \"method\": \"GET\", \"path\": \"/img/wpa2-psk.svg\", \"protocol\": \"HTTP/2.0\", \"status\": 200, \"body_bytes_sent\": 14068, \"http_referer\": \"https://www.google.es/\", \"http_user_agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15\"}\n
\njq
A short summary of the logging message workflow with\nsystemd-journald\n(and the different formats and sockets involved).
\n /run/systemd/journal/dev-log\n a.k.a. /dev/log (syslog format)\n .--------------->------------------.\n | |\n | |\n | /run/systemd/journal/socket v\n.-----------.(native format) .----------. .--------------------.\n| processes |--------->-----------------| journald |->| /var/log/journal/* |\n'-----------' '----------' '--------------------'\n | /run/systemd/journal/stdout ^ ^ |^ |\n | (stream format) | | || |/run/systemd/journal/syslog\n '--------------->-----------------' | || |(syslog format)\n | || |\n .--------. /dev/kmsg (kmsg format) | || | .---------. .------------.\n | kernel |-------------------------------' || '->| rsyslog |->| /var/log/* |\n '--------' || '---------' '------------'\n (Journal Export || | ^\n Format) v| v |(syslog format)\n .-----------------. .----------------------.\n | Remote journald | | Remote syslog daemon |\n '-----------------' '----------------------'\n
\nUPDATE 2020-07-30: since systemd 216,\nthe /run/systemd/journal/syslog
connection does not exist anymore.\nThe syslog daemon is expected to pull the data from journald instead.
/dev/kmsg
is a device used to receive logging messages from the\nLinux\nkernel\nusing a specific (kmsg) format.
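Records read from /dev/kmsg look like "pri,seq,timestamp_us,flags;message", where the first field combines the syslog facility and severity (facility * 8 + level). A minimal parsing sketch:

```python
def parse_kmsg(record: str) -> dict:
    """Parse one /dev/kmsg record of the form "pri,seq,timestamp_us,flags;message"."""
    header, _, message = record.partition(";")
    fields = header.split(",")
    pri = int(fields[0])
    return {
        "facility": pri >> 3,   # pri encodes facility * 8 + level
        "level": pri & 7,
        "seq": int(fields[1]),
        "timestamp_us": int(fields[2]),
        "message": message.rstrip("\n"),
    }

# Example record (typical kernel log line):
info = parse_kmsg("6,339,5140900,-;NET: Registered protocol family 10\n")
```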
/run/systemd/journal/dev-log
is a SOCK_DGRAM
socket which can be\nused by processes to send syslog-compatible messages to journald. It\nis symlinked in /dev/log
which means that all processes trying to\nuse the system syslog will in fact send their messages to journald.
It is used:
\nopenlog()
function;logger
command./run/systemd/journal/stdout
is a SOCK_STREAM
socket which can be used by\nprocesses to send logging messages in the simple \u201cstream\u201d format (one\nline per message with an optional severity prefix).
Usage:
\nStandardOutput
and\nStandardError
\noptions);sd_journal_stream_fd()
\nfunction,systemd-cat
\ncommand.Just after opening the file, sd_journal_stream_fd()
sends the\nfollowing information:
/run/systemd/journal/syslog
(0 or 1).\nThis happens as well if ForwardToSyslog=true
in the systemd\nconfiguration (enabled by default)./dev/kmsg
(0 or 1).\nThis happens as well if ForwardToKMsg=true
in the systemd\nconfiguration (disabled by default); the maximum forwarded level is set with the MaxLevelKMsg
option.ForwardToConsole=true
in the systemd\nconfiguration (disabled by default).Each line after this prolog is a logging message optionally prefixed\nwith a severity.
\nExample:
\nfoo\nfoo.service\n5\n1\n0\n0\n0\n<7>Debug 1\n<7>Debug 2\n
\nWe can create log entries with:
\necho \"foo\nfoo.service\n5\n1\n0\n0\n0\n<7>Debug 1\n<7>Debug 2\" | socat STDIN UNIX-CONNECT:/run/systemd/journal/stdout\n
\n/run/systemd/journal/socket
is a SOCK_DGRAM
socket used for\nlogging data to journald using the native format. AFAIK, this is the\nsame format as Journal Export\nFormat but\nwith one logging message per datagram:
PRIORITY=7\nMESSAGE=First (debug) message\nSYSLOG_IDENTIFIER=foo\n
\nWe can create a message with:
\necho \"PRIORITY=7\nMESSAGE=Debug 1\nSYSLOG_IDENTIFIER=foo\nFOO=bar\n\" | socat STDIN UNIX-SENDTO:/run/systemd/journal/socket\n
\nIt is used by the C\nAPI:\nsd_journal_print
, sd_journal_send
, etc.
If the /var/log/journal
directory exists and has the proper\npermissions, journald will use it to store the logging information in\na binary\nformat.
/run/systemd/journal/syslog
is a SOCK_DGRAM
socket used to send\nthe messages to a syslog daemon (e.g. rsyslog
).