{"version": "https://jsonfeed.org/version/1", "title": "/dev/posts/ - Tag index - reinforcement-learning", "home_page_url": "https://www.gabriel.urdhr.fr", "feed_url": "/tags/reinforcement-learning/feed.json", "items": [{"id": "http://www.gabriel.urdhr.fr/2025/09/22/reinforcement-learning-formulas/", "title": "Reinforcement Learning formulas cheat sheet", "url": "https://www.gabriel.urdhr.fr/2025/09/22/reinforcement-learning-formulas/", "date_published": "2025-09-22T00:00:00+02:00", "date_modified": "2025-09-22T00:00:00+02:00", "tags": ["computer", "machine-learning", "reinforcement-learning", "neural-networks"], "content_html": "<p>Cheat sheet for (some) reinforcement learning mathematical formulas and algorithms.</p>\n"}, {"id": "http://www.gabriel.urdhr.fr/2025/01/07/transformer-decoder-language-models/", "title": "Transformer-decoder language models", "url": "https://www.gabriel.urdhr.fr/2025/01/07/transformer-decoder-language-models/", "date_published": "2025-01-07T00:00:00+01:00", "date_modified": "2025-09-23T22:22:00+02:00", "tags": ["computer", "machine-learning", "deep-learning", "language-model", "neural-networks", "reinforcement-learning", "LLM"], "content_html": "<p>Some notes on how <a href=\"https://arxiv.org/abs/1801.10198\">transformer-decoder</a> language models work,\ntaking GPT-2 as an example,\nand with lots references in order to dig deeper.\nThis is intended both as a a roadmap for understanding on how LLMs work\n(especially the ones using a transformer-decoder architecture)\nand a a summary/recap on the topic.</p>\n"}]}