llama.cpp quickstart

How to quickly use llama.cpp for LLM inference (no GPU needed).

vLLM quickstart

How to quickly use vLLM for LLM inference on CPU.

Neural Network Distillation

Overview of neural network distillation as done in “Distilling the Knowledge in a Neural Network” (Hinton et al., 2014).

Transformer-decoder language models

Some notes on how transformer-decoder language models work, taking GPT-2 as an example, with plenty of references for digging deeper.

Stable Diffusion on an AMD Ryzen 5 5600G

Executing the Stable Diffusion text-to-image model on an AMD Ryzen 5 5600G integrated GPU (iGPU).
