My document generation workflow with Markdown, YAML, Jinja2 and WeasyPrint

Here is the workflow I am using to generate simple text documents (resume, cover letters, etc.) from Markdown, YAML and Jinja2 templates.

Summary:

  1. input document is in Markdown with YAML frontmatter
  2. HTML conversion using a Jinja2 template
  3. PDF conversion from HTML with WeasyPrint

Good old make coordinates the different steps.

The nice things about this approach are:

Input file

The input document is a Markdown file with a YAML frontmatter and looks like this:

---
name: John Doe
title: Super hero
address:
  - 221B Baker Street
  - London
  - UK
lang: en
phone: +XX-X-XX-XX-XX-XX
email: john.doe@example.com
website: http://www.example.com/john.doe/
---
## Introduction

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


## Discussion

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.



## Conclusion

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

HTML conversion

I am using a Jinja2 template to convert the input Markdown document into HTML:

<html xmlns="http://www.w3.org/1999/xhtml" lang="{{ lang | escape}}">
<head>
  <meta charset="utf-8"/>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <title>{{ title | escape }}</title>
  <link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>

<header>
  <address>
    <strong>{{ name | escape }}</strong><br/>
    {% for line in address %}
    {{ line | escape }}<br/>
    {% endfor %}
  </address>
  <span class="details">
    <p><span title="{{ 'Téléphone' if lang == 'fr' else 'Phone' }}">☎</span>
      <a href="tel:{{ phone | escape}}">{{ phone | escape }}</a></p>
    <p><a href="mailto:{{ email | escape }}">{{ email | escape }}</a></p>
    <p><a href="{{ website | escape }}">{{ website | escape }}</a></p>
  </span>
</header>

<h1>{{ title | escape }}</h1>
{{ body }}

</body>
</html>
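
By default, Jinja2 renders an undefined variable as an empty string, so a key missing from the frontmatter silently produces an empty field in the HTML. Here is a minimal sketch of a check that could run before rendering, assuming the key list matches the template above. The names `REQUIRED_KEYS` and `missing_keys` are hypothetical and not part of the render script:

```python
# Keys that the template above reads from the frontmatter (assumed list).
REQUIRED_KEYS = ("name", "title", "address", "lang", "phone", "email", "website")


def missing_keys(head, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the frontmatter."""
    return [key for key in required if not head.get(key)]


if __name__ == "__main__":
    head = {"name": "John Doe", "title": "Super hero", "lang": "en"}
    print(missing_keys(head))  # ['address', 'phone', 'email', 'website']
```

Another option is to pass `undefined=StrictUndefined` to the Jinja2 `Environment`, in which case rendering raises an error on the first undefined variable.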

The conversion is done with a Python script (./render):

#!/usr/bin/env python3

from sys import argv
import re

import yaml
import markdown
from markdown.extensions.extra import ExtraExtension
from jinja2 import Environment, FileSystemLoader
from atomicwrites import atomic_write

EXT = [
    ExtraExtension()
]

env = Environment(
    loader=FileSystemLoader('.'),
    autoescape=False,
)
template = env.get_template('template.j2')

filename = argv[1]
out_filename = argv[2]

RE = re.compile(r'^---\s*$', re.M)


def split_document(data):
    """
    Split a document into a YAML frontmatter and a body
    """
    lines = data.splitlines()
    if not lines or not RE.match(lines[0]):
        raise Exception("Missing YAML start")
    for i in range(1, len(lines)):
        if RE.match(lines[i]):
            # Parse only the lines between the two '---' markers
            head_raw = "\n".join(lines[1:i])
            # safe_load only builds plain Python objects
            head = yaml.safe_load(head_raw) or {}
            # The body starts right after the closing '---'
            body = "\n".join(lines[i+1:])
            return (head, body)
    raise Exception("Missing YAML end")


with open(filename, "r", encoding="utf-8") as f:
    content = f.read()
(head, body) = split_document(content)
body_html = markdown.markdown(body, extensions=EXT)
# Write to a temporary file first, then move it over the target
with atomic_write(out_filename, overwrite=True, encoding="utf-8") as f:
    f.write(template.render(**head, body=body_html))

Called as:

./render doc.md doc.html
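
The script reads its two arguments directly from `argv`, so calling it with a missing argument fails with an `IndexError`. Here is a minimal sketch of the same interface using `argparse` from the standard library, which prints a usage message instead. The `parse_args` helper is a hypothetical name, and its argument names simply mirror the variables of the script:

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments used by ./render."""
    parser = argparse.ArgumentParser(
        description="Render a Markdown document with YAML frontmatter to HTML."
    )
    parser.add_argument("filename", help="input Markdown file, e.g. doc.md")
    parser.add_argument("out_filename", help="output HTML file, e.g. doc.html")
    # With argv=None, argparse reads the arguments from sys.argv.
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Explicit list for the demonstration; a real script would call parse_args().
    args = parse_args(["doc.md", "doc.html"])
    print(args.filename, "->", args.out_filename)
```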

PDF conversion

I am using WeasyPrint to generate the PDF from the HTML:

weasyprint doc.html doc.pdf
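
WeasyPrint can also be called from Python, which could be convenient if the PDF step were ever merged into the render script. Here is a minimal sketch using its `HTML` class and `write_pdf` method. The helpers `default_pdf_path` and `html_to_pdf` are hypothetical names introduced for illustration:

```python
from pathlib import Path


def default_pdf_path(html_path):
    """Return the HTML path with its suffix replaced by .pdf."""
    return Path(html_path).with_suffix(".pdf")


def html_to_pdf(html_path, pdf_path=None):
    """Convert an HTML file to PDF with WeasyPrint's Python API."""
    # Imported here so that this module loads even without WeasyPrint installed.
    from weasyprint import HTML

    pdf_path = Path(pdf_path) if pdf_path else default_pdf_path(html_path)
    # Relative URLs such as style.css are resolved against the HTML file location.
    HTML(filename=str(html_path)).write_pdf(str(pdf_path))
    return pdf_path
```

Under these assumptions, `html_to_pdf("doc.html")` would write `doc.pdf` next to the HTML file, like the command line above.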

WeasyPrint has some support for CSS paged media (page size, margins, page breaks):

@page {
  size: A4;
  margin: 1cm;
  margin-top: 2cm;
  margin-bottom: 2cm;
}

@media print {
  body {
    margin-top: 0;
    margin-bottom: 0;
  }
}

h1, h2, h3 {
  page-break-after: avoid;
  page-break-inside: avoid;
}

li {
  page-break-inside: avoid;
}

It also supports links, PDF bookmarks, attachments, fonts, etc.

Make

Currently I am using a Makefile to compose the different steps:

.PHONY: all clear

all: doc.pdf
clear:
	rm -f doc.html doc.pdf

doc.html: doc.md template.j2 render
	./render doc.md doc.html

doc.pdf: doc.html style.css
	weasyprint doc.html doc.pdf
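
For readers less familiar with make, the rule `doc.html: doc.md template.j2 render` means that doc.html is rebuilt when it is missing or older than one of its prerequisites. Here is a rough sketch of that decision in Python, based on file modification times. The `needs_rebuild` function is a hypothetical name, and this is a simplification of what make actually does:

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A prerequisite modified after the target makes the target stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

With the files of this post, `needs_rebuild("doc.html", ["doc.md", "template.j2", "render"])` would be the check behind the first rule. This is why editing only the template triggers a new HTML conversion and, through the second rule, a new PDF.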