My document generation workflow with Markdown, YAML, Jinja2 and WeasyPrint

I'm not a big fan of WYSIWYG text editors. They never quite do what I want them to, and they often do what I don't want them to. Here's the workflow I use to generate simple text documents (resumes, cover letters, etc.) from Markdown, YAML and Jinja2 templates.

Summary:

  1. input document is in Markdown with YAML frontmatter
  2. HTML conversion using a Jinja2 template
  3. PDF conversion from HTML with WeasyPrint

Good old make coordinates the different steps.

The nice things about this approach are that:

  • you write your content in Markdown;
  • you can add structured data in YAML;
  • you can use your CSS skills to style the document;
  • it's VCS friendly (usable diff, merge, etc.);
  • you can easily share the Jinja template and CSS style between documents.

Input file

The input document is a Markdown file with YAML frontmatter and looks like this:

---
name: John Doe
title: Super hero
address:
  - 221B Baker Street
  - London
  - UK
lang: en
phone: +XX-X-XX-XX-XX-XX
email: john.doe@example.com
website: http://www.example.com/john.doe/
---
## Introduction

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


## Discussion

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.



## Conclusion

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
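
To make the split between structured data and prose concrete, here is a minimal sketch of what the frontmatter turns into once parsed. It assumes PyYAML is installed. `SAMPLE` is an abridged document modelled on the one above, and `parse_frontmatter` is a hypothetical helper written for this illustration, with a deliberately naive split on the delimiter lines.

```python
import yaml  # PyYAML (third-party), assumed to be installed

# Abridged document modelled on the sample above.
SAMPLE = """---
name: John Doe
title: Super hero
address:
  - 221B Baker Street
  - London
  - UK
lang: en
---
## Introduction

Lorem ipsum dolor sit amet.
"""


def parse_frontmatter(text):
    """Return (metadata dict, Markdown body) for a '---'-delimited document."""
    # Hypothetical helper: naive split on the first two '---' delimiter lines.
    _, raw_head, body = text.split("---\n", 2)
    return yaml.safe_load(raw_head), body


meta, body = parse_frontmatter(SAMPLE)
print(meta["address"])       # the YAML sequence becomes a Python list
print(body.splitlines()[0])  # first line of the Markdown body
```

The scalar keys (`name`, `title`, `lang`) come back as plain strings and `address` as a list, which is what lets a template loop over the address lines.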

HTML conversion

I'm using a Jinja2 template to convert the input Markdown document into HTML:

<html xmlns="http://www.w3.org/1999/xhtml" lang="{{ lang | escape}}">
<head>
  <meta charset="utf-8"/>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <title>{{ title | escape }}</title>
  <link rel="stylesheet" type="text/css" href="style.css"/>
</head>
<body>

<header>
  <address>
    <strong>{{ name | escape }}</strong><br/>
    {% for line in address %}
    {{ line | escape }}<br/>
    {% endfor %}
  </address>
  <span class="details">
    <p><span title="{{ 'Téléphone' if lang == 'fr' else 'Phone' }}">☎</span>
      <a href="tel:{{ phone | escape}}">{{ phone | escape }}</a></p>
    <p><a href="mailto:{{ email | escape }}">{{ email | escape }}</a></p>
    <p><a href="{{ website | escape }}">{{ website | escape }}</a></p>
  </span>
</header>

<h1>{{ title | escape }}</h1>
{{ body }}

</body>
</html>
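
The template escapes every value by hand with `| escape`. A possible alternative, sketched below under the assumption that Jinja2 is installed, is to turn on autoescaping and mark only the already-rendered Markdown body as safe. The inline template string and the sample values are made up for illustration; this is not the template used in this workflow.

```python
from jinja2 import Environment

# With autoescape=True every variable is HTML-escaped unless marked safe.
env = Environment(autoescape=True)

# Illustrative inline template, much smaller than the real template.j2.
template = env.from_string(
    "<title>{{ title }}</title>\n"
    "<h1>{{ title }}</h1>\n"
    "{{ body | safe }}\n"
)

html = template.render(
    title="R&D <lead>",                        # escaped automatically
    body="<p>Rendered <em>Markdown</em></p>",  # inserted verbatim
)
print(html)
```

The trade-off is that forgetting `| safe` on the body shows escaped tags in the output, whereas forgetting `| escape` in the manual approach lets raw markup through silently.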

The conversion is done with a Python script (./render):

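#!/usr/bin/env python3
"""Render a Markdown file with YAML frontmatter to HTML through template.j2."""

from sys import argv
import re

import yaml
import markdown
from markdown.extensions.extra import ExtraExtension
from jinja2 import Environment, FileSystemLoader
from atomicwrites import atomic_write

# Markdown extensions: "extra" adds tables, footnotes, definition lists, etc.
EXT = [
    ExtraExtension()
]

# Autoescaping is off: the template escapes each value explicitly,
# and the rendered Markdown body is inserted as raw HTML.
env = Environment(
    loader=FileSystemLoader('.'),
    autoescape=False,
)
template = env.get_template('template.j2')

filename = argv[1]
out_filename = argv[2]

# A frontmatter delimiter: a line made of "---" and optional trailing spaces
RE = re.compile(r'^---\s*$', re.M)


def split_document(data):
    """
    Split a document into a YAML frontmatter (parsed) and a Markdown body
    """
    lines = data.splitlines()
    if not lines or not RE.match(lines[0]):
        raise ValueError("Missing YAML start")
    for i in range(1, len(lines)):
        if RE.match(lines[i]):
            # Parse only the lines between the two delimiters; safe_load
            # does not construct arbitrary Python objects.
            head = yaml.safe_load("\n".join(lines[1:i])) or {}
            # The body starts right after the closing delimiter
            body = "\n".join(lines[i+1:])
            return (head, body)
    raise ValueError("Missing YAML end")


with open(filename, "r") as f:
    content = f.read()
(head, body) = split_document(content)
body_html = markdown.markdown(body, extensions=EXT)
# Write atomically so a failed run never leaves a truncated output file
with atomic_write(out_filename, overwrite=True) as f:
    f.write(template.render(**head, body=body_html))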

Called as:

./render doc.md doc.html
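
The script writes its output through `atomic_write` so that a failed render does not leave a truncated doc.html behind, which matters because make would consider such a file up to date. For readers who would rather avoid the extra dependency, here is a rough stdlib-only sketch of the same idea (write to a temporary file, then rename it over the target). `write_atomically` is a hypothetical name, and this simplified version may skip refinements that a dedicated library provides.

```python
import os
import tempfile


def write_atomically(path, text, encoding="utf-8"):
    """Write text to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # rename over any existing file
    except BaseException:
        # Clean up the temporary file and leave the old target untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In the render script this would stand in for the final `with atomic_write(...)` block, for instance `write_atomically(out_filename, rendered_html)`. Note that `mkstemp` creates the file with owner-only permissions, so the result may need a `chmod` if others must read it.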

PDF conversion

I'm using WeasyPrint to generate PDF from HTML:

weasyprint doc.html doc.pdf

WeasyPrint has some support for paged-media CSS:

@page {
  size: A4;
  margin: 1cm;
  margin-top: 2cm;
  margin-bottom: 2cm;
}

@media print {
  body {
    margin-top: 0;
    margin-bottom: 0;
  }
}

h1, h2, h3 {
  page-break-after: avoid;
  page-break-inside: avoid;
}

li {
  page-break-inside: avoid;
}

It also supports links, PDF bookmarks, attachments, fonts, etc.

Make

Currently I'm using a Makefile to compose the different steps:

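# Recipe lines must start with a tab character, not spaces.
.PHONY: all clear

# Build both the intermediate HTML and the final PDF
all: doc.html doc.pdf
clear:
	rm -f doc.html doc.pdf

doc.html: doc.md template.j2 render
	./render doc.md doc.html

doc.pdf: doc.html
	weasyprint doc.html doc.pdf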
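
What make contributes here is the dependency check: a target is rebuilt when it is missing or older than any of its prerequisites. The sketch below mimics that rule in Python purely to illustrate it. `is_stale` is a hypothetical helper, and real make does more, such as bringing the prerequisites themselves up to date first.

```python
import os


def is_stale(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A prerequisite modified after the target means the target is outdated.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For the first rule of the Makefile, the equivalent question would be `is_stale("doc.html", ["doc.md", "template.j2", "render"])`: editing the Markdown source, the template or the script each triggers a new HTML render, and a new doc.html in turn makes doc.pdf stale.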