Escaping WordPress: Migrating to Zola

I’ve had WordPress sites on and off for about a decade, with quite a few “Here we go again” posts as I tried to kick life back into them. It did the job, but I was paying a hosting provider to run PHP and MySQL so I could serve what amounted to a few hundred kilobytes of text. That felt increasingly stupid. So I moved to Zola, a static site generator written in Rust that builds the whole site in milliseconds, serves it as flat files, and costs essentially nothing to host on Cloudflare Pages.

This post covers the actual process: getting content out of WordPress, converting it to something Zola can use, and building a new Zola site from scratch with the tabi theme. If you’re considering the same move, this should save you some time.

Getting the content out of WordPress

I started with a plain WP export, but couldn’t make much sense of the content. I knew I wanted .md files and found that the WordPress to Jekyll Exporter plugin would get me there. Install it, run the export, and you get a zip of .md files with YAML frontmatter. The bodies aren’t perfect (you’ll see residual HTML, &nbsp; entities, and WordPress block editor comments), but the text is there, and that’s what matters.

The catch is that these files have Jekyll-style YAML frontmatter, not the TOML frontmatter Zola expects. A typical exported file looks like this (one of my first - perhaps cringeworthy - posts):

---
id: 9
title: 'Thanks – Mr Hollins'
date: '2014-04-25T22:20:56+01:00'
author: tom
excerpt: ''
layout: post
guid: 'http://example.wordpress.com/2014/04/25/thanks-mr-hollins/'
permalink: /2014/04/25/thanks-mr-hollins/
categories:
- Uncategorized
tags:
- gardening
- thanks
format: false
---

What Zola wants is:

+++
title = "Thanks – Mr Hollins"
date = 2014-04-25
draft = false

[taxonomies]
tags = ["gardening", "thanks"]
+++

I started thinking about sed but took the easy path and asked Claude to knock me up a Python script. Yes, really.

The conversion script

This Python script reads each .md file, parses the YAML frontmatter, and rewrites it as TOML. It also decodes HTML entities in titles (so &#8211; becomes an actual en-dash) and normalises dates to YYYY-MM-DD.

#!/usr/bin/env python3
"""Convert Jekyll-style YAML frontmatter to Zola TOML frontmatter."""
import frontmatter
import html
import pathlib
import sys

POSTS_DIR = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else "posts")

def convert(path: pathlib.Path) -> None:
    post = frontmatter.load(path)
    meta = post.metadata

    title = html.unescape(str(meta.get("title", "")))
    date = meta.get("date")
    date_str = date.strftime("%Y-%m-%d") if hasattr(date, "strftime") else str(date)[:10]

    tags = meta.get("tags") or []
    if isinstance(tags, str):
        tags = [tags]
    tags_toml = ", ".join(f'"{t}"' for t in tags)

    lines = [
        "+++",
        f'title = "{title}"',
        f"date = {date_str}",
        "draft = false",
    ]
    if tags:
        lines += ["[taxonomies]", f"tags = [{tags_toml}]"]
    lines += ["+++", ""]

    new_content = "\n".join(lines) + post.content
    path.write_text(new_content, encoding="utf-8")
    print(f"converted {path.name}")

for md in POSTS_DIR.glob("*.md"):
    try:
        convert(md)
    except Exception as e:
        print(f"FAILED {md.name}: {e}", file=sys.stderr)

I found pretty quickly that pip install python-frontmatter is the answer; the package named plain frontmatter on PyPI is a completely unrelated library. Yay.

Run it against a copy of your posts directory:

python3 convert.py /path/to/posts

Draft status is hardcoded to false because the Jekyll exporter only exports published posts — drafts don’t make it into the export. Categories are deliberately dropped; in my case they were all “Uncategorized” and added no value. If yours are meaningful, extending the script to include them under [taxonomies] is straightforward.
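If your categories are worth keeping, one way to extend the script is to build the [taxonomies] block from both lists. This is a minimal sketch rather than part of the script above: the helper name taxonomies_block is hypothetical, it assumes meta is the frontmatter dict the script already loads, and it drops WordPress’s default “Uncategorized” bucket on the assumption that it carries no information.

```python
def taxonomies_block(meta: dict) -> list[str]:
    """Return TOML lines for a [taxonomies] table, or [] if there is nothing to emit.

    Hypothetical helper: `meta` is assumed to be the parsed YAML frontmatter.
    """
    entries = []
    for key in ("tags", "categories"):
        values = meta.get(key) or []
        if isinstance(values, str):  # a single value may arrive as a bare string
            values = [values]
        # Assumption: "Uncategorized" is WordPress's default and adds no value.
        values = [str(v) for v in values if str(v) != "Uncategorized"]
        if values:
            quoted = ", ".join(f'"{v}"' for v in values)
            entries.append(f"{key} = [{quoted}]")
    return ["[taxonomies]", *entries] if entries else []
```

In convert, a line such as lines += taxonomies_block(meta) could stand in for the existing if tags: branch. Any taxonomy emitted this way also has to be declared in config.toml.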

If any of your titles contain literal double quotes, they’ll break the TOML. A grep '"' *.md | grep title: will tell you if you need to worry about this. If you do, escape them or switch to TOML’s triple-quoted strings.
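The escaping can also be done at conversion time. The sketch below assumes a hypothetical helper, toml_basic_string, used in place of the hand-written quotes around the title in the script. It only handles backslashes and double quotes, which should cover ordinary post titles; control characters such as newlines are not dealt with.

```python
def toml_basic_string(value: str) -> str:
    """Quote `value` as a TOML basic string (hypothetical helper).

    Backslashes are escaped first so that the backslashes added in front
    of double quotes are not escaped a second time.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Example: build the title line for a title containing double quotes.
print("title = " + toml_basic_string('The "easy" way'))
```

With this in place, the title line in the script would become f"title = {toml_basic_string(title)}".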

Cleaning up filenames

The Jekyll exporter names files YYYY-MM-DD-slug.md because Jekyll requires it. Zola doesn’t — the date lives in the frontmatter, and the filename is just the URL slug. Strip the date prefix:

cd posts
for f in 20*-*.md; do mv "$f" "${f:11}"; done

That chops the first 11 characters (YYYY-MM-DD-) from each filename.
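The fixed-width chop trusts that every matching file really does start with a date. A more cautious variant, sketched here in Python to match the conversion script, only renames files whose names fit the YYYY-MM-DD- pattern and refuses to overwrite an existing file. The function names and the dry_run flag are hypothetical.

```python
import pathlib
import re

# A leading date of the form YYYY-MM-DD followed by a hyphen.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def strip_date_prefix(name: str) -> str:
    """Return `name` without a leading YYYY-MM-DD- prefix, if it has one."""
    return DATE_PREFIX.sub("", name)


def rename_posts(posts_dir: pathlib.Path, dry_run: bool = True) -> None:
    """Rename dated .md files in `posts_dir`; with dry_run=True, only print the plan."""
    for path in sorted(posts_dir.glob("*.md")):
        target = path.with_name(strip_date_prefix(path.name))
        if target == path:
            continue  # no date prefix, nothing to do
        if target.exists():
            print(f"SKIPPED {path.name}: {target.name} already exists")
            continue
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)
```

Calling rename_posts(pathlib.Path("posts")) prints what would change; passing dry_run=False performs the renames.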

Setting up the directory structure

Zola supports two layouts for posts: flat files (content/blog/my-post.md) and colocated directories (content/blog/my-post/index.md). I figured I might want images alongside the posts, so I standardised on the dir-per-post approach, and this needed another courtesy-of-Claude bit of bash. The really sad thing is that I know this is easy to write, and at work I would have, but it was the evening, so…

cd content/blog
for f in *.md; do
  [ "$f" = "_index.md" ] && continue
  dir="${f%.md}"
  mkdir "$dir"
  mv "$f" "$dir/index.md"
done

Installing Zola and setting up a site

Zola can be installed by snap, but that’s not my cup of tea, so I grabbed the binary from the Zola GitHub releases page.

Create a new site:

zola init my-site
cd my-site
git init

Adding the tabi theme

I went with tabi - looks nice, seems to be maintained, and the example sites looked okay. Text’s a little large for me, but that’s fixable.

Added it as a git submodule:

git submodule add https://github.com/welpo/tabi.git themes/tabi

Using a submodule means theme updates are a git submodule update --remote away, and the theme doesn’t pollute your repo’s history. Downside is that I think I have to have the “powered by zola/tabi” footer thing, but that’s okay. I can’t really begrudge them that for something free and this good.

Set theme = "tabi" in your config.toml. Tabi’s example config.toml is a good start. Lots of comments in there, and you can cut what you don’t need.

Copy tabi’s example _index.md files for the homepage and your blog section into your own content/ directory:

content/
  _index.md              ← homepage
  blog/
    _index.md            ← section listing page
    thanks-mr-hollins/
      index.md           ← your migrated post
    first-analysis/
      index.md

Then:

zola serve

Open http://127.0.0.1:1111/ and you should see your site with your migrated posts listed.

After the migration

Some things that will need attention:

Body content cleanup. The old posts are a bit of a mess internally and need some TLC: <!-- wp:paragraph --> comments, stray <div> tags, &nbsp; entities, smart-quote encoding issues. Some of the code snippets are not quite right - it’s just formatting though; a bit of sed and a keen pair of eyes will do the job.
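Much of that cleanup is mechanical. As a rough sketch in Python rather than sed, a hypothetical clean_body function could strip the block editor comments and normalise non-breaking spaces. It assumes every such comment starts with <!-- wp: or <!-- /wp:, and it deliberately leaves other HTML alone, since some of it may be intentional.

```python
import re

# Block editor markers look like <!-- wp:paragraph --> and <!-- /wp:paragraph -->,
# sometimes with JSON attributes between the block name and the closing -->.
WP_COMMENT = re.compile(r"<!--\s*/?wp:.*?-->\n?", re.DOTALL)


def clean_body(text: str) -> str:
    """Remove WordPress block comments and normalise non-breaking spaces."""
    text = WP_COMMENT.sub("", text)
    # Replace both the entity and the literal U+00A0 character with a plain space.
    text = text.replace("&nbsp;", " ").replace("\u00a0", " ")
    # Collapse runs of blank lines left behind by the removed comments.
    return re.sub(r"\n{3,}", "\n\n", text)
```

Running it on a copy of the posts and diffing the result is a sensible first step; the stray <div> tags and smart-quote problems still need a human eye.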

Internal links. Most of my posts are standalone, but the few that linked to each other used WordPress-style URLs (/2014/04/25/thanks-mr-hollins/), and those links are now broken. It’s an easy fix: grep -r '/20[0-9][0-9]/' content/ will find them. Rewrite them to Zola’s internal link format: @/blog/thanks-mr-hollins/index.md. Then zola check will catch any you missed.
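The rewrite itself is a regular-expression job. This is a minimal sketch, assuming the links use the /YYYY/MM/DD/slug/ permalink shape and that every post now lives at content/blog/<slug>/index.md; the function name is hypothetical. It only touches relative Markdown link targets, so absolute URLs and raw HTML anchors still need fixing by hand.

```python
import re

# Matches a Markdown link target such as ](/2014/04/25/thanks-mr-hollins/)
# and captures the slug. Absolute URLs (https://...) are deliberately not matched.
WP_LINK = re.compile(r"\]\(/\d{4}/\d{2}/\d{2}/([^/\s)]+)/?\)")


def rewrite_internal_links(markdown: str) -> str:
    """Point WordPress-style permalinks at Zola's @/ internal link paths."""
    return WP_LINK.sub(r"](@/blog/\1/index.md)", markdown)
```

Links the pattern misses will still show up in the grep above or in zola check.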

Font size. tabi ships with fairly generous typography. If it feels too large on desktop, the quickest fix is a root font-size override. Create a custom CSS file in static/ containing :root { font-size: 95%; }, then find the bit in config.toml that enables custom CSS and point it at that file. Everything in the theme is sized in rem, so this scales proportionally.

Taxonomies. Make sure your config.toml declares the taxonomies you’re using:

taxonomies = [
  { name = "tags", feed = true },
  { name = "categories", feed = true },
]

Without these, Zola won’t know what to do with the taxonomies in your frontmatter, and you won’t have either.

What you end up with

A directory of plain-text markdown files in a git repo. No database, no PHP, no hosting costs beyond a domain registration. The site builds in milliseconds, deploys automagically via Cloudflare Pages on push, and the content is portable to any other static site generator if you ever want to move again.

The whole migration — export, convert, restructure, build a new site, deploy — took an evening. Most of that was fiddling with theme configuration rather than dealing with the content itself. The content, it turns out, was the easy part. Just as it always should have been.