How to Blog With TeX4ht
This post is part of a series on setting up TeX4ht, the LaTeX to XML converter, for use with static site generators. In this article, we’ll discuss how to configure it to produce HTML suitable for them.
Contents
1 Static site extension for make4ht
2 Copy the generated files to the static site
3 Automatic compilation of changed LaTeX files
3.1 rebuild.sh Script Functionality
1 Static site extension for make4ht
The conversion process used by TeX4ht is quite complex. It requires compiling a LaTeX file to a DVI file with special instructions inserted by the tex4ht.sty package. This DVI file is then processed by the tex4ht command, which produces HTML or XML files, along with instructions for the final command, t4ht, which generates CSS files and images.
Traditionally, this process was handled by the htlatex script, but it had many weaknesses. The currently recommended build tool is make4ht. You can find some details about the differences between htlatex and make4ht in the make4ht documentation.
Among the features provided by make4ht are Lua build files, post-processing filters, and extensions. We can use these features to transform HTML files produced by TeX4ht into the format required by static site generators.
Filters can clean up the generated files and fix common issues that are difficult to address at the TeX level. They can be applied either from Lua build files or using make4ht extensions.
make4ht provides an extension that specifically supports static site generators. Let’s demonstrate its usage with a simple example:
```latex
\documentclass{article}
\begin{document}
\title{Hello world test}
\author{Michal}
\maketitle
This is my test post.
\end{document}
```
You can use the following command to generate a file suitable for static site generators:
```sh
make4ht -f html5+staticsite filename.tex
```
By default, the staticsite extension produces a file named YYYY-MM-DD-<filename>, so this example might be named 2021-07-25-filename.html.
It isn’t an ordinary HTML file: it begins with a YAML header containing the document metadata:
```
---
meta:
- charset: 'utf-8'
- name: 'generator'
  content: 'TeX4ht (https://tug.org/tex4ht/)'
- name: 'viewport'
  content: 'width=device-width,initial-scale=1'
- name: 'src'
  content: '2021-07-18-hello-world.tex'
time: 1626619562
updated: 1627244699
styles:
- '2021-07-18-hello-world.css'
title: 'Hello world test'
---
<p class='indent'> This is my test post. </p>
```
Most static site generators are geared toward Markdown, but many of them also accept HTML files with a YAML header like this one. When staticsite is used for the first time, it creates a file with the .published extension containing a timestamp of that moment. This timestamp is then used for the date part of the generated filename.
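The publication-date logic can be illustrated with a small shell sketch. It is not the extension’s actual code: it assumes that the .published file holds a Unix timestamp (a number like the time field in the header), the function names ensure_published and post_filename are hypothetical, and post_filename needs GNU date for the -d @SECONDS syntax.

```sh
#!/bin/sh
# Sketch only: assumes <name>.published stores a Unix timestamp.

# Record the first publication time, but never overwrite an existing record.
ensure_published() {
  name=$1  # TeX file name without extension, e.g. hello-world
  if [ ! -f "$name.published" ]; then
    date +%s > "$name.published"
  fi
}

# Build the YYYY-MM-DD-<name>.html file name from the stored timestamp.
# The date is formatted in UTC (GNU date syntax).
post_filename() {
  name=$1
  ts=$(cat "$name.published")
  date -u -d "@$ts" "+%Y-%m-%d-$name.html"
}
```

With a hello-world.published file containing 1626619562, post_filename hello-world prints 2021-07-18-hello-world.html, the same date prefix as the file names in the header above. The sketch formats the date in UTC for reproducibility; the extension may use local time, which can move a post to the neighboring day near midnight.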
2 Copy the generated files to the static site
The staticsite extension can copy the generated files to the locations where the static site generator expects to find files to process.
Let’s say we have the following directory structure, suitable for the Jekyll static site generator:
```
blog/
  texposts_root/
    first_post/
      first_post.tex
    second_post/
      second_post.tex
  docs/_posts/
  .make4ht
```
The blog’s main directory contains the .make4ht file and two directories: texposts_root and docs/_posts. Jekyll has built-in support for blogs: it treats the HTML (and Markdown) files in the _posts subdirectory as blog posts, provided their names follow the YYYY-MM-DD-title pattern, which is exactly what staticsite produces. We’ll use the docs directory as the source directory for GitHub Pages.
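To follow along, you can create this layout with a few shell commands. The post names are the placeholders from the tree above, and the .tex files and .make4ht start out empty.

```sh
#!/bin/sh
# Create the example blog layout in the current directory.
# first_post and second_post are placeholder names from the tree above.
mkdir -p blog/texposts_root/first_post \
         blog/texposts_root/second_post \
         blog/docs/_posts
# empty source files and an empty make4ht configuration file
touch blog/texposts_root/first_post/first_post.tex \
      blog/texposts_root/second_post/second_post.tex \
      blog/.make4ht
```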
The source LaTeX files are stored in subdirectories of texposts_root. We want to automatically copy the generated HTML files to docs/_posts/.
The staticsite extension can be configured to do this in the .make4ht configuration file, which passes shared settings to make4ht, such as the instruction to copy all generated files to the docs/_posts/ directory. make4ht searches the current directory and its parent directories for .make4ht, so a single file in the blog’s main directory applies to every post.
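A basic .make4ht file for the staticsite extension can look like this:

```lua
-- settings for the staticsite extension
filter_settings "staticsite" {
  -- where the generated files are copied, relative to the post's directory
  site_root = "../../docs/_posts/",
  -- extra fields for the YAML header
  header = {
    layout = "post",
  },
}

-- used only when make4ht is called with "-m publish"
if mode == "publish" then
  Make:enable_extension "staticsite"
  -- run LaTeX twice so that \title and \author reach the YAML header
  Make:htlatex {}
  Make:htlatex {}
end
```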
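The filter_settings function passes a table with settings to the extension. The site_root field specifies the directory where the generated files are copied. It can be given as a relative path, as in this example, resolved from the directory of the compiled TeX file. The two .. components are necessary because the blog’s main directory, which contains docs/_posts/, is two levels above that directory (texposts_root/first_post/). The header table adds extra fields to the YAML header; here, layout: post tells Jekyll which page layout to use for the post.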
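One way to check a relative site_root value is to resolve it from the post’s directory with cd and pwd. resolve_site_root is a hypothetical helper for this purpose, and the sketch assumes the blog/ layout from above exists in the current directory.

```sh
#!/bin/sh
# Print the absolute directory that a relative site_root points to
# when make4ht runs inside the given post directory.
resolve_site_root() {
  post_dir=$1   # directory of the compiled TeX file
  site_root=$2  # site_root value from .make4ht
  # drop a trailing slash before changing directory;
  # the subshell keeps the cd from affecting the caller
  (cd "$post_dir/${site_root%/}" && pwd)
}
```

Running resolve_site_root blog/texposts_root/first_post ../../docs/_posts/ from the directory containing blog/ prints the absolute path of blog/docs/_posts.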
We also specify the build sequence for site generation. If we pass the --mode publish option to make4ht, the staticsite extension will be enabled, and LaTeX will be executed twice. This is important because the contents of the \title and \author commands are only available in the second LaTeX run. They are then included in the YAML header.
You can now execute the following command in the texposts_root/first_post directory:

```sh
make4ht -m publish first_post.tex
```
This will automatically load the staticsite extension, thanks to our .make4ht file, so it’s not necessary to enable it on the command line. The generated HTML and CSS files will be placed in the docs/_posts/ directory.
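Because \title and \author only reach the YAML header after the second LaTeX run, it can be useful to inspect the header of a generated file. The print_front_matter function below is a hypothetical helper, not part of make4ht; it assumes the file starts with a --- line, as in the example header earlier in this post.

```sh
#!/bin/sh
# Print the YAML header of a generated post: the lines between the
# opening "---" on line 1 and the next "---". Prints nothing otherwise.
print_front_matter() {
  awk 'NR == 1 && $0 == "---" { inside = 1; next }
       inside && $0 == "---"  { exit }
       inside                 { print }' "$1"
}
```

Running print_front_matter on a generated file in docs/_posts/ should show a title: line when both LaTeX runs took place.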
In the next post, we will look at how to use this setup with Jekyll to create a simple blog.
The .make4ht file provided in this blog repository also adds a new function that writes a <input>.published file. This file is used by the staticsite extension to find the original publication date of a post. You should add the .published file to your source repository so the correct date is used in future updates of the site.
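Assuming the source repository is a git repository rooted in the blog’s main directory, the following sketch lists .published files that aren’t tracked yet and stages all of them. The function names are hypothetical; the git commands themselves are standard.

```sh
#!/bin/sh
# Run from the blog's main directory (assumed to be a git repository).

# List .published files that git does not track yet.
list_untracked_published() {
  git ls-files --others --exclude-standard -- '*.published'
}

# Stage every .published file. The pattern is quoted so that git, not the
# shell, expands it, which also matches files in post subdirectories.
stage_published() {
  git add -- '*.published'
}
```

Run stage_published before committing so that each post keeps its original date on later rebuilds.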
3 Automatic compilation of changed LaTeX files
Instead of compiling documents manually after each change, you can automate the build process using the siterebuild script.
This tool checks all TeX files in your document directory tree for changes and lists only the modified ones. This matters more as your blog grows, since recompiling every source file on each update would be wasteful.
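How siterebuild detects changes isn’t described here, so the following is only a naive stand-in that shows the general idea: compare each TeX file’s modification time with a stamp file written at the end of the previous build. Both the changed_tex_files function and the .last-build stamp file are hypothetical.

```sh
#!/bin/sh
# Naive change detection, NOT siterebuild's actual mechanism:
# list .tex files modified after the stamp file was last touched.
changed_tex_files() {
  root=$1   # e.g. texposts_root
  stamp=$2  # e.g. .last-build, touched after each successful build
  if [ -f "$stamp" ]; then
    find "$root" -name '*.tex' -newer "$stamp"
  else
    # no previous build: treat every TeX file as changed
    find "$root" -name '*.tex'
  fi
}
```

After a successful build, touch .last-build updates the stamp. This approach only looks at .tex files, so changes to included images or other inputs go unnoticed.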
3.1 rebuild.sh Script Functionality
I’ve provided a shell script named rebuild.sh in the root directory of the TeX files (texposts_root). It uses siterebuild to automatically compile only the changed TeX files:
```sh
#!/bin/sh
# use a system-wide siterebuild if available, otherwise the copy from the repository
# (">/dev/null 2>&1" instead of the bash-only "&>", which /bin/sh may misparse)
if ! command -v siterebuild > /dev/null 2>&1
then
  SITEREBUILD=../siterebuild/siterebuild
else
  SITEREBUILD=siterebuild
fi
export TEXINPUTS=.:/root/texmf//:
$SITEREBUILD -l debug
# we use the custom output format for siterebuild, to be able to easily
# extract directory and filename in the later steps
$SITEREBUILD -o %dir@%file | while IFS=@ read -r texdir texfile
do
  # either execute Makefile, or run make4ht directly; the subshell keeps
  # the cd local, and stdin is detached so the build cannot eat the file list
  (
    cd "$texdir" || exit 1
    if test -f Makefile; then
      make
    else
      make4ht -a debug -m publish -l "$texfile"
    fi
  ) < /dev/null
done
```
It first checks for the availability of the siterebuild command, using either a system-wide installation or a local version from the repository. The script then executes siterebuild with debug logging to identify modified LaTeX files that require recompilation.
Using a custom output format (%dir@%file), it extracts both the directory path and filename for each changed document. For each detected file, the script navigates to the corresponding directory and checks for a Makefile. If one is present, it uses the existing build system; otherwise, it directly invokes make4ht with publishing options to generate the HTML output.
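To see what rebuild.sh would do without compiling anything, you can feed the same %dir@%file lines to a dry-run function. preview_builds is a hypothetical helper that mirrors the script’s choice between make and make4ht and only prints the commands.

```sh
#!/bin/sh
# Dry run: read "%dir@%file" lines on stdin and print, per directory,
# the command rebuild.sh would run there (nothing is executed).
preview_builds() {
  while IFS=@ read -r texdir texfile; do
    if [ -f "$texdir/Makefile" ]; then
      printf '%s: make\n' "$texdir"
    else
      printf '%s: make4ht -a debug -m publish -l %s\n' "$texdir" "$texfile"
    fi
  done
}
```

For example, siterebuild -o %dir@%file | preview_builds prints one line per changed document.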
To change the make4ht options used for compilation, edit the rebuild.sh file. To change them for only one file, create a Makefile in that file’s directory and put the necessary compilation commands in it; rebuild.sh runs make instead of make4ht whenever a Makefile is present.
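A per-post Makefile can be as short as one target. The sketch below writes one with a hypothetical write_makefile helper; the extra -f html5+mathjax option is only an example of a per-post setting. Since rebuild.sh calls plain make, the first target in the file is the one that runs.

```sh
#!/bin/sh
# Write a minimal Makefile for one post. make requires recipe lines to
# start with a tab, so printf emits it explicitly via \t.
write_makefile() {
  dir=$1      # post directory, e.g. texposts_root/first_post
  texfile=$2  # main TeX file in that directory
  printf 'all:\n\tmake4ht -a debug -m publish -l -f html5+mathjax %s\n' \
    "$texfile" > "$dir/Makefile"
}
```

For example, write_makefile texposts_root/first_post first_post.tex creates texposts_root/first_post/Makefile, which rebuild.sh will then use for that post.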