How to Blog With TeX4ht
This post is a part of series on how to set up TeX4ht, LaTeX to XML converter for use with Static Site Generators. We will discuss how to configure it to produce suitable HTML in this article.
Contents
2 Copy the generated files to the static site
3 Automatic compilation of the changed LaTeX files
1 Static site extension for make4ht
Conversion process used by TeX4ht is quite complex. It needs to compile LaTeX file to a DVI file with special instructions inserted by tex4ht.sty package. This DVI file is then processed by tex4ht command, that produces HTML or XML files, and instructions for the last command, t4ht, which produces CSS files and pictures.
Traditionally, this process was handled by the htlatex script, but it had many weaknesses, so the currently recommended build tool is make4ht. You can find some details about htlatex and make4ht differences in the make4ht documentation.
Among features provided by make4ht are Lua build files, post-processing filters, and extensions. We can use these features to transform HTML files produced by TeX4ht to form required by static site generators.
Filters can clean-up the generated files, and fix some common issues that are hard to fix on the TeX level. They can be applyed either from Lua build files, or using make4ht extensions.
make4ht provides an extension that aims at support for static site generators. We can show the usage on a simple example:
\documentclass{article} \begin{document} \title{Hello world test} \author{Michal} \maketitle This is my test post. \end{document}
You can use a following command to generate file suitable for static site generators:
make4ht -f html5+staticsite filename.tex
By default, staticsite extension produces file named as YYYY-MM-DD-<filename>
,
this example can be named as 2021-07-25-filename.html. It is not ordinary HTML
file, but it contains YAML header with document metadata:
--- meta: - charset: ’utf-8’ - name: ’generator’ content: ’TeX4ht (https://tug.org/tex4ht/)’ - name: ’viewport’ content: ’width=device-width,initial-scale=1’ - name: ’src’ content: ’2021-07-18-hello-world.tex’ time: 1626619562 updated: 1627244699 styles: - ’2021-07-18-hello-world.css’ title: ’Hello world test’ --- <!-- l. 7 --><p class=’indent’> This is my test post. </p>
Although most static site generators expects Markdown, they also accept HTML files in this form. When staticsite is used for the first time, it creates file with a .published extension. It contains timestamp of the moment when it was used for the first time. This timestamp is used for the date part of the generated filename.
2 Copy the generated files to the static site
The staticsite extension can copy the generated files to places where the static site generator expects files to process.
Let’s say, that we have the following directory structure:
blog/ .. texposts_root/ .... first_post/ ...... first_post.tex .... second_post/ ...... second_post.tex .. html_root/ .. .make4ht
The blog’s main directory contains file .make4ht, and two directories:
texposts_root
and html_root
.
The source LaTeX files are stored in subdirectories of texposts_root
. We want
to copy the generated HTML files to html_root
automatically. The staticsite
extension can be configured to do that using the .make4ht configuration file.
This file is meant for passing of shared configuration to make4ht, like in this
case, where we want to specify to copy all generated files to the html_root
directory.
The basic format of the .make4ht file necessary for the staticsite extension can look like this:
filter_settings "staticsite" { site_root = "path/to/blogging_engine/html_dir" header = { layout="post", }, } if mode=="publish" then Make:enable_extension "staticsite" Make:htlatex {} Make:htlatex {} end
The filter_settings
function passes table with settings for the extension. The
site_root
field specifies path to the generated files directory. It can be specified in
the relative form, as in the example. Two ..
are necessary, as the output directory is
placed two levels up in the directory hierarchy.
We also specify the build sequence for the site generation. If we pass the
--mode publish
option to make4ht, the staticsite extension will be enabled, and
LaTeX will be executed twice. This is important, because the contents of \title
and
\author
commands are available only in the second LaTeX run. They are then
available in the YAML header.
You can now execute the following command in the texposts_root/first_post
directory:
make4ht -m publish first_post.tex
This will load the staticsite automatically, thanks to our .make4ht file, so it is
not necessary to enable it on the command line. The generated HTML and CSS files
will be placed in the html_root
directory.
We will take a look at how to use this setup together with Jekyll to create a simple blog in the next post.
The .make4ht file provided in this blog repository allso adds a new function that
will write a <input>.pubslished
file. This file is used by by the staticsite
extension to find the original date of the publication of a post. You should add the
pubslished file to your source repository, so the correct date of the publication is
used in next updates of the site.
3 Automatic compilation of the changed LaTeX files
Instead of compiling the documents manually after each change, you can automate the build process using the siterebuild script.
This tool finds all TeX files in your document directory tree for changes. It lists only the changed files. This is especially important if your blog grows up, as it would be wastefull to compile all source files on each update.
I’ve provided a shell script named rebuild.sh, included in the TeX files root directory. It uses siterebuild and compiles the changed TeX files automatically.
If you want to change options for make4ht used for the compilation, you can edit this file. If you want to change options only for one file, you can create a Makefile in the directory of that file and put commands necessary for the compilation in that Makefile.