How to Blog With TeX4ht
This post is part of a series on how to set up TeX4ht, a LaTeX to XML converter, for use with static site generators. In this article, we discuss how to configure it to produce suitable HTML.
1 Static site extension for make4ht
The conversion process used by TeX4ht is quite complex. First, the LaTeX file is compiled to a DVI file that contains special instructions inserted by the tex4ht.sty package. This DVI file is then processed by the tex4ht command, which produces the HTML or XML files together with instructions for the last command, t4ht, which produces CSS files and pictures.
Traditionally, this process was handled by the htlatex script, but it had many weaknesses, so the currently recommended build tool is make4ht. You can find some details about the differences between htlatex and make4ht in the make4ht documentation.
Among the features provided by make4ht are Lua build files, post-processing filters, and extensions. We can use these features to transform the HTML files produced by TeX4ht into the form required by static site generators.
Filters can clean up the generated files and fix some common issues that are hard to fix on the TeX level. They can be applied either from Lua build files or by using make4ht extensions.
make4ht provides an extension, staticsite, that is aimed at static site generators. A simple example shows how it is used:
\documentclass{article}
\begin{document}
\title{Hello world test}
\author{Michal}
\maketitle

This is my test post.
\end{document}
You can use the following command to generate a file suitable for static site generators:
make4ht -f html5+staticsite filename.tex
By default, the staticsite extension produces a file named YYYY-MM-DD-<filename>, so the output of this example could be named 2021-07-25-filename.html. It is not an ordinary HTML file, because it starts with a YAML header that holds the document metadata:
---
meta:
- charset: 'utf-8'
- name: 'generator'
  content: 'TeX4ht (https://tug.org/tex4ht/)'
- name: 'viewport'
  content: 'width=device-width,initial-scale=1'
- name: 'src'
  content: '2021-07-18-hello-world.tex'
time: 1626619562
updated: 1627244699
styles:
- '2021-07-18-hello-world.css'
title: 'Hello world test'
---
<!-- l. 7 --><p class='indent'> This is my test post. </p>
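To make the two-part structure of the generated file easier to see, here is a minimal sketch that splits it into the YAML header and the HTML body. The function names front_matter and html_body are hypothetical helpers, not part of make4ht, and the sketch assumes the header is delimited by two lines that consist only of `---`, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of make4ht) for inspecting a generated file.

# Print only the YAML header: the lines between the first two "---" markers.
front_matter() {
    awk '/^---$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$1"
}

# Print only the HTML body: everything after the second "---" marker.
html_body() {
    awk 'n >= 2 { print; next } /^---$/ { n++ }' "$1"
}
```

Calling `front_matter 2021-07-18-hello-world.html` should print the metadata fields, while `html_body` prints the markup that the static site generator wraps in its layout.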
Although most static site generators expect Markdown, they also accept HTML files in this form. When staticsite is used for the first time on a document, it creates a file with the .published extension. This file contains a timestamp of the moment when the extension was first used, and this timestamp provides the date part of the generated filename.
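The relationship between the timestamp and the filename can be sketched in a few lines of shell. This is only an illustration under stated assumptions: it assumes that the stored timestamp is a Unix time like the `time` field in the header above, that GNU date is available (for the `-d @SECONDS` form), and it formats the date in UTC, whereas make4ht may use the local time zone. The function published_name is a hypothetical helper, not part of make4ht.

```shell
#!/bin/sh
# Hypothetical helper: build the dated output name from a Unix timestamp
# and the base name of the TeX file. Assumes GNU date ("-d @SECONDS").
published_name() {
    # Format the timestamp in UTC as YYYY-MM-DD.
    day=$(date -u -d "@$1" +%Y-%m-%d) || return 1
    printf '%s-%s.html\n' "$day" "$2"
}

# The "time" value from the YAML header shown earlier.
published_name 1626619562 hello-world
```

With the `time` value from the header, this prints 2021-07-18-hello-world.html, which agrees with the date prefix of the stylesheet name listed under `styles`.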
2 Copy the generated files to the static site
The staticsite extension can copy the generated files to places where the static site generator expects files to process.
Let’s say that we have the following directory structure:
blog/
.. texposts_root/
.... first_post/
...... first_post.tex
.... second_post/
...... second_post.tex
.. html_root/
.. .make4ht
The blog’s main directory contains the .make4ht file and two directories: texposts_root and html_root.
The source LaTeX files are stored in subdirectories of texposts_root. We want to copy the generated HTML files to html_root automatically. The staticsite extension can be configured to do that in the .make4ht configuration file. This file is meant for configuration that is shared by several documents, as in this case, where we want all generated files to be copied to the html_root directory.
A basic .make4ht file for the staticsite extension can look like this:
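filter_settings "staticsite" {
  -- directory where the generated files are copied
  site_root = "path/to/blogging_engine/html_dir",
  -- extra fields for the YAML header
  header = {
    layout="post",
  },
}

-- this build sequence is used only with --mode publish
if mode=="publish" then
  Make:enable_extension "staticsite"
  Make:htlatex {}
  Make:htlatex {}
end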
The filter_settings function passes a table with settings to the extension. The site_root field specifies the path to the directory that receives the generated files. The path can also be given in relative form: with the directory structure above, it would be ../../html_root when make4ht is run from a post directory. Two .. are necessary, as the output directory is placed two levels up in the directory hierarchy.
We also specify the build sequence for the site generation. If we pass the --mode publish option to make4ht, the staticsite extension will be enabled, and LaTeX will be executed twice. This is important, because the contents of the \title and \author commands are available only in the second LaTeX run. Only then can they be written to the YAML header.
You can now execute the following command in the texposts_root/first_post directory:
make4ht -m publish first_post.tex
This will load the staticsite extension automatically, thanks to our .make4ht file, so it is not necessary to enable it on the command line. The generated HTML and CSS files will be placed in the html_root directory.
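With more than one post, the same command has to be repeated in every post directory. The following is a minimal sketch of a loop that does this. It assumes the layout shown earlier, where each post lives in texposts_root/<name>/<name>.tex. The function build_posts and the MAKE4HT variable are hypothetical names introduced for this example; MAKE4HT only allows the command to be swapped, for instance for a dry run.

```shell
#!/bin/sh
# Sketch: run make4ht in publish mode for every post directory.
# Assumes each post lives in <root>/<name>/<name>.tex.
# MAKE4HT (hypothetical) overrides the command; it defaults to make4ht.
build_posts() {
    root=${1:-texposts_root}
    for dir in "$root"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Skip directories that do not contain a matching TeX file.
        [ -f "$dir/$name.tex" ] || continue
        # Use a subshell so that the cd does not affect the caller.
        ( cd "$dir" && ${MAKE4HT:-make4ht} -m publish "$name.tex" ) || return 1
    done
}
```

Run from the blog directory, `build_posts texposts_root` would execute make4ht -m publish in first_post and second_post in turn. Setting MAKE4HT=echo beforehand only prints the arguments for each post instead of compiling anything.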
We will take a look at how to use this setup together with Jekyll to create a simple blog in the next post.