Chapter 2
Basic Tutorial

2.1 What is TeX4ht?

TeX4ht is a system that converts LaTeX to various output formats, including HTML (for web pages), ePub (for ebooks and other applications), ODT, DocBook, and TEI. HTML and ODT are the most common and best-supported conversion targets.

2.2 Basic Usage

Conversion is invoked using the make4ht command:

$ make4ht filename.tex

Let us start by converting a simple LaTeX file to HTML. Consider the following file:

\documentclass{article} 
\usepackage[czech]{babel} 
\begin{document} 
Příliš žluťoučký kůň úpěl ďábelské ódy. 
\end{document}

You can compile it using the following command:

$ make4ht -lm draft filename.tex

The resulting HTML file contains the following code:

<!--  l. 4  --><p class='noindent'>Příliš žluťoučký kůň úpěl ďábelské ódy. 
</p>

As you can see, several short options can be combined into one argument. The invocation above is equivalent to the following:

$ make4ht -l -m draft filename.tex

You can also use the long options:

$ make4ht --lua --mode draft filename.tex

What do these options mean?

The option --lua tells make4ht to use LuaLaTeX as the compilation engine. There is also an option -x (or --xetex) that selects XeLaTeX for the compilation. If neither of these options is used, the file is converted with the default pdfLaTeX engine.

For example, if we compile the sample file without the -l option, we would get a different result:

<!--  l. 4  --><p class='noindent'>Příliš žlut’oučký kůň úpěl d’ábelské ódy. 
</p>

Notice that the accents in the ť and ď letters are detached from the base letters. That is because TeX4ht uses information about characters in the DVI file. Current LaTeX supports basic accented characters out of the box, but sometimes, they don’t work as expected.

The option --mode sets the compilation mode. make4ht has one built-in mode, named draft. By default, make4ht compiles your TeX file three times to obtain correct hyperlinks and other features that depend on auxiliary files. The draft mode uses only one compilation run, so it is much faster.

By default, make4ht converts a LaTeX file to an HTML5 document. You can request conversion to other formats using the -f option. For example, to convert a document to the OpenDocument Format, use the following:

$ make4ht -f odt filename.tex

2.2.1 Debugging

More information about make4ht, its command line options, and its other features can be found in section make4ht Build System.

2.3 TeX4ht Options

The simplest way to change some aspects of the design is to use TeX4ht options. They are passed to make4ht as the first positional argument after the filename:

$ make4ht filename.tex "option1,option2"

For example, by default TeX4ht produces one HTML file for a document, but each footnote is placed in a separate file. If you have a large document, you may want to use a separate page for each chapter, with a list of footnotes at the end of each chapter. You can use the following options:

$ make4ht filename.tex "3,sec-filename,fn-in"

There are other numeric options, each of which breaks the document into separate HTML pages at a different sectioning level. Option 1 does not break pages at all, 2 breaks at parts, 3 at chapters, 4 at sections, 5 at subsections, 6 at sub-subsections, and 7 at paragraphs. The sec-filename option produces HTML file names that are based on section titles instead of section numbers. The fn-in option prints footnotes at the end of each HTML page.

There are also options that change the handling of math. Normally, HTML elements are used for simple math, and pictures are used for more complex features, such as fractions or square roots. This usually does not look good, so what are other options?

Generally, it is best to use MathML, as it supports correct vertical alignment for inline math, and the font size matches the surrounding text. Unfortunately, some web browsers do not support it yet. We can use MathJax to render math in these browsers.

$ make4ht filename.tex "mathml,mathjax"

On the other hand, if you want to use pictures for math exclusively, you can try the pic-m option, which produces pictures even for inline math. There are also similar options for equations and other math environments.

$ make4ht filename.tex "pic-m,pic-equation"

The generated pictures are in the PNG format, which is a raster format, so their quality depends on the resolution of the device on which the document is displayed. You may want to use the vector SVG format instead, as it should produce pictures of better quality:

$ make4ht filename.tex "pic-m,pic-equation,svg"

For more information on options, see chapter TeX4ht Options.

2.4 make4ht extensions

make4ht supports extensions. Extensions can modify various aspects of the conversion process: for example, they can post-process the generated files, cache images, or add support for Rmarkdown files. Extensions can be enabled using the -f format_name+extension_name option.

For example, there is a preprocess_input extension, which adds support for Markdown or Rtex documents. It can process the following Rmarkdown document:

This is *Rmarkdown* example. Today is `r Sys.Date() `. 

Compile it using the following command:

$ make4ht -f html5+preprocess_input sample.Rmd

It produces the following HTML file:

<!--  l. 66  --><p class='noindent'>This is <span class='ec-lmri-10'>Rmarkdown </span>example. Today is 2022-03-24. 
</p>

If your document produces many pictures, the compilation can take a long time. To make it faster, you can use the dvisvgm_hashes extension. It caches the SVG images and creates them only for the changed math environments.

$ make4ht -f html5+dvisvgm_hashes filename.tex "pic-m,pic-equation,svg"

make4ht loads the common_domfilters extension automatically. It fixes common issues in the generated HTML files using the LuaXML package. To prevent an extension from loading, use the -extension_name syntax:

$ make4ht -f html5-common_domfilters filename.tex

You can find a list of extensions in the make4ht documentation.

2.5 Configurations

Most of the markup produced by TeX4ht is configurable. Supported commands can be configured using the \Configure command. We can also insert markup before and after environments using the \ConfigureEnv command.

While it is possible to insert these commands directly into your document, it is better to use a custom configuration file, because a document that contains TeX4ht commands produces a compilation error when it is compiled directly by LaTeX.

You can find more information about syntax and available commands in section Private Configuration Files. Here, we will show some simple examples.
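
As a minimal sketch of such a configuration file, assuming the hypothetical file name myconfig.cfg and a purely illustrative CSS rule: TeX4ht options can follow xhtml in the \Preamble argument, and \Css adds a rule to the generated stylesheet.

```latex
% myconfig.cfg -- hypothetical file name, chosen only for this sketch
% The first option is xhtml; further TeX4ht options may follow it
% (here: MathML output that is rendered by MathJax).
\Preamble{xhtml,mathml,mathjax}
% Add a rule to the generated CSS file (illustrative styling only).
\Css{body { max-width: 40em; margin: 0 auto; }}
\begin{document}
\EndPreamble
```

Assuming the file is saved under that name, it can be passed to make4ht with the -c option:

$ make4ht -c myconfig.cfg filename.tex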

2.5.1 The \Configure command
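
As a hedged sketch, the following configuration file wraps the output of \textbf in a <strong> element. For basic font commands, \Configure takes two arguments: the code inserted before the content and the code inserted after it. The choice of <strong> is an assumption made for illustration, not a TeX4ht default.

```latex
\Preamble{xhtml}
% First argument: inserted before the bold text.
% Second argument: inserted after the bold text.
% \HCode writes raw HTML; \NoFonts ... \EndNoFonts suppress the font
% <span> elements that tex4ht would otherwise add inside <strong>.
\Configure{textbf}
  {\HCode{<strong>}\NoFonts}
  {\EndNoFonts\HCode{</strong>}}
\begin{document}
\EndPreamble
```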

2.5.2 Configuring Environments

You may want to insert some custom HTML tags. It is a bit more complicated for LaTeX commands, but it is easy for environments. You can configure the code that is inserted before and after an environment using the \ConfigureEnv command. It has the following syntax:

\ConfigureEnv{<environment name>}{before env}{after env} 
{before-list}{after-list}

We can ignore the arguments before-list and after-list, as they are used only for list-like environments, such as itemize. So we just need to pass the code to be inserted in the before env and after env arguments.
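
The following sketch shows one way this could look for a hypothetical environment named myenv, which is assumed to be defined in the document. The class name and the CSS rule are illustrative only. The \ifvmode\IgnorePar\fi\EndP prefix is a common TeX4ht idiom for keeping block-level tags out of open paragraphs.

```latex
\Preamble{xhtml}
% myenv is a hypothetical environment, used only for this sketch.
% \EndP closes a paragraph that is still open; \ifvmode\IgnorePar\fi
% suppresses the paragraph tag that would otherwise be opened next
% when we are between paragraphs.
\ConfigureEnv{myenv}
  {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="myenv">}}
  {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
  {}{}% before-list and after-list are left empty
\Css{.myenv { border-left: 3px solid gray; padding-left: 1em; }}
\begin{document}
\EndPreamble
```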

2.6 Remains of the old tutorial

The following text was imported from the original TeX4ht tutorial and needs to be rewritten. It still contains some useful information, but there are also some obsolete pieces.

But beware of the following situation:

Hello world. 
\begin{someenv} 
Just start some environment. 
 
But run it through several paragraphs 
\end{someenv}

Say that we insert <div class="someenv"> and </div> tags around the someenv environment. By default, this may produce the following structure:

<p>Hello world. 
<div class="someenv">Just start some environment. 
</p> 
 
<p>But run it through several paragraphs 
</div></p>

As you can see, the generated HTML code is incorrect, because the opening and closing <div> tags have different parent elements. The someenv environment can be configured to close the current paragraph, but that may not be what you want.

The best way to prevent the tag mismatch may be something like:

Hello world. 
\begin{someenv} 
Just start some environment. 
\end{someenv} 
 
\begin{someenv} 
But run it through several paragraphs 
\end{someenv}

and with make4ht

make4ht sample1

Let's look at the text part generated by htlatex:

<!--l. 6--><p class="noindent" >P&#x0159;íli&#353; &#382;lu&#x0165;ou&#x010D;k&#x00FD; k&#x016F;&#x0148; úp&#x011B;l <span 
class="ecti-1000">&#x010F;</span><span 
class="ecti-1000">ábelsk</span><span 
class="ecti-1000">é </span>ódy. Some text in English

and by make4ht:

<!--l. 6--><p class="noindent" >P&#x0159;íli&#353; &#382;lu&#x0165;ou&#x010D;k&#x00FD; k&#x016F;&#x0148; úp&#x011B;l <span 
class="ecti-1000">&#x010F;</span><span 
class="ecti-1000">ábelsk</span><span 
class="ecti-1000">é </span>ódy. Some text in English 
</p>

The only difference is the missing </p> tag in the output of htlatex, because htlatex produces HTML 4.01 by default. make4ht, on the other hand, produces XHTML by default, so the closing tag must be present.

To get XHTML output from htlatex, use the tex4ht.sty option xhtml. This option must be the first option in the option list passed to tex4ht.sty. The value of the first option must be either html, xhtml, or the name of a custom config file. We will cover these config files later, as they are a key component in the customization of TeX4ht output.

So, in order to get the same output as from make4ht, we must use the following command:

htlatex sample1 xhtml

Now we should get rid of the ugly entities that encode accented letters. This is somewhat awkward with htlatex:

htlatex sample1 "xhtml,charset=utf-8" " -cunihtf -utf8"

The charset=utf-8 option produces a meta element which declares that the document uses the UTF-8 encoding. The important parts are the two options for the tex4ht command, -c and -utf8.

ToDo: add description of process of conversion from htf fonts to utf8 using unicode.4hf. It is directed from tex4ht.env file.

With make4ht, the situation is easier, as all we need to do is add the -u option:

make4ht -u sample1.tex

The resulting file:

<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span 
class="ecti-1000">ď</span><span 
class="ecti-1000">ábelsk</span><span 
class="ecti-1000">é </span>ódy. Some text in English 
</p>

The entities are gone, but another problem persists. What we see is caused by a bug in the tex4ht command. It decorates text which is set in a non-default font with <span> elements. Unfortunately, this doesn’t play well with accented letters, as we can see. Fortunately, this has an easy solution. We just need to dive into TeX4ht configuration. Yay!

2.7 Configurations

We already saw that we can use command line options to configure the output. For a full list of options for tex4ht.sty, see an article on CVR’s blog. These options mainly influence the appearance of math, footnotes, tables, etc. Note that these options aren’t a fixed set: anyone can add new options, and not all options are supported in each output format supported by tex4ht. Generally, these options work with HTML (and XHTML) output.

Another option is to use a custom config file (.cfg). This is a TeX file with the following basic structure:

 optional stuff like requiring LaTeX packages etc 
 ... 
 \Preamble{xhtml,tex4ht.sty options} 
 ... 
 TeX4ht configurations 
 ... 
 \begin{document} 
 ... 
 more TeX4ht configurations 
 ... 
 \EndPreamble

The most important command for configuration is \Configure. This command takes a variable number of arguments; in the simplest form it has two arguments: \Configure{configname}{insert for a first hook}.

At this point, we should talk about hooks. In order to insert HTML tags, LaTeX macros are redefined, and special hooks are inserted in the new definitions. These hooks are declared with \NewConfigure{configname}{number of hooks} in a special file named after the redefined package, with the suffix .4ht. These hooks are then filled in the configuration files for particular output formats, or in the .cfg file.

To illustrate that, we can show a simple example. Let’s say we have a simple package hello.sty:

\ProvidesPackage{hello} 
\newcommand\hello{\textbf{hello world}} 
\endinput

We can provide hooks in a file named hello.4ht. Say we just want to insert tags at the beginning and at the end of the \hello command:

% provide configure for \hello command. we can choose any name 
% but most convenient is to name hooks after redefined command 
% we declare two hooks, to be inserted before and after the command 
\NewConfigure{hello}{2} 
% now we need to redefine \hello. save it to tmp command 
\let\tmp:hello\hello 
% note that `:` can be part of command name in `.4ht` files. 
% now insert the hooks. they are named as \a:hook, \b:hook, ..., \h:hook 
% depending on how many hooks were declared 
\renewcommand\hello{\a:hello\tmp:hello\b:hello}

Because we want to surround the contents produced by \hello with tags, we need to declare two hooks. This is the most usual case for normal commands which just produce some text. The old contents of the macro are saved in a temporary macro, and then the command is redefined to insert the hooks around the original contents stored in the temporary macro.

Now we can change our sample to use the hello package:

\documentclass{article} 
\usepackage[english,czech]{babel} 
\usepackage[T1]{fontenc} 
\usepackage[utf8]{inputenc} 
\usepackage{hello} 
\begin{document} Příliš žluťoučký kůň úpěl \textit{ďábelské} ódy. 
\begin{otherlanguage}{english} Some text in English, \hello 
\end{otherlanguage} 
\end{document}

We haven’t provided any configurations for hello yet, but you can see that the text hello world is in bold font anyway. This is the same case as \textit, which is converted as italic. Basic font styles are inserted by the tex4ht command during the extraction of text from the DVI file to an output format. So it is the right time to finally show how to configure both \textit and \hello to produce better tags than they provide by default.

The basic structure of a config file has been shown before, so now we will just add basic configurations for \textit and \hello:

\Preamble{xhtml} 
\Configure{textit}{\HCode{<span class="textit">}}{\HCode{</span>}} 
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}} 
\Css{.textit{font-style:italic;}} 
\Css{.hello{font-weight:bold;}} 
\begin{document} 
\EndPreamble

For documentation of the default configurations, see TeX4ht info; the most useful are the LaTeX and TeX4ht sections. Documentation for basic font commands such as \textit or \textbf is provided in the LaTeX section. We can see that the configuration takes two parameters, the insertions before and after the content. The same holds for the hello configuration we defined earlier: hooks are inserted before and after the content.

To insert HTML tags, we need to use the \HCode command; otherwise, special characters such as <, > or & are escaped. In our example, we insert span elements with a class attribute to distinguish them. Because these classes don’t have any visual appearance by default, we use \Css commands to add some styling. Yes, you need to know both HTML and CSS to effectively configure TeX4ht!

If we look at the HTML output now, we can see that things don’t look much better than initially:

<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="textit"><span 
class="ecti-1000">ď</span><span 
class="ecti-1000">ábelsk</span><span 
class="ecti-1000">é</span></span> ódy. Some text in English, <span class="hello"><span 
class="ecbx-1000">hello world</span></span> 
</p>

Our new tags were inserted, but the unnecessary elements inserted by the tex4ht processor are still present. Fortunately, we can suppress the insertion of these elements with the \NoFonts command, and later enable it again with \EndNoFonts. We can also use the tex4ht.sty option NoFonts, which suppresses font processing in the whole document, but you should use this with caution, as it may have some side effects.

Let’s take a look at how our configurations would look with the \NoFonts command:

\Preamble{xhtml} 
\Configure{textit}{\HCode{<span class="textit">}\NoFonts} 
{\EndNoFonts\HCode{</span>}} 
\Configure{hello}{\HCode{<span class="hello">}\NoFonts} 
{\EndNoFonts\HCode{</span>}} 
\Css{.textit{font-style:italic;}} 
\Css{.hello{font-weight:bold;}} 
\begin{document} 
\EndPreamble

The output now looks much better:

<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="textit">ďábelské</span> ódy. Some text in English, <span class="hello">hello world</span> 
</p>

It may seem that we can be happy at this point, but things aren’t as easy as we may hope, because we haven’t talked about one thing:

2.8 Paragraphs

What if we add some more paragraphs in English to our sample file?

\documentclass{article} 
\usepackage[english,czech]{babel} 
\usepackage[T1]{fontenc} 
\usepackage[utf8]{inputenc} 
\usepackage{hello} 
\begin{document} Příliš žluťoučký kůň úpěl \textit{ďábelské} ódy. 
\begin{otherlanguage}{english} Some text in English, \hello 
\end{otherlanguage} 
 
\begin{otherlanguage}{english} 
 
\textit{What will do} \verb|\textit| at the beginning of paragraph? 
 
And also, what about configuration for \verb|otherlanguage| environment? 
 
\end{otherlanguage} 
 
\end{document}

What if we want to insert elements with the lang attribute to specify the language of the text in the HTML? This might be useful from a semantic point of view. We can also enable hyphenation in the CSS, which works only when the correct languages are marked in the source.

This exercise will be a little bit more difficult.