Chapter 2
Basic Tutorial
2.1 What is TeX4ht?
TeX4ht is a system that converts LaTeX to various output formats, including HTML,
ODT, DocBook, and TEI. The HTML and ODT formats are the most common and
best-supported conversion targets.
TeX4ht allows authors to convert LaTeX input into several output formats, such
as HTML (for web pages) or ePub (for ebooks and other applications).
2.2 Basic Usage
Conversion is invoked using the make4ht command:
$ make4ht filename.tex
Let us start with the conversion of a simple LaTeX file to HTML using the following LaTeX file:
\documentclass{article}
\usepackage[czech]{babel}
\begin{document}
Příliš žluťoučký kůň úpěl ďábelské ódy.
\end{document}
You can compile it using the following command:
$ make4ht -lm draft filename.tex
The resulting HTML file contains the following code:
<!-- l. 4 --><p class='noindent'>Příliš žluťoučký kůň úpěl ďábelské ódy. </p>
As you can see, multiple short options can be combined into a single argument
of make4ht. The above invocation is equivalent to the following:
$ make4ht -l -m draft filename.tex
You can also use the long options:
$ make4ht --lua --mode draft filename.tex
What do these options mean?
The --lua option tells make4ht to use LuaLaTeX as the compilation engine.
There is also an option -x (or --xetex) that allows the use of XeLaTeX for
compilation. If neither of these options is used, the file will be compiled using the
default pdfLaTeX engine.
The --mode option sets the compilation mode. make4ht has one built-in mode,
named draft. By default, make4ht compiles your TeX file three times to ensure
correct hyperlinks and other features that depend on auxiliary files. The draft mode
uses only one compilation run, so it is much faster.
By default, make4ht converts a LaTeX file to an HTML 5 document. You can request
conversion to other formats using the -f option. For example, to convert a document
to the OpenDocument Format, use the following:
$ make4ht -f odt filename.tex
2.3 Debugging TeX4ht
When working with TeX4ht, you may encounter issues with the conversion
process, such as clashes between packages and TeX4ht, formatting errors,
or missing content. Here are some tips and tools to help you debug these
problems.
TeX4ht hides the output of the commands it uses for compilation. However, if it
encounters an error, it will display it in the terminal output. For example, if it
encounters an unknown command, it will output an error message similar to the
following:
$ make4ht -slm draft grr.tex
[STATUS] make4ht: Conversion started
[STATUS] make4ht: Input file: grr.tex
[ERROR] htlatex: Compilation errors in the htlatex run
[ERROR] htlatex: Filename Line Message
[ERROR] htlatex: ./grr.tex 15 Undefined control sequence.
[STATUS] make4ht: Conversion finished
In this example, the message says that an undefined control sequence is used on
line 15 of the grr.tex file. In this case, it is easy to locate the error, but that is not
always true. Some errors arise from conflicts between certain packages and
TeX4ht, which makes it more challenging to identify the root cause.
By default, only errors and warnings are shown. With the -a debug option,
LaTeX is run in interactive mode, where you can see all terminal output and also
control the engine by entering commands if the compilation stops due to an error.
This option also displays all internal messages from TeX4ht, which can be helpful for
debugging.
To invoke make4ht with debug mode, use the following command:
$ make4ht -a debug filename.tex
2.3.1 Test with Minimal Example
If you are facing a persistent issue, try isolating the problematic section of your
document. Create a minimal LaTeX file that reproduces the problem and use
make4ht to convert it. This method helps you identify whether the issue lies in the
structure of your document or in specific commands. Try removing the loaded packages
one by one until the error no longer occurs.
To quickly work around an error, you can use the \ifdefined\HCode ... \else ... \fi
condition. For example, some packages cause fatal errors with TeX4ht
because they redefine some commands to output PDF instructions. As such
instructions are not useful in the HTML output anyway, you can safely exclude these
packages with TeX4ht:
\ifdefined\HCode\else
\usepackage{insertpdfinstructions}
\fi
You can file a bug report with the TeX4ht maintainers anyway, because we try to be able
to run all LaTeX source files without such modifications.
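The same condition can also provide a replacement in the TeX4ht branch instead of leaving it empty. The following is a minimal sketch; insertpdfinstructions is the placeholder package name used above, and \pdfonlymark is a hypothetical command assumed to be defined by that package, introduced here only for illustration:

```latex
\documentclass{article}
\ifdefined\HCode
  % TeX4ht run: skip the package and define a plain fallback,
  % so that the document body still compiles.
  % \pdfonlymark is a hypothetical command name.
  \newcommand\pdfonlymark[1]{#1}
\else
  % Regular LaTeX run: load the (placeholder) package as usual.
  \usepackage{insertpdfinstructions}
\fi
\begin{document}
Some text with a \pdfonlymark{marked} word.
\end{document}
```

With this approach the document body does not need any conditionals, as the fallback definition simply prints its argument in the HTML output.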
Debugging TeX4ht can sometimes involve trial and error, but with the right tools
and careful analysis, most issues can be resolved efficiently.
You can report errors in the TeX4ht issue tracker, on using the tex4ht tag, or on the make4ht GitHub page.
You can find more info about troubleshooting in Compilation Errors, and about
make4ht, its command line options, and other features in section make4ht Build System.
2.4 TeX4ht Options
The simplest way to change some aspects of the design is to use TeX4ht
options. They are passed to make4ht as the first positional argument after the
filename:
$ make4ht filename.tex "option1,option2"
For example, TeX4ht produces one HTML file for a document by default, but each footnote
is placed in a separate file. If you have a large document, you may want to use a
separate page for each chapter, with a list of footnotes at the end of each chapter.
You can use the following options:
$ make4ht filename.tex "3,sec-filename,fn-in"
There are other numeric options; each of them breaks the document into separate
HTML pages at a different sectioning level. Option 1 does not break pages at all, 2 breaks
at parts, 3 at chapters, 4 at sections, 5 at subsections, 6 at sub-subsections, and 7 at
paragraphs. The sec-filename option produces HTML file names that are based
on section titles instead of their numbers. The fn-in option prints footnotes at the
end of each HTML page.
2.4.1 Math Options
There are also options that change the handling of math. By default, HTML elements are used for simple math, and pictures are used for more complex features, such as fractions or square roots. This usually does not look good, so what are the other options?
Generally, it is best to use MathML, as it supports correct vertical alignment for
inline math, and the font size matches the surrounding text. Unfortunately, some web
browsers do not support it yet. We can use MathJax to render math in these
browsers.
$ make4ht filename.tex "mathml,mathjax"
On the other hand, if you want to use pictures for math exclusively, you can try the pic-m option, which uses pictures even for inline math. There are also similar options for equations and other math environments.
$ make4ht filename.tex "pic-m,pic-equation"
The generated pictures are in the PNG format, which is a raster format, so their quality depends on the resolution of the device where the document is displayed. You may want to use the vector SVG format instead, as it should produce pictures of better quality:
$ make4ht filename.tex "pic-m,pic-equation,svg"
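To compare these options, it helps to have a small test document that contains both inline and display math. A minimal sketch follows; the file name mathtest.tex and the formulas are arbitrary choices for illustration:

```latex
% mathtest.tex: a small document for comparing the math options
\documentclass{article}
\begin{document}
Inline math with a fraction and a square root:
$\frac{a}{b} + \sqrt{c}$.

A numbered equation:
\begin{equation}
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{equation}
\end{document}
```

Converting this file once with "mathml,mathjax" and once with "pic-m,pic-equation,svg" should make the difference between the two approaches visible in a browser.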
For more information on options, see chapter TeX4ht Options.
2.5 make4ht extensions
make4ht supports extensions. They can modify various aspects of
the conversion process, for example, post-process the generated files, cache images,
or add support for Rmarkdown files. Extensions can be enabled using the
-f format_name+extension_name option.
For example, the preprocess_input extension adds support
for Markdown or Rtex documents. It can process the following Rmarkdown
document:
This is *Rmarkdown* example. Today is `r Sys.Date() `.
Compile it using the following command:
$ make4ht -f html5+preprocess_input sample.Rmd
It produces the following HTML file:
<!-- l. 66 --><p class='noindent'>This is <span class='ec-lmri-10'>Rmarkdown </span>example. Today is 2022-03-24. </p>
If your document produces many pictures, the compilation can take a long time.
To make it faster, you can use the dvisvgm_hashes extension. It caches the SVG
images and creates them only for the changed math environments.
$ make4ht -f html5+dvisvgm_hashes filename.tex "pic-m,pic-equation,svg"
make4ht loads the common_domfilters extension automatically. It fixes common
issues in the generated HTML files using the LuaXML package. To prevent an extension
from loading, use the -extension_name syntax:
$ make4ht -f html5-common_domfilters filename.tex
You can find a list of extensions in the make4ht documentation.
2.6 Configurations
Most of the markup produced by TeX4ht is configurable. Supported commands can
be configured using the \Configure command. We can also insert markup before and
after environments using the \ConfigureEnv command.
While it is possible to insert these commands directly into your document, it is
better to use a custom configuration file, because you would get a compilation
error if you compiled a document containing TeX4ht commands directly with
LaTeX.
You can find more information about syntax and available commands in section Private Configuration Files. Here, we will show some simple examples.
2.6.1 The \Configure command
2.6.2 Configuring Environments
You may want to insert some custom HTML tags. It is a bit more complicated for
LaTeX commands, but it is easy for environments. You can configure the code that is
inserted before and after an environment using the \ConfigureEnv command. It has the
following syntax:
\ConfigureEnv{<environment name>}{before env}{after env} {before-list}{after-list}
We can ignore the before-list and after-list arguments, as they
are used only for list-like environments, such as itemize. So we just need
to pass the code that will be inserted in the before env and after env
arguments.
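As a minimal sketch, the following configuration file wraps an environment in a div element. The environment name note and the class name note-box are hypothetical and chosen only for illustration; the sketch also assumes that the environment is not list-like and needs no special paragraph handling:

```latex
\Preamble{xhtml}
% Insert <div> before and </div> after the (hypothetical) note environment.
% The last two arguments (before-list, after-list) stay empty,
% because the list hooks are not used here.
\ConfigureEnv{note}
  {\HCode{<div class="note-box">}}
  {\HCode{</div>}}
  {}{}
\begin{document}
\EndPreamble
```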
2.7 Remains of the old tutorial
The following text was imported from the original TeX4ht tutorial and needs to be
rewritten. It still contains some useful information, but there are also some obsolete
pieces.
But beware of the following situation:
Hello world.
\begin{someenv}
Just start some environment.

But run it through several paragraphs
\end{someenv}
Say that we insert <div class="someenv"> and </div> tags around the someenv
environment. By default, this may produce the following structure:
<p>Hello world.
<div class="someenv">Just start some environment.
</p>
<p>But run it through several paragraphs
</div></p>
As you can see, the generated html code is incorrect, as the opening and closing <div>
tags have different parent elements. The someenv environment can be configured to close
the current paragraph, but that may not be what you want.
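One way to configure the environment so that it closes the current paragraph is sketched below. It relies on the TeX4ht paragraph commands \IgnorePar and \EndP and assumes that someenv is not a list-like environment; treat it as a starting point, not as a definitive configuration:

```latex
\Preamble{xhtml}
\ConfigureEnv{someenv}
  % Before the environment: drop a pending paragraph start, close the
  % open <p>, and only then open the <div>.
  {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="someenv">}}
  % After the environment: close the last paragraph inside the <div>
  % before the closing tag is written.
  {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
  {}{}
\begin{document}
\EndPreamble
```

The intent is that the paragraph tags are closed before each div tag is written, so the opening and closing tags end up with the same parent element.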
The best way to prevent the tag mismatch may be something like:
Hello world.
\begin{someenv}
Just start some environment.
\end{someenv}

\begin{someenv}
But run it through several paragraphs
\end{someenv}
and with make4ht:
make4ht sample1
Let's look at the text part generated by htlatex:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é </span>ódy. Some text in English
and by make4ht:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é </span>ódy. Some text in English </p>
The only difference is the missing </p> tag in the output of htlatex, because
htlatex produces html 4.01 by default. make4ht, on the other hand, produces xhtml by
default, so the closing tag must be present.
To get xhtml output from htlatex, use the tex4ht.sty option xhtml. This option
must be the first option in the option list passed to tex4ht.sty. The value of the first option
must be either html, xhtml, or the name of a custom config file. We will cover these
config files later, as they are a key component in the customization of TeX4ht output.
So in order to get the same output as from make4ht, we must use the following
command:
htlatex sample1 xhtml
Now we should get rid of the ugly entities which encode accented letters. This is
somewhat awkward with htlatex:
htlatex sample1 "xhtml,charset=utf-8" " -cunihtf -utf8"
charset=utf-8 produces a meta element which declares the document to be in
the utf-8 encoding. The important parts are the two options for the tex4ht command, -c and
-utf8.
ToDo: add a description of the process of conversion from htf fonts to utf8 using
unicode.4hf. It is directed from the tex4ht.env file.
With make4ht, the situation is easier, as all we need to do is add the -u option:
make4ht -u sample1.tex
The resulting file:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é </span>ódy. Some text in English </p>
The entities are gone, but another problem persists. What we see is caused by a bug in the tex4ht
command. It decorates text which is set in a non-default font with <span> elements.
Unfortunately, this doesn't play well with accented letters, as we can see. Fortunately, this has
an easy solution. We just need to dive into TeX4ht configuration.
Yay!
2.8 Configurations
We already saw that we can use command line options to configure the output. For
a full list of options for tex4ht.sty, see an article on CVR's blog. These options
mainly influence the appearance of math, footnotes, tables, etc. Note that these options
aren't a fixed set: anyone can add new options, and not all options are supported in
each output format supported by tex4ht. Generally, these options work with html
(and xhtml) output.
Another option is to use a custom config file (.cfg). This is a TeX file with some basic
structure:
optional stuff like requiring LaTeX packages etc ...
\Preamble{xhtml,tex4ht.sty options}
... TeX4ht configurations ...
\begin{document}
... more TeX4ht configurations ...
\EndPreamble
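A minimal complete configuration file following this structure could look like the sketch below. The file name myconfig.cfg and the CSS rule are illustrative assumptions; the file is passed to make4ht with the -c option:

```latex
% myconfig.cfg (hypothetical file name)
% Usage: make4ht -c myconfig.cfg filename.tex
\Preamble{xhtml}
% Add a rule to the generated CSS file.
\Css{body{max-width:40em; margin:0 auto;}}
\begin{document}
\EndPreamble
```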
The most important command for configuration is \Configure. This command has a
variable number of arguments; in the simplest form it has two arguments:
\Configure{configname}{insert for a first hook}.
At this point we should talk about hooks. In order to insert html tags, LaTeX
macros are redefined and special hooks are inserted into their definitions. These hooks
are declared with \NewConfigure{configname}{number of hooks} in a special
file named after the redefined package, with the suffix .4ht. The hooks are
then filled in the configuration files for particular output formats, or in the .cfg
file.
To illustrate that, we can show a simple example. Let's say we have a simple
package hello.sty:
\ProvidesPackage{hello}
\newcommand\hello{\textbf{hello world}}
\endinput
We can provide hooks in a file named hello.4ht. Say we just want to insert tags at the
beginning and at the end of the \hello command:
% provide configure for \hello command. we can choose any name
% but most convenient is to name hooks after redefined command
% we declare two hooks, to be inserted before and after the command
\NewConfigure{hello}{2}
% now we need to redefine \hello. save it to tmp command
\let\tmp:hello\hello
% note that `:` can be part of command name in `.4ht` files.
% now insert the hooks. they are named as \a:hook, \b:hook, ..., \h:hook
% depending on how many hooks were declared
\renewcommand\hello{\a:hello\tmp:hello\b:hello}
Because we want to surround the contents produced by \hello with tags, we need to
declare two hooks. This is the most common case for normal commands which just
produce some text. The old contents of the macro are saved in a temporary macro, and then
the command is redefined to insert the hooks around the original contents stored in the temporary
macro.
Now we can change our sample to use the hello package:
\documentclass{article}
\usepackage[english,czech]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{hello}
\begin{document}
Příliš žluťoučký kůň úpěl \textit{ďábelské} ódy.
\begin{otherlanguage}{english}
Some text in English, \hello
\end{otherlanguage}
\end{document}
We haven't provided any configurations for hello yet, but you can see that the text
hello world is in bold font anyway. The same holds for \textit, which is
converted to italic. Basic font styles are inserted by the tex4ht command during the
extraction of text from dvi to an output format. So it is the right time to finally show
how to configure both textit and hello to produce better tags than they
provide by default.
The basic structure of a config file has been shown before, so now we will just add
basic configurations for \textit and \hello:
\Preamble{xhtml}
\Configure{textit}{\HCode{<span class="textit">}}{\HCode{</span>}}
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}}
\Css{.textit{font-style:italic;}}
\Css{.hello{font-weight:bold;}}
\begin{document}
\EndPreamble
For documentation of the default configurations, see TeX4ht info; the most useful are the
LaTeX and TeX4ht sections. Documentation for basic font commands such as
\textit or \textbf is provided in the LaTeX section. We can see that the configuration
takes two parameters, the insertions before and after the content. The same holds for the
hello configuration we defined earlier: hooks are inserted before and after the
content.
To insert html tags, we need to use \HCode commands; special characters such as
<, > or & are escaped otherwise. In our example we insert span elements with
a class attribute to distinguish them. Because these classes don't
have any visual appearance by default, we use \Css commands to add some
styling. Yes, you need to know both html and css to effectively configure
TeX4ht!
If we look at the html output now, we can see that things don't look much better
than initially:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="textit"><span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é</span></span> ódy. Some text in English, <span class="hello"><span class="ecbx-1000">hello world</span></span> </p>
Our new tags were inserted, but the unnecessary elements inserted by the tex4ht
processor are still present. Fortunately, we can suppress the insertion of these elements
with the \NoFonts command and enable it again later with \EndNoFonts. We can also use the
tex4ht.sty option NoFonts, which suppresses font processing in the whole
document, but you should use it with caution, as it may have some side
effects.
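For comparison, the document-wide variant can be sketched by adding the option to the \Preamble line, which accepts tex4ht.sty options. This is an illustrative sketch only; as noted, disabling font processing globally may have side effects:

```latex
% Suppress font processing in the whole document via the NoFonts option.
\Preamble{xhtml,NoFonts}
\Configure{textit}{\HCode{<span class="textit">}}{\HCode{</span>}}
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}}
\Css{.textit{font-style:italic;}}
\Css{.hello{font-weight:bold;}}
\begin{document}
\EndPreamble
```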
Let's take a look at how our configurations would look with the \NoFonts
command:
\Preamble{xhtml}
\Configure{textit}{\HCode{<span class="textit">}\NoFonts}
  {\EndNoFonts\HCode{</span>}}
\Configure{hello}{\HCode{<span class="hello">}\NoFonts}
  {\EndNoFonts\HCode{</span>}}
\Css{.textit{font-style:italic;}}
\Css{.hello{font-weight:bold;}}
\begin{document}
\EndPreamble
The output now looks much better:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="textit">ďábelské</span> ódy. Some text in English, <span class="hello">hello world</span> </p>
It may seem that we can be happy at this point, but things aren't as easy as we may hope, because we haven't talked about one thing:
2.9 Paragraphs
What if we add some more paragraphs in English to our sample file?
\documentclass{article}
\usepackage[english,czech]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{hello}
\begin{document}
Příliš žluťoučký kůň úpěl \textit{ďábelské} ódy.
\begin{otherlanguage}{english}
Some text in English, \hello
\end{otherlanguage}

\begin{otherlanguage}{english}
\textit{What will do} \verb|\textit| at the beginning of paragraph? And also,
what about configuration for \verb|otherlanguage| environment?
\end{otherlanguage}
\end{document}
What if we want to insert elements with a lang attribute to specify the language of the
text in the html? It might be useful from a semantic point of view; we can also enable
hyphenation in the css, which works only when the correct languages are marked in the
source.
This exercise will be a little bit more difficult