Chapter 2
Basic Tutorial
2.1 What is TeX4ht?
TeX4ht is a system that converts LaTeX to various output formats, including HTML, ODT, DocBook or TEI. HTML and ODT are the most common and best-supported conversion targets.
Typical uses include HTML for web pages and ePub for ebooks and other applications.
2.2 Basic Usage
Conversion is invoked using the make4ht command:
$ make4ht filename.tex
Let us start by converting the following simple LaTeX file to HTML:
\documentclass{article}
\usepackage[czech]{babel}
\begin{document}
Příliš žluťoučký kůň úpěl ďábelské ódy.
\end{document}
You can compile it using the following command:
$ make4ht -lm draft filename.tex
The resulting HTML file contains the following code:
<!-- l. 4 --><p class='noindent'>Příliš žluťoučký kůň úpěl ďábelské ódy. </p>
Short options can be combined into a single argument for make4ht, so the above invocation is equivalent to the following:
$ make4ht -l -m draft filename.tex
You can also use the long options:
$ make4ht --lua --mode draft filename.tex
What do these options mean?
The --lua option tells make4ht to use LuaLaTeX as the compilation engine. There is also the -x (or --xetex) option, which uses XeLaTeX for the compilation. If neither of these options is used, the file is converted with the default PDFLaTeX engine.
For example, if we compile the sample file without the -l option, we get a different result:
<!-- l. 4 --><p class='noindent'>Píli lutouký k úpl dábelské ódy. </p>
Notice that the accents in the ť and ď letters are detached from the base letters. That is because TeX4ht uses information about characters in the DVI file. Current LaTeX supports basic accented characters out of the box, but sometimes they don’t work as expected; with LuaLaTeX, as in the previous example, the letters come out correctly.
The --mode option sets the compilation mode. make4ht has one built-in mode, named draft. By default, make4ht compiles your TeX file three times to obtain correct hyperlinks and other features that depend on auxiliary files. The draft mode uses only one compilation run, so it is much faster.
By default, make4ht converts a LaTeX file to an HTML5 document. You can request conversion to other formats using the -f option. For example, to convert a document to the OpenDocument Format, use the following:
$ make4ht -f odt filename.tex
2.2.1 Debugging
More information about make4ht, its command line options, and other features can be found in section make4ht Build System.
2.3 TeX4ht Options
The simplest way to change some aspects of the design is to use TeX4ht options. They can be passed to make4ht as the first positional argument after the filename:
$ make4ht filename.tex "option1,option2"
For example, TeX4ht produces one HTML file for a document, but each footnote is placed in a separate file. If you have a large document, you may want to use a separate page for each chapter, with a list of footnotes at the end of each chapter. You can use the following options:
$ make4ht filename.tex "3,sec-filename,fn-in"
There are other numeric options, each of which breaks the document into separate HTML pages at a different sectioning level: option 1 does not break pages at all, 2 breaks at parts, 3 at chapters, 4 at sections, 5 at subsections, 6 at sub-subsections, and 7 at paragraphs. The sec-filename option produces HTML file names based on section titles instead of their numbers. The fn-in option prints footnotes at the end of each HTML page.
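To try the splitting options, a small test document along these lines may help. It is a hypothetical example, not part of the original tutorial, and it assumes the book class so that option 3 has chapters to split at; each chapter carries a footnote so the effect of fn-in is visible:

```latex
% test.tex -- hypothetical test document with two chapters,
% each containing one footnote
\documentclass{book}
\begin{document}
\chapter{Introduction}
Some introductory text.\footnote{A footnote in the first chapter.}

\chapter{Usage}
More text.\footnote{A footnote in the second chapter.}
\end{document}
```

Compile it with make4ht test.tex "3,sec-filename,fn-in" and compare the generated file names and footnote placement with a run that omits the options.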
There are also options that change the handling of math. Normally, HTML elements are used for simple math, and pictures are used for more complex features, such as fractions or square roots. The result usually does not look good, so what are the alternatives?
Generally, it is best to use MathML, as it supports correct vertical alignment for inline math, and the font size matches the surrounding text. Unfortunately, some web browsers do not support it yet. We can use MathJax to render math in these browsers:
$ make4ht filename.tex "mathml,mathjax"
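For comparing the math options, a small test file such as the following may be handy. It is a hypothetical example, not taken from the original tutorial; it contains inline math plus a displayed equation with a fraction and a square root, which are exactly the constructs that the default mode renders as pictures:

```latex
% math.tex -- hypothetical test file for comparing math output options
\documentclass{article}
\begin{document}
Inline math: $a^2 + b^2 = c^2$ and $\frac{1}{2}$.
\begin{equation}
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{equation}
\end{document}
```

Running make4ht math.tex once without options and once with "mathml,mathjax" shows the difference directly in the browser.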
On the other hand, if you want to use pictures for math exclusively, you can try the pic-m option, which uses pictures even for inline math. Similar options exist for equations and other math environments.
$ make4ht filename.tex "pic-m,pic-equation"
The generated pictures are in the PNG format, which is a raster format, so their quality depends on the resolution of the device where the document is displayed. You may want to use the vector SVG format instead, as it should produce better-quality pictures:
$ make4ht filename.tex "pic-m,pic-equation,svg"
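If you use the same options often, they can also be listed in the \Preamble command of a custom configuration file instead of on the command line. A minimal sketch, assuming a hypothetical file named mycfg.cfg placed next to the document:

```latex
% mycfg.cfg -- hypothetical file name; the options after "xhtml"
% are the same TeX4ht options that were passed on the command line
\Preamble{xhtml,pic-m,pic-equation,svg}
\begin{document}
\EndPreamble
```

The file is then selected with make4ht's -c (--config) option: make4ht -c mycfg.cfg filename.tex. Configuration files are discussed in more detail in the Configurations section.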
For more information on options, see chapter TeX4ht Options.
2.4 make4ht extensions
make4ht supports extensions. They can modify various aspects of the conversion process, for example, post-process the generated files, cache images, or add support for Rmarkdown files. Extensions are enabled using the -f format_name+extension_name option.
For example, the preprocess_input extension adds support for Markdown and Rtex documents. It can process the following Rmarkdown document:
This is *Rmarkdown* example. Today is `r Sys.Date() `.
Compile it using the following command:
$ make4ht -f html5+preprocess_input sample.Rmd
It produces the following HTML:
<!-- l. 66 --><p class='noindent'>This is <span class='ec-lmri-10'>Rmarkdown </span>example. Today is 2022-03-24. </p>
If your document produces many pictures, the compilation can take a long time. To make it faster, you can use the dvisvgm_hashes extension. It caches the SVG images and regenerates only those for changed math environments.
$ make4ht -f html5+dvisvgm_hashes filename.tex "pic-m,pic-equation,svg"
make4ht loads the common_domfilters extension automatically. It fixes common issues in the generated HTML files using the LuaXML package. To prevent an extension from loading, use the -extension_name syntax:
$ make4ht -f html5-common_domfilters filename.tex
You can find a list of extensions in the make4ht documentation.
2.5 Configurations
Most of the markup produced by TeX4ht is configurable. Supported commands can be configured using the \Configure command. We can also insert markup before and after environments using the \ConfigureEnv command.
While it is possible to insert these commands directly into your document, it is better to use a custom configuration file, because a document containing TeX4ht commands produces a compilation error when it is compiled directly by LaTeX.
You can find more information about syntax and available commands in section Private Configuration Files. Here, we will show some simple examples.
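As a first taste, here is a sketch of a configuration file that leaves the markup alone and only adds a CSS rule with the \Css command. The file name style.cfg and the CSS rule itself are illustrative assumptions:

```latex
% style.cfg -- hypothetical configuration file
\Preamble{xhtml}
% \Css adds a rule to the CSS generated for the document
\Css{body{max-width:40em; margin:0 auto;}}
\begin{document}
\EndPreamble
```

Because the TeX4ht commands live in this file rather than in the document, the document itself still compiles with plain LaTeX; the configuration is used only when it is passed to make4ht, for example make4ht -c style.cfg filename.tex.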
2.5.1 The \Configure command
2.5.2 Configuring Environments
You may want to insert some custom HTML tags. This is a bit more complicated for LaTeX commands, but it is easy for environments. You can configure the code that is inserted before and after an environment using the \ConfigureEnv command. It has the following syntax:
\ConfigureEnv{<environment name>}{before env}{after env} {before-list}{after-list}
We can ignore the before-list and after-list arguments, as they are used only for list-like environments, such as itemize. So we just need to pass the code that will be inserted in the before env and after env arguments.
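For example, assuming your document defines an environment named note (a hypothetical name used only for illustration), the following configuration file wraps every note environment in a div element and passes empty before-list and after-list arguments:

```latex
% hypothetical configuration file; "note" is an assumed environment name
\Preamble{xhtml}
% arguments: environment name, before env, after env, before-list, after-list
\ConfigureEnv{note}{\HCode{<div class="note">}}{\HCode{</div>}}{}{}
\Css{div.note{border-left:3px solid gray; padding-left:1em;}}
\begin{document}
\EndPreamble
```

This simple form is fine for environments that stay within one paragraph; the problem with environments that span several paragraphs is discussed in section 2.6.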
2.6 Remains of the old tutorial
The following text was imported from the original TeX4ht tutorial and needs to be rewritten. It still contains some useful information, but some pieces are obsolete.
But beware of the following situation:
Hello world.
\begin{someenv}
Just start some environment.

But run it through several paragraphs
\end{someenv}
Say that we insert <div class="someenv"> and </div> tags around the someenv environment. By default, this may produce the following structure:
<p>Hello world. <div class="someenv">Just start some environment. </p> <p>But run it through several paragraphs </div></p>
As you can see, the generated HTML code is incorrect, because the opening and closing <div> tags have different parent elements. The someenv environment can be configured to close the current paragraph, but that may not be what you want.
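One common way to close the current paragraph is to put the \ifvmode\IgnorePar\fi\EndP idiom in front of each tag. The following is a sketch for the someenv environment from the example above, not a definitive configuration; it places the div tags outside the surrounding paragraph, and depending on the content you may still need to adjust how paragraphs inside the environment are started:

```latex
\Preamble{xhtml}
% \EndP closes the paragraph that is currently open, if any;
% \ifvmode\IgnorePar\fi prevents an extra paragraph start in vertical mode
\ConfigureEnv{someenv}
  {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="someenv">}}
  {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
  {}{}
\begin{document}
\EndPreamble
```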
The best way to prevent the tag mismatch may be something like this:
Hello world.
\begin{someenv}
Just start some environment.
\end{someenv}

\begin{someenv}
But run it through several paragraphs
\end{someenv}
And with make4ht:
make4ht sample1
Let's look at the text part generated by htlatex:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é </span>ódy. Some text in English
And by make4ht:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é </span>ódy. Some text in English </p>
The only difference is the missing </p> tag in the output of htlatex, because htlatex produces HTML 4.01 by default. make4ht, on the other hand, produces XHTML by default, so the closing tag must be present.
To get XHTML output from htlatex, use the tex4ht.sty option xhtml. This option must be the first option in the option list passed to tex4ht.sty. The value of the first option must be either html, xhtml or the name of a custom config file. We will cover these config files later, as they are a key component in the customization of TeX4ht output.
So in order to get the same output as from make4ht, we must use the following command:
htlatex sample1 xhtml
Now we should get rid of the ugly entities that encode accented letters. This is somewhat cumbersome with htlatex:
htlatex sample1 "xhtml,charset=utf-8" " -cunihtf -utf8"
The charset=utf-8 option produces a meta element that declares the document to be in UTF-8 encoding. The important parts are the two options for the tex4ht command, -c (here -cunihtf) and -utf8.
ToDo: add a description of the process of conversion from htf fonts to UTF-8 using unicode.4hf. It is directed from the tex4ht.env file.
With make4ht, the situation is easier, as all we need to do is add the -u option:
make4ht -u sample1.tex
The resulting file:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é </span>ódy. Some text in English </p>
The entities are gone, but another problem persists. What we see is caused by a bug in the tex4ht command: it decorates text that is set in a non-default font with <span> elements. Unfortunately, as we can see, this doesn’t play well with accented letters. Fortunately, there is an easy solution; we just need to dive into TeX4ht configuration.
Yay!
2.7 Configurations
We already saw that we can use command line options to configure the output. For the full list of options for tex4ht.sty, see an article on CVR’s blog. These options mainly influence the appearance of math, footnotes, tables, etc. Note that these options aren’t a fixed set: anyone can add new options, and not all options are supported in each output format supported by tex4ht. Generally, these options work with html (and xhtml) output.
Another option is to use a custom config file (.cfg). This is a TeX file with the following basic structure:
% optional stuff, like requiring LaTeX packages etc.
\Preamble{xhtml,tex4ht.sty options}
% ... TeX4ht configurations ...
\begin{document}
% ... more TeX4ht configurations ...
\EndPreamble
The most important command for configuring is \Configure. This command has a variable number of arguments; in the simplest form, it has two arguments: \Configure{configname}{insert for a first hook}.
At this point we should talk about hooks. In order to insert HTML tags, LaTeX macros are redefined, and special hooks are inserted into the new definitions. These hooks are declared with \NewConfigure{configname}{number of hooks} in a special file named after the redefined package, with the suffix .4ht. The hooks are then filled in the configuration files for particular output formats, or in the .cfg file.
To illustrate this, let's look at a simple example. Say we have a simple package hello.sty:
\ProvidesPackage{hello}
\newcommand\hello{\textbf{hello world}}
\endinput
We can provide hooks in a file named hello.4ht. Say we just want to insert tags at the beginning and at the end of the \hello command:
% provide configure for \hello command. We can choose any name,
% but the most convenient is to name hooks after the redefined command.
% We declare two hooks, to be inserted before and after the command.
\NewConfigure{hello}{2}
% now we need to redefine \hello. Save it to a temporary command
\let\tmp:hello\hello
% note that `:` can be part of a command name in `.4ht` files.
% now insert the hooks. They are named \a:hook, \b:hook, ..., \h:hook,
% depending on how many hooks were declared
\renewcommand\hello{\a:hello\tmp:hello\b:hello}
Because we want to surround the content produced by \hello with tags, we need to declare two hooks. This is the most usual case for normal commands that just produce some text. The old contents of the macro are saved in a temporary macro, and then the command is redefined to insert the hooks around the original contents stored in the temporary macro.
Now we can change our sample to use the hello package:
\documentclass{article}
\usepackage[english,czech]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{hello}
\begin{document}
Příliš žluťoučký kůň úpěl \textit{ďábelské} ódy.
\begin{otherlanguage}{english}
Some text in English, \hello
\end{otherlanguage}
\end{document}
We haven’t provided any configurations for hello yet, but you can see that the text hello world is in bold font anyway. This is the same case as \textit, which is converted to italic. Basic font styles are inserted by the tex4ht command during the extraction of text from DVI to the output format. So it is finally time to show how to configure both textit and hello to produce better tags than they do by default.
The basic structure of a config file has been shown before, so now we will just add basic configurations for \textit and \hello:
\Preamble{xhtml}
\Configure{textit}{\HCode{<span class="textit">}}{\HCode{</span>}}
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}}
\Css{.textit{font-style:italic;}}
\Css{.hello{font-weight:bold;}}
\begin{document}
\EndPreamble
For documentation of default configurations, see TeX4ht info; the most useful are the LaTeX and TeX4ht sections. Documentation for basic font commands such as \textit or \textbf is provided in the LaTeX section. We can see that the configuration takes two parameters: insertion before and after the content. The same holds for the hello configuration we defined earlier: hooks are inserted before and after the content.
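By analogy, \textbf can be configured in the same way. The following sketch maps it to a strong element instead of a styled span; the choice of strong is an assumption about the markup you want, not a TeX4ht default:

```latex
\Preamble{xhtml}
% before and after hooks for \textbf, same two-argument form as textit
\Configure{textbf}{\HCode{<strong>}}{\HCode{</strong>}}
\begin{document}
\EndPreamble
```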
To insert HTML tags, we need to use the \HCode command; otherwise, special characters such as <, > or & are escaped. In our example, we insert span elements with a class attribute to distinguish them. Because these classes don’t have any visual appearance by default, we use \Css commands to add some styling. Yes, you need to know both HTML and CSS to configure TeX4ht effectively!
If we look at the HTML output now, we can see that things don’t look much better than initially:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="textit"><span class="ecti-1000">ď</span><span class="ecti-1000">ábelsk</span><span class="ecti-1000">é</span></span> ódy. Some text in English, <span class="hello"><span class="ecbx-1000">hello world</span></span> </p>
Our new tags were inserted, but the unnecessary elements inserted by the tex4ht processor are still present. Fortunately, we can suppress the insertion of these elements with the \NoFonts command, and enable it again later with \EndNoFonts. We can also use the tex4ht.sty option NoFonts, which suppresses font processing in the whole document, but you should use it with caution, as it may have some side effects.
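For comparison, the document-wide variant passes NoFonts as a TeX4ht option in \Preamble. A minimal sketch, with the caveat from the paragraph above that it affects font handling everywhere, not only inside \textit and \hello:

```latex
% NoFonts as a tex4ht.sty option: suppresses font spans in the whole document
\Preamble{xhtml,NoFonts}
\begin{document}
\EndPreamble
```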
Let’s take a look at how our configurations look with the \NoFonts command:
\Preamble{xhtml}
\Configure{textit}{\HCode{<span class="textit">}\NoFonts}
  {\EndNoFonts\HCode{</span>}}
\Configure{hello}{\HCode{<span class="hello">}\NoFonts}
  {\EndNoFonts\HCode{</span>}}
\Css{.textit{font-style:italic;}}
\Css{.hello{font-weight:bold;}}
\begin{document}
\EndPreamble
The output now looks much better:
<!--l. 6--><p class="noindent" >Příliš žluťoučký kůň úpěl <span class="textit">ďábelské</span> ódy. Some text in English, <span class="hello">hello world</span> </p>
It may seem that we can be happy at this point, but things aren’t as easy as we may hope, because we haven’t talked about one thing:
2.8 Paragraphs
What if we add some more paragraphs in English to our sample file?
\documentclass{article}
\usepackage[english,czech]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{hello}
\begin{document}
Příliš žluťoučký kůň úpěl \textit{ďábelské} ódy.
\begin{otherlanguage}{english}
Some text in English, \hello
\end{otherlanguage}

\begin{otherlanguage}{english}
\textit{What will do} \verb|\textit| at the beginning of paragraph? And also, what about configuration for \verb|otherlanguage| environment?
\end{otherlanguage}
\end{document}
What if we want to insert elements with the lang attribute to specify the language of the text in the HTML? This can be useful from a semantic point of view, and it also lets us enable hyphenation in CSS, which works only when the correct languages are marked in the source.
This exercise will be a little more difficult