## Chapter 2 Basic Tutorial

### 2.1 What is TeX4ht?

TeX4ht is a system that converts LaTeX to various output formats, including HTML (for web pages), ODT, ePub (for ebooks and other applications), DocBook, and TEI. HTML and ODT are the most common and best-supported conversion targets.

### 2.2 Basic Usage

Conversion is invoked using the make4ht command:

Let us start by converting the following simple LaTeX file to HTML:
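A minimal sketch of such a file, assuming it is saved as sample.tex. Both the file name and the contents are illustrative. They include accented letters and an italic phrase so that the effects discussed later in this chapter can be reproduced:

```latex
% sample.tex: hypothetical stand-in for the tutorial's sample file
\documentclass{article}
\begin{document}
\section{Sample}
% Accented letters show the difference between compilation engines
Příliš žluťoučký kůň. \textit{Italic text.}
\end{document}
```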

You can compile it using the following command:

The resulting HTML file contains the following code:

As you can see, multiple options can be joined for make4ht. The above invocation is equal to the following:

You can also use the long options:

What do these options mean?

The option --lua (or -l) tells make4ht to use LuaLaTeX as the compilation engine. There is also an option -x (or --xetex) that selects XeLaTeX for the compilation. If neither of these options is used, the file is converted with the default pdfLaTeX engine.

For example, if we compiled the sample file without the -l option, we would get a different result:

Notice that the accents are detached from their base letters. This happens because TeX4ht uses the information about characters stored in the DVI file. Current LaTeX supports basic accented characters out of the box, but sometimes they do not work as expected.

The option --mode sets the compilation mode. make4ht has one built-in mode, named draft. By default, make4ht compiles your TeX file three times to obtain correct hyperlinks and other features that depend on auxiliary files. The draft mode uses only one compilation run, so it is much faster.

By default, make4ht converts a LaTeX file to an HTML5 document. You can request conversion to other formats using the -f option. For example, to convert a document to the OpenDocument Format, use the following:

#### 2.2.1 Debugging

More information about make4ht, its command line options, and other features can be found in section make4ht Build System.

### 2.3 TeX4ht Options

The simplest way to change some aspects of the design is to use TeX4ht options. They can be passed to make4ht as the first positional argument after the filename:

For example, TeX4ht produces one HTML file for a document, but each footnote is placed in a separate file. If you have a large document, you may want to use a separate page for each chapter, with a list of footnotes at the end of these chapters. You can use the following options:

There are other numeric options; each of them breaks the document into separate HTML pages at a different sectioning level. Option 1 does not break pages at all, 2 breaks at parts, 3 at chapters, 4 at sections, 5 at subsections, 6 at sub-subsections, and 7 at paragraphs. The sec-filename option produces HTML file names that are based on section titles instead of their numbers. The fn-in option prints footnotes at the end of each HTML page.
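These options do not have to be typed on the command line every time. A minimal sketch, assuming a configuration file with the hypothetical name myconfig.cfg that is passed to make4ht with the -c option, lists them in the \Preamble command instead. Configuration files are described later in this chapter:

```latex
% myconfig.cfg (hypothetical file name)
% Possible usage: make4ht -c myconfig.cfg sample.tex
% The first option selects the output type; the rest are the TeX4ht
% options described above: break pages at chapters, name files after
% section titles, print footnotes at the end of each page.
\Preamble{xhtml,3,sec-filename,fn-in}
\begin{document}
\EndPreamble
```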

There are also options that change the handling of math. Normally, HTML elements are used for simple math, and pictures are used for more complex features, such as fractions or square roots. This usually does not look good, so what are other options?

Generally, it is best to use MathML, as it supports correct vertical alignment for inline math, and the font size matches the surrounding text. Unfortunately, some web browsers do not support it yet. We can use MathJax to render math in these browsers.
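As a sketch of this setup, assuming the mathml and mathjax options are given together in a configuration file (the file name mathconfig.cfg is hypothetical), the math would be written as MathML and rendered by MathJax in the browser:

```latex
% mathconfig.cfg (hypothetical file name)
% Possible usage: make4ht -c mathconfig.cfg sample.tex
% mathml:  output math as MathML instead of HTML elements and pictures
% mathjax: load MathJax to render the math in the browser
\Preamble{xhtml,mathml,mathjax}
\begin{document}
\EndPreamble
```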

On the other hand, if you want to use pictures for math exclusively, you can try the pic-m option, which requires pictures even for inline math. There are also similar options for equations and other math environments.

The generated pictures are in the PNG format, which is a raster format, so their quality depends on the resolution of the device on which the document is displayed. You may want to use the vector SVG format instead, as it should produce better-looking pictures:

For more information on options, see chapter TeX4ht Options.

### 2.4 make4ht extensions

make4ht supports extensions. These extensions can modify various aspects of the conversion process, for example, post-process the generated files, cache images, or add support for Rmarkdown files. Extensions can be enabled using the -f format_name+extension_name option.

For example, there is a preprocess_input extension, which adds support for Markdown or Rtex documents. It can process the following Rmarkdown document:

Compile it using the following command:

It produces the following HTML file:

If your document produces many pictures, the compilation can take a long time. To make it faster, you can use the dvisvgm_hashes extension. It caches the SVG images and creates them only for the changed math environments.

make4ht loads the common_domfilters extension automatically. It fixes common issues in the generated HTML files using the LuaXML package. To prevent an extension from loading, use the -extension_name syntax:

You can find a list of extensions in the make4ht documentation.

### 2.5 Configurations

Most of the markup produced by TeX4ht is configurable. Supported commands can be configured using the \Configure command. We can also insert markup before and after environments, using the \ConfigureEnv command.

While it is possible to insert these commands directly into your document, it is better to use a custom configuration file, as you would get a compilation error if you compiled a document containing TeX4ht commands directly with LaTeX.

You can find more information about syntax and available commands in section Private Configuration Files. Here, we will show some simple examples.

#### 2.5.2 Configuring Environments

You may want to insert some custom HTML tags. This is a bit more complicated for LaTeX commands, but it is easy for environments. You can configure the code that is inserted before and after an environment using the \ConfigureEnv command. It has the following syntax:

We can ignore the before-list and after-list arguments, as they are used only for list-like environments, such as itemize. So we just need to pass the code that will be inserted in the before env and after env arguments.
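A minimal sketch of such a configuration, assuming the argument order \ConfigureEnv{environment}{before env}{after env}{before list}{after list}. The notebox environment and its CSS class are hypothetical names used only for illustration, and paragraph handling is not addressed here:

```latex
% Fragment of a .cfg file, placed between \Preamble{xhtml} and \begin{document}.
% "notebox" is a hypothetical environment defined in the document.
\ConfigureEnv{notebox}
  {\HCode{<div class="notebox">}}% before env
  {\HCode{</div>}}%                after env
  {}{}%                            before list, after list (unused here)
% Optional styling for the inserted element
\Css{.notebox{border: 1px solid gray; padding: 0.5em;}}
```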

### 2.6 Remains of the old tutorial

The following text was imported from the original TeX4ht tutorial and needs to be rewritten. It still contains some useful information, but there are also some obsolete pieces.

But beware of the following situation:

Say that we insert <div class="someenv"> and </div> tags around the someenv environment. By default, this may produce the following structure:

As you can see, the generated HTML code is incorrect, because the opening and closing <div> tags have different parent elements. The someenv environment can be configured to close the current paragraph, but that may not be what you want.

The best way to prevent a tag mismatch may be something like:
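One commonly used idiom, shown here as a sketch and not necessarily the code the original tutorial had in mind, handles the paragraph explicitly before each tag is written. \EndP closes a paragraph that is still open, and \ifvmode\IgnorePar\fi asks TeX4ht to skip the paragraph start that would otherwise be emitted at that point:

```latex
% Fragment of a .cfg file; "someenv" is the example environment from the text.
\ConfigureEnv{someenv}
  % Before the environment: settle the paragraph state, then open the div
  {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="someenv">}}
  % After the environment: close any paragraph opened inside, then close the div
  {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
  {}{}
```

With this arrangement, the paragraph tags and the <div> tags should nest properly, because no paragraph is left open across the environment boundary.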

and with make4ht

Let us look at the text part generated by htlatex:

and by make4ht:

The only difference is the missing </p> tag in the output of htlatex, because htlatex produces HTML 4.01 by default. make4ht, on the other hand, produces XHTML by default, so the closing tag must be present.

To get XHTML output from htlatex, use the tex4ht.sty option xhtml. This option must be the first option in the option list passed to tex4ht.sty. The value of the first option must be either html, xhtml, or the name of a custom config file. We will cover these config files later, as they are a key component in the customization of TeX4ht output.

So, to get the same output as from make4ht, we must use the following command:

Now we should get rid of the ugly entities that encode accented letters. This is somewhat awkward with htlatex:

charset=utf-8 produces a meta element that declares the document to be in the UTF-8 encoding. The two important options for the tex4ht command are -c and -utf8.

ToDo: add description of process of conversion from htf fonts to utf8 using unicode.4hf. It is directed from tex4ht.env file.

With make4ht, the situation is easier, as all we need to do is add the -u option:

The resulting file:

The entities are gone, but another problem persists. What we see is caused by a bug in the tex4ht command: it decorates text that is set in a non-default font with <span> elements. Unfortunately, as we can see, this does not play well with accented letters. Fortunately, there is an easy solution. We just need to dive into TeX4ht configuration. Yay!

### 2.7 Configurations

We already saw that we can use command line options to configure the output. For a full list of options for tex4ht.sty, see an article on CVR’s blog. These options mainly influence the appearance of math, footnotes, tables, etc. Note that these options are not a fixed set: anyone can add new options, and not all options are supported in every output format supported by tex4ht. Generally, these options work with HTML (and XHTML) output.

Another option is to use a custom config file (.cfg). This is a TeX file with a basic structure:
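A minimal sketch of that structure follows. The comments mark where configurations usually go, and the file name is an assumption:

```latex
% myconfig.cfg (hypothetical file name)
% The first option must be html, xhtml, or the name of another config file.
\Preamble{xhtml}
% \Configure, \ConfigureEnv and \Css commands are typically placed here
\begin{document}
\EndPreamble
```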

The most important command for configuration is \Configure. This command has a variable number of arguments; in the simplest form it has two: \Configure{configname}{insert for a first hook}.

At this point we should talk about hooks. To insert HTML tags, LaTeX macros are redefined, and special hooks are inserted into the new definitions. These hooks are declared with \NewConfigure{configname}{number of hooks} in a special file that is named after the redefined package, with the suffix .4ht. These hooks are then filled in the configuration files for particular output formats, or in the .cfg file.

To illustrate this, let us look at a simple example. Say we have a simple package, hello.sty:
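A minimal sketch of such a package, assuming that \hello simply prints the text hello world in bold, which matches the description later in this section:

```latex
% hello.sty: illustrative reconstruction, not the original listing
\ProvidesPackage{hello}
% \hello prints a fixed bold text
\newcommand\hello{\textbf{hello world}}
\endinput
```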

We can provide hooks in a file named hello.4ht. Say we just want to insert tags at the beginning and at the end of the \hello command:

Because we want to surround the contents produced by \hello with tags, we need to declare two hooks. This is the most common case for ordinary commands that just produce some text. The old contents of the macro are saved in a temporary macro, and the command is then redefined to insert the hooks around the original contents stored in the temporary macro.

Now we can change our sample to use the hello package:
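The steps described above can be sketched as follows. The hook names \a:hello and \b:hello follow the usual TeX4ht naming for hooks declared by \NewConfigure. The temporary macro name \tmp:hello is an arbitrary choice, and the sketch assumes that the colon can be used in macro names inside .4ht files:

```latex
% hello.4ht: illustrative sketch
% Declare two hooks for the "hello" configuration: \a:hello and \b:hello
\NewConfigure{hello}{2}
% Save the original definition in a temporary macro
\let\tmp:hello\hello
% Redefine \hello so that the hooks surround the original contents
\renewcommand\hello{\a:hello\tmp:hello\b:hello}
```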
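A minimal sketch of the changed sample, assuming the hypothetical sample.tex and hello.sty files are in the same directory:

```latex
% sample.tex, now using the hello package (illustrative contents)
\documentclass{article}
\usepackage{hello}
\begin{document}
\hello{} and \textit{italic text}.
\end{document}
```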

We have not provided any configurations for hello yet, but you can see that the text hello world is in bold font anyway. This is the same case as \textit, which is converted as italic. Basic font styles are inserted by the tex4ht command during the extraction of text from the DVI file to an output format. So it is the right time to finally show how to configure both textit and hello to produce better tags than they provide by default.

The basic structure of a config file has been shown before, so now we will just add basic configurations for \textit and \hello:
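A minimal sketch of such a config file. The class names textit and hello and the CSS rules are illustrative choices, not necessarily those of the original listing:

```latex
% myconfig.cfg (hypothetical file name)
\Preamble{xhtml}
% First argument: inserted before the content; second: inserted after it
\Configure{textit}{\HCode{<span class="textit">}}{\HCode{</span>}}
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}}
% The classes have no default appearance, so add some styling
\Css{.textit{font-style: italic;}}
\Css{.hello{font-weight: bold;}}
\begin{document}
\EndPreamble
```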

For documentation of the default configurations, see TeX4ht info; the most useful are the LaTeX and TeX4ht sections. Documentation for basic font commands such as \textit or \textbf is provided in the LaTeX section. We can see that the configuration takes two parameters: insertions before and after the content. The same holds for the hello configuration we defined earlier: hooks are inserted before and after the content.

To insert HTML tags, we need to use \HCode commands; otherwise, special characters such as <, > or & are escaped. In our example, we insert span elements with a class attribute to distinguish them. Because these classes do not have any visual appearance by default, we use \Css commands to add some styling. Yes, you need to know both HTML and CSS to configure TeX4ht effectively!

If we look at the HTML output now, we can see that things do not look much better than initially:

Our new tags were inserted, but the unnecessary elements inserted by the tex4ht processor are still present. Fortunately, we can suppress the insertion of these elements with the \NoFonts command, and enable it again later with \EndNoFonts. We can also use the tex4ht.sty option NoFonts, which suppresses font processing in the whole document, but you should use it with caution, as it may have some side effects.

Let us take a look at how our configurations would look with the \NoFonts command:

The output now looks much better:

It may seem that we can be happy at this point, but things are not as easy as we may hope, because we have not talked about one thing:
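As a sketch, assuming the same illustrative class names textit and hello, the font processing is switched off right after the opening tag and switched on again just before the closing tag:

```latex
% Fragment of a .cfg file, placed between \Preamble{xhtml} and \begin{document}
% \NoFonts suppresses the font elements added by tex4ht; \EndNoFonts restores them
\Configure{textit}
  {\HCode{<span class="textit">}\NoFonts}
  {\EndNoFonts\HCode{</span>}}
\Configure{hello}
  {\HCode{<span class="hello">}\NoFonts}
  {\EndNoFonts\HCode{</span>}}
```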

### 2.8 Paragraphs

What if we add some more paragraphs in English to our sample file?

What if we want to insert elements with a lang attribute to specify the language of the text in the HTML? This might be useful from a semantic point of view. We can also enable hyphenation in the CSS, which works only when the correct languages are marked in the source.

This exercise will be a little bit more difficult