Chapter 10
For Developers

This chapter deals with TeX4ht development. It starts with a basic tutorial for a new package support, shows commands useful in the process, different types of TeX4ht configuration files, and the syntax and structure of literate source files.

10.1 Tutorial: Basic Support For a New Package

In this tutorial, we will try to show how to provide TeX4ht support for a simple LaTeX package.

TeX4ht tries to load a special .4ht file for each package loaded by LaTeX. This file can modify commands provided by the package, for example by redefining macros that cause clashes between the package and TeX4ht. Most importantly, it inserts special macros, called hooks, that are then used to include the output format tags.

Let’s say that you have a custom package called mynote.sty:

\newcommand\notetitle{Note:~} 
\newcommand\note[1]{\textbf{\notetitle}#1} 
\newcommand\highlight[1]{\textbf{#1}} 
\endinput

It defines two user commands, \note and \highlight. They can be used in the following way:

\documentclass{article} 
\usepackage{mynote} 
\begin{document} 
\note{This is a note} 
 
Try to highlight \highlight{something}. 
\end{document}

TeX4ht produces usable output for both of these commands out of the box, thanks to the support for TeX fonts. But you may want to use custom HTML tags instead. To achieve that, you need to insert special commands, called hooks in TeX4ht, into the package commands. These hooks can then be configured to insert tags in the output format.

To introduce hooks, you need to create a hook seeding configuration file for the package, called <name>.4ht. For example, to seed hooks for the mynote.sty package, create the file mynote.4ht:

\NewConfigure{note}{3} 
 
% Use \HLet when you want to completely redefine a command 
\def\:tempa#1{\a:note\notetitle\b:note~#1\c:note} 
\HLet\note\:tempa 
 
\NewConfigure{highlight}{2} 
\pend:defI\highlight{\a:highlight} 
\append:defI\highlight{\b:highlight} 
 
\Hinput{mynote} 
\endinput

There are several things to note. The first is that the : character can be used as a part of a command name in .4ht files. This is similar to the use of the @ character in LaTeX packages. It allows us to create command names that don’t clash with other command names.

The hooks are created using the \NewConfigure command. They can be filled later with the \Configure command. To have an effect, hooks must be inserted into the existing commands. There are two ways to do that. For simpler commands, where we want to insert tags only before and after the contents produced by the patched command, we can use the \pend:def<X> and \append:def<X> commands, where <X> is the number of parameters that the patched command expects, written as a Roman numeral. In this example, the command expects one parameter, so we can use the \pend:defI command. For commands without parameters, use \pend:def.

Of course, you can also insert hooks using other mechanisms, for example using LaTeX’s hook system:

\AddToHook{cmd/highlight/before}{\a:highlight} 
\AddToHook{cmd/highlight/after}{\b:highlight}

The second way of inserting hooks, useful for commands where we also want to insert tags inside their contents, is to use the \HLet command. It is a variant of the \let command. In contrast to \let, it saves the original command as \o:<command name>:. Commands redefined by \HLet also support the \Picture command, where the original version of the command will be used. This way, pictures will produce the same result as they would produce in the PDF mode.

In our example, we redefined the \note command to use a hook between the note title and the note text. This enables us to style the title and the text differently.
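Because \HLet keeps the original definition under \o:<command name>:, the redefinition can also call the original command instead of repeating its body. The following is a minimal sketch of this pattern for the \highlight command from mynote.sty. It is an alternative to the \pend:defI and \append:defI lines in mynote.4ht, not something to use in addition to them:

\NewConfigure{highlight}{2}
% wrap the original \highlight, saved by \HLet as \o:highlight:, in two hooks
\def\:tempa#1{\a:highlight\o:highlight:{#1}\b:highlight}
\HLet\highlight\:tempa

With this variant, later changes to the original definition in mynote.sty do not have to be copied to mynote.4ht.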

The configuration file for our hooks could look like this:

\Preamble{xhtml} 
\Configure{note} 
{\ifvmode\IgnorePar\fi\EndP\HCode{<div class="note"><span class="notetitle">}} 
{\HCode{</span><span class="notebody">}} 
{\HCode{</span></div>}} 
\Css{.notetitle{font-weight: bold;}} 
 
\Configure{highlight}{\HCode{<span class="highlight">}\NoFonts}{\EndNoFonts\HCode{</span>}} 
\Css{.highlight{font-weight:bold;}} 
\begin{document} 
\EndPreamble

As the \note command should be used in its own paragraph, we need to fix paragraph closing. See the Paragraph Handling section for more information about this issue. More details about configuration files and configurations are in the section Private Configuration Files.

The HTML code produced by our configuration looks like this:

<div class='note'><span class='notetitle'>Note: </span><span class='notebody'> This is a note</span></div> 
<!--  l. 6  --><p class='indent'>    Try to highlight <span class='highlight'>something</span>. 
</p>

10.2 Commands Usable in the .4ht files

\NewConfigure{name}{number of defined hooks}

This command defines hook macros with an alphabetic prefix, from \a:name up to \i:name, depending on the number of defined hooks. The maximum number is 9.

\NewConfigure{try}{2} 
\def\try#1{\a:try#1\b:try} 
\Configure{try}{* }{} 
\try{ho} 
% produces "* ho"

\NewConfigure{name}[number of parameters]{code}

Variant of \NewConfigure that doesn’t define hooks with alphabetic prefixes. Instead, it passes the arguments of \Configure to the code as TeX arguments. See this example:

\NewConfigure{try}[2]{\def\hookI{#1}\def\hookII{#2}} 
\def\try#1{\hookI#1\hookII} 
\Configure{try}{* }{} 
\try{ho} 
% produces "* ho"

When you use \Configure{try}, it defines the \hookI and \hookII commands. They can then be used in the redefined \try command.

\HLet{Redefined command name}{new command}

Variant of \let that saves the original command under the name \o:<name>:. It can detect the use of the redefined command inside a picture. In such a case, it uses the original command to produce the correct visual result in the picture.

\NewConfigure{note}{3} 
\def\:tempa#1{\a:note note:\b:note~#1\c:note} 
\HLet\note\:tempa 
\Configure{note}{*}{*}{*} 
\note{hello} 
% produces: "*note:* hello*"

\HRestore{command name}

Restores a command redefined using \HLet to its original content.
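The following sketch continues the \HLet example above and shows where \HRestore fits in. The original definition of \note used here is only an assumption made for the illustration:

% assumed original definition of \note, for illustration only
\newcommand\note[1]{note: #1}
\NewConfigure{note}{3}
\def\:tempa#1{\a:note note:\b:note~#1\c:note}
\HLet\note\:tempa
\Configure{note}{*}{*}{*}
\note{hello}   % the hooked version is used here
\HRestore\note
\note{hello}   % the original definition is used again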

\pend:def<X>{redefined command}{code to be inserted at the beginning}

\append:def<X>{redefined command}{code to be inserted at the end}

These two commands insert code before and after a redefined command. There are several versions of these commands, depending on the number of parameters that the redefined command expects. The number of parameters, written as a Roman numeral, replaces the <X> placeholder.

Up to three parameters are supported.

% \bar already exists in LaTeX, so a fresh name is used here
\newcommand\mybar{xxx}
\pend:def\mybar{*}
\append:def\mybar{*}
\mybar
% produces: "*xxx*" 
\newcommand\foo[2]{#1, #2} 
\pend:defII\foo{*} 
\append:defII\foo{*} 
\foo{a}{b} 
% produces "*a, b*" 

\:CheckOption{option name}

\if:Option

Support for custom options. \:CheckOption checks whether the given option is active, and the \if:Option conditional then runs the true or the false branch.

\:CheckOption{info}\if:Option 
... \else ... 
\fi
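As a sketch of how this can be used in practice, the following fragment of mynote.4ht from the tutorial seeds the \note hooks only when a given option is not present. The option name plain-notes is hypothetical and chosen only for this illustration:

\NewConfigure{note}{3}
\:CheckOption{plain-notes}\if:Option
   % option requested: leave \note as defined by mynote.sty
\else
   \def\:tempa#1{\a:note\notetitle\b:note~#1\c:note}
   \HLet\note\:tempa
\fi

The option would be passed in the same way as other TeX4ht options, for example as the second argument of make4ht.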

10.3 Two types of .4ht files

The compilation starts by opening tex4ht.sty and loading a fraction of its code. The main purpose of this phase is to request the loading of the system at a later time (for instance, upon reaching \begin{document}). The motivation for the late loading is to allow TeX4ht to collect as much information as possible about the environment requested by the source file, and help the system reshape that environment with minimal interference from elsewhere.

The system uses two kinds of (4ht) configuration files. The files of the first kind mainly seed hooks into the macros loaded by the source file (for instance, latex.4ht, fontmath.4ht, and article.4ht). The files of the second kind mainly attach meaning to the hooks (for instance, html4.4ht, unicode.4ht, and mathml.4ht).

Different source files may request the loading of different style files and in different orders. The hook seeding files are loaded in response to the loading of the style files, and in a compatible order. Since the different style files may redefine the syntax and semantics of macros, TeX4ht follows a similar route of defining and redefining the hooks and their meanings.

10.3.1 Custom output formats

The meaning attaching files are normally requested through option names introduced in the tex4ht.4ht system file. It defines options for all output formats supported by TeX4ht, for instance html5, ooffice for the ODT output, tei, and so on.

These options are passed to TeX4ht by make4ht according to the --format command line parameter, but you can also pass them yourself.

The user may add option names, and redefine old ones, within a new file named tex4ht.usr.

A new tex4ht.usr file should group references to *.4ht configuration files under arbitrarily chosen option names. For that purpose, \Configure commands similar to those provided in tex4ht.4ht should be employed. These are particularly useful if you use custom packages that are not included in TeX distributions and thus aren’t supported by TeX4ht.

You can place your custom .4ht files or tex4ht.usr in your local TEXMFHOME tree, for instance in ~/texmf/tex/latex/my4htfiles.

Location of the TEXMFHOME directory can be found using the following command:

$ kpsewhich -var-value TEXMFHOME

Example

Let’s say that you have a custom package mypackage.sty:

\newcommand\mycommand[1]{Hello #1} 
\endinput

This can be configured using the following configuration file, mypackage.4ht:

\NewConfigure{mycommand}{2} 
\pend:defI\mycommand{\a:mycommand} 
\append:defI\mycommand{\b:mycommand} 
\Hinput{mypackage} 
\endinput

The important command in this listing is \Hinput{mypackage}. \Hinput expects the package name as its argument. It registers the package for later processing in the output format files.

Here is a custom output format file sample.4ht:

\exit:ifnot{mypackage} 
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\ConfigureHinput{mypackage} 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\Configure{mycommand}{\HCode{<span class="mycommand">}}{\HCode{</span>}} 
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
\endinput\empty\empty\empty\empty\empty\empty 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
 
\endinput

The \exit:ifnot command takes a comma-separated list of the packages supported by the output format file. It stops the loading of the file if the currently processed package doesn’t have configurations in it.

The configuration for the package is placed between \ConfigureHinput and \endinput\empty\empty\empty\empty\empty\empty.
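An output format file that supports more than one package repeats this structure for each of them. The following sketch extends sample.4ht to also cover the mynote package from the tutorial. It assumes that mynote.4ht calls \Hinput{mynote}, as shown earlier:

\exit:ifnot{mynote,mypackage}

\ConfigureHinput{mynote}
\Configure{highlight}{\HCode{<span class="highlight">}}{\HCode{</span>}}
\endinput\empty\empty\empty\empty\empty\empty

\ConfigureHinput{mypackage}
\Configure{mycommand}{\HCode{<span class="mycommand">}}{\HCode{</span>}}
\endinput\empty\empty\empty\empty\empty\empty

\endinput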

To request the custom output format file, we need to add it to tex4ht.usr. Here is an example that adds a new option myhtml5. It is based on the code for the html5 option from tex4ht.4ht:

\Configure{myhtml5}{% 
   \:CheckOption{info}\if:Option 
               \Hinclude[*]{infoht4.4ht}\fi 
   \:CheckOption{info}\if:Option 
               \Hinclude[*]{infomml.4ht}\fi 
   \Hinclude[*]{html4.4ht}% 
   \Hinclude[*]{unicode.4ht}% 
   \:CheckOption{mathml}\if:Option% 
   \else\:CheckOption{mathml-}\fi% 
   \if:Option% 
      \Hinclude[*]{mathml.4ht}% 
      \Hinclude[*]{html-mml.4ht}% 
   \else 
      \Hinclude[*]{html4-math.4ht}% 
   \fi 
   \:CheckOption{svg}% 
             \if:Option \else\:CheckOption{svg-}\fi 
             \if:Option \else\:CheckOption{svg-obj}\fi 
             \if:Option \else\:CheckOption{svg-inline}\fi 
             \if:Option \Hinclude[*]{svg-option.4ht}% 
                        \:CheckOption{info}\if:Option \Hinclude[*]{infosvg.4ht}\fi 
             \fi 
   \Hinclude[*]{html5.4ht}% 
   \Hinclude[*]{sample.4ht} 
}

It uses the \:CheckOption command to detect additional options, which results in conditional loading of various output format files using the \Hinclude command. Our custom output format file sample.4ht is placed at the end.
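If none of the optional features are needed, the definition can be reduced to the unconditional part. The following is only a sketch, derived from the listing above by dropping the info, mathml, and svg branches. The option name myhtml5-basic is hypothetical:

\Configure{myhtml5-basic}{%
   \Hinclude[*]{html4.4ht}%
   \Hinclude[*]{unicode.4ht}%
   \Hinclude[*]{html4-math.4ht}%
   \Hinclude[*]{html5.4ht}%
   \Hinclude[*]{sample.4ht}
}

Documents processed with this option would ignore the mathml and svg options, so the full version is the safer starting point for general use.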

You can then require the custom output format using this command:

$ make4ht filename.tex "myhtml5"

10.4 Early Hooks in usepackage.4ht

Normal .4ht files are loaded once the document preamble has been processed. This is usually desirable, because some packages redefine commands of other packages, and late loading prevents possible clashes in such cases. However, sometimes we need to fix package macros as soon as the package is loaded. In other cases, we need to block the package from loading completely, which can be necessary when the package causes a fatal error.

For these cases, TeX4ht uses a special file, usepackage.4ht, where you can declare code that can be executed before the package is loaded.

As it is loaded multiple times, it is best to keep it short and place longer pieces of code in a separate file. Sample code that loads such a file looks like this:

\Configure{PackageHooks}{foo.sty}{foo-hooks.4ht}

The <pkgname>-hooks.4ht name is usually used to distinguish this early hooks file from the usual .4ht files. The general structure of the <pkgname>-hooks.4ht file is as follows:

code to be executed before package loading 
 
\:AtEndOfPackage{ 
code to be executed after package loading 
}

There are two useful commands available:

\:dontusepackage{package name} – prevents the package from loading. It can be used to disable packages that cause a fatal error with TeX4ht.

\:AtEndOfPackage{code to be executed} – executes code after the package has been loaded. Useful for redefinition of commands that can be used in the document preamble.
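The two commands could be used as in the following sketches. All package and command names in them (foo, bar, \foodecorate, \barcmd) are hypothetical and serve only as an illustration.

% foo-hooks.4ht: foo.sty loads fine, but one of its commands
% must be neutralized before it is used in the preamble
\:AtEndOfPackage{%
   \let\foodecorate\relax
}

% bar-hooks.4ht: bar.sty cannot be used with TeX4ht at all
\:dontusepackage{bar}
% minimal replacement, so that documents using \barcmd still compile
\providecommand\barcmd[1]{#1}

Each of these files would be registered in usepackage.4ht using \Configure{PackageHooks}, in the same way as foo-hooks.4ht above.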

10.4.1 Execute Code Directly in usepackage.4ht

You can also execute shorter pieces of code directly in usepackage.4ht, thanks to the new LaTeX package hooks. For example, the following code fixes catcode issues with the ^ character in the doc package:

\AddToHook{package/doc/before}{\SUPOff} 
\AddToHook{package/doc/after}{\SUPOn}

The \SUPOff command disables the catcode changes to this character that TeX4ht uses in order to insert markup for math superscripts, and \SUPOn enables them again once the package has been processed.

10.5 TeX4ht literate sources

To add proper support for a new package, it is necessary to edit the TeX4ht literate sources. All distributed TeX4ht files, including tex4ht.sty and all .4ht files, are generated from these literate programming files. This is also the reason why the generated files don’t contain many comments; the comments are in the sources. If you want to understand how TeX4ht works, it is necessary to read them.

The source files are available in the TeX4ht source repository. You can retrieve them using an SVN client:

$ svn checkout https://svn.gnu.org.ua/sources/tex4ht/ 
$ cd tex4ht/trunk/lit/

The configurable hooks for all packages are contained in the tex4ht-4ht.tex file. Configurations of these hooks are placed in the output format configuration files. The most common output format is HTML, which can be configured in tex4ht-html4.tex, or in tex4ht-html5.tex if HTML5 features are used. You can also update sources for other output formats, for example tex4ht-ooffice.tex for the ODT format, or tex4ht-tei.tex for TEI. The sources of the tex4ht.sty package are available in tex4ht-sty.tex.

To compile all literate sources, run the make command. You will need basic UNIX utilities for this to succeed, as well as m4 and javac. You can also compile particular source files. Most of them can be compiled using LaTeX, but some of them, for example tex4ht-4ht.tex, need to be compiled using etex.

10.5.1 How to add support for a package to the TeX4ht literate sources

Given the following package sample.sty:

\ProvidesPackage{sample} 
\newcommand\hello{hello} 
\endinput

This simple package defines the command \hello, which simply prints the word “hello” when used in a document.

Let’s say that we want to insert some HTML tags before and after the text content printed by the command.

Basic template for tex4ht-4ht.tex:

\<sample.4ht\><<< 
% sample.4ht (|version), generated from |jobname.tex 
% Copyright 2017 TeX Users Group 
|<TeX4ht license text|> 
\NewConfigure{hello}{2} 
\pend:def\hello{\a:hello} 
\append:def\hello{\b:hello} 
\Hinput{sample} 
\endinput 
>>> \AddFile{9}{sample}

Configuration for each package must follow this basic template. ProTeX is used as the literate programming system.

The \<name\><<<code>>> block defines a new macro, which can then be called using |<name|>. The license text is included this way in the example. The instruction to generate the .4ht file is given by the \AddFile{9}{sample} command after the block definition. The first argument to \AddFile is an arbitrary number.
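The same mechanism can be used to split a package configuration into smaller named blocks. The following sketch moves the hook definitions of the template to a separate block. The block name "sample hooks" is hypothetical, and the block is defined before its use to stay on the safe side regarding the point where \AddFile writes the file:

\<sample hooks\><<<
\NewConfigure{hello}{2}
\pend:def\hello{\a:hello}
\append:def\hello{\b:hello}
>>>

\<sample.4ht\><<<
% sample.4ht (|version), generated from |jobname.tex
% Copyright 2017 TeX Users Group
|<TeX4ht license text|>
|<sample hooks|>
\Hinput{sample}
\endinput
>>> \AddFile{9}{sample}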

Each package configuration must include \Hinput{packagename}, in order to load the configurations for the package.

The command \NewConfigure{hello}{2} declares a new configuration hello with two configurable hooks. These hooks are named \a:hello and \b:hello. The hooks must be inserted into the \hello command, which can be easily done using the \pend:def and \append:def commands. These commands insert code at the beginning and at the end of the redefined command, respectively.

The package name must also be included in the mktex4ht-cnf.tex file. This file is used in the generation of the

\AddFile{9}{sample}

You can place configuration for HTML to the tex4ht-html4.tex file:

\<configure html4 sample\><<< 
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}} 
\Css{.hello{color:red;}} 
>>>

The \<configure html4 packagename\> block will produce code that detects use of the package packagename. It then loads configurations for the package.

The .4ht files can be generated simply using the make command.

The following sample TeX file:

\documentclass{article} 
\usepackage{sample} 
\begin{document} 
  \hello\ world. 
\end{document}

produces the following HTML code:

<!--l. 4--><p class="noindent" > 
<span class="hello">hello</span> world. 
</p>

10.6 ProTeX

The literate programming system used in the previous section is called ProTeX. We should discuss some main ideas behind this system.

Literate programming is a discipline that promotes the writing of programs the way one explains them to human beings. ProTeX is a literate programming system fully implemented in terms of TeX, and it is compatible with LaTeX and other TeX-based systems. TeX4ht, and ProTeX itself, are examples of applications written in ProTeX.

\input ProTex.sty 
\AlProTex{extension,<<<>>>,list,title,escape-character} 
\<title\><<< 
code fragment 
>>> 
|<title|> 
\OutputCode\<...\>

Some explanation:

\input ProTex.sty 
\AlProTex{extension,<<<>>>,list,title,escape-character}

The escape-character stands for `, @, |, or ?. If omitted, it stands for |.

\<title\><<< 
code fragment 
>>> 

This structure provides names to code fragments (the fragments should not be too large in size).

 |<title|>

This command acts as a placeholder for the code segment associated with the title (| stands for the escape character).

   \OutputCode\<...\>

This command creates a file for the code whose root node is specified.
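Putting these pieces together, a complete ProTeX source could look like the following sketch. It assumes plain TeX, uses c as the extension, and omits the escape character, so that | is used. The fragment names hello and print greeting, as well as the C code itself, are hypothetical and serve only as an illustration:

\input ProTex.sty
\AlProTex{c,<<<>>>,list,title}

\<hello\><<<
#include <stdio.h>
int main(void)
{
  |<print greeting|>
  return 0;
}
>>>

\<print greeting\><<<
puts("hello");
>>>

\OutputCode\<hello\>
\bye

The root fragment hello refers to the fragment print greeting through the placeholder, and \OutputCode is given the root fragment only.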