Chapter 10
For Developers
This chapter deals with TeX4ht development. It starts with a basic tutorial on adding
support for a new package, then shows commands useful in the process, the different
types of TeX4ht configuration files, and the syntax and structure of the literate
source files.
10.1 Tutorial: Basic Support For a New Package
In this tutorial, we will try to show how to provide TeX4ht support for a simple
LaTeX package.
TeX4ht tries to load a special .4ht file for each package loaded by LaTeX. This
file can contain modifications to commands provided by the package, such as
redefinitions of macros that cause clashes between the package and TeX4ht. Most
importantly, it inserts special macros, called hooks, which are then used to include
the output format tags.
Let’s say that you have a custom package called mynote.sty:
\newcommand\notetitle{Note:~}
\newcommand\note[1]{\textbf{\notetitle}#1}
\newcommand\highlight[1]{\textbf{#1}}
\endinput
It defines two user commands, \note and \highlight. They can be used in the
following way:
\documentclass{article}
\usepackage{mynote}
\begin{document}
\note{This is a note}

Try to highlight \highlight{something}.
\end{document}
TeX4ht produces usable output for both of these commands out of the box,
thanks to its support for TeX fonts. But you may want to use custom HTML tags
instead. To achieve that, you need to insert special commands, called hooks in
TeX4ht, into the package commands. These hooks can then be configured to insert
tags into the output.
To introduce hooks, you need to create a hook seeding configuration file for the
package, called <name>.4ht. For example, to seed hooks for the mynote.sty package,
create file mynote.4ht:
\NewConfigure{note}{3}
% Use \HLet when you want to completely redefine a command
\def\:tempa#1{\a:note\notetitle\b:note~#1\c:note}
\HLet\note\:tempa
\NewConfigure{highlight}{2}
\pend:defI\highlight{\a:highlight}
\append:defI\highlight{\b:highlight}
\Hinput{mynote}
\endinput
There are several things to note. First, the : character can be part of a
command name in .4ht files, similarly to the @ character in LaTeX packages.
This allows us to create command names that don’t clash with other command
names.
The hooks are created using the \NewConfigure command. They can later be
filled with the \Configure command. To have an effect, hooks must be
inserted into the existing commands. There are two ways to do that. For
simpler commands, where we want to insert tags only before and after the
contents produced by the patched command, we can use the \pend:def<X> and
\append:def<X> commands, where <X> is the number of parameters that the
patched command expects, written as a Roman numeral. In this example,
\highlight expects one parameter, so we use the \pend:defI and \append:defI
commands. For commands without parameters, use \pend:def and \append:def.
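For example, a parameterless command such as \notetitle from mynote.sty could receive its own hooks through \pend:def and \append:def. This is a sketch only: the configuration name notetitle and the CSS class label are hypothetical and not part of the mynote support shown above. The seeding lines would go into mynote.4ht before \Hinput{mynote}.

```latex
% in mynote.4ht: seed hooks around the parameterless \notetitle
\NewConfigure{notetitle}{2}
\pend:def\notetitle{\a:notetitle}
\append:def\notetitle{\b:notetitle}

% in the configuration file: fill the hooks (hypothetical class "label")
\Configure{notetitle}{\HCode{<span class="label">}}{\HCode{</span>}}
```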
Of course, you can also insert hooks using other mechanisms, for example using LaTeX’s hook system:
\AddToHook{cmd/highlight/before}{\a:highlight}
\AddToHook{cmd/highlight/after}{\b:highlight}
The second way to insert hooks, useful for commands where we also want to
insert tags inside their contents, is to use the \HLet command. It is a variant
of the \let command; in contrast to \let, it saves the original command as
\o:<command name>:. Commands redefined by \HLet also support the \Picture
command: inside a picture, the original version of the command is used. This
way, pictures produce the same result as they would in PDF
mode.
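Because \HLet keeps the original definition as \o:note:, a redefinition can also call the original command instead of rewriting it. The following variant of mynote.4ht is a sketch under that assumption; it seeds only two hooks, so the matching \Configure{note} would take two arguments instead of three.

```latex
% mynote.4ht (variant): wrap the original \note instead of rewriting it
\NewConfigure{note}{2}
% \o:note: is the original \note, saved by \HLet below
\def\:tempa#1{\a:note\o:note:{#1}\b:note}
\HLet\note\:tempa
\Hinput{mynote}
\endinput
```

The trade-off is that no hook separates the title from the text, so the two parts cannot be styled independently. That is why the tutorial version redefines \note completely.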
In our example, we redefined the \note command to use a hook between
note title and note text. This enables us to style both the title and the text
differently.
The configuration file for our hooks could look like this:
\Preamble{xhtml}
\Configure{note}
  {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="note"><span class="notetitle">}}
  {\HCode{</span><span class="notebody">}}
  {\HCode{</span></div>}}
\Css{.notetitle{font-weight: bold;}}
\Configure{highlight}{\HCode{<span class="highlight">}\NoFonts}{\EndNoFonts\HCode{</span>}}
\Css{.highlight{font-weight:bold;}}
\begin{document}
\EndPreamble
As the \note command should be used in its own paragraph, we need to close the
current paragraph before the note starts. See the Paragraph Handling section for
more information about this issue. More details about configuration files and
configurations are in the section Private Configuration Files.
The HTML code produced by our configuration looks like this:
<div class='note'><span class='notetitle'>Note: </span><span class='notebody'> This is a note</span></div>
<!-- l. 6 --><p class='indent'> Try to highlight <span class='highlight'>something</span>. </p>
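The choice of tags is entirely up to the configuration file. As a sketch, assuming HTML5 output (where the aside and mark elements are available), the same three note hooks and two highlight hooks could produce semantic elements instead of generic spans; the markup chosen here is only an illustration.

```latex
\Preamble{xhtml}
% close any open paragraph, then wrap the note in <aside>
\Configure{note}
  {\ifvmode\IgnorePar\fi\EndP\HCode{<aside class="note"><strong>}}
  {\HCode{</strong>}}
  {\HCode{</aside>}}
% <mark> instead of a styled span; \NoFonts suppresses font-based markup
\Configure{highlight}{\HCode{<mark>}\NoFonts}{\EndNoFonts\HCode{</mark>}}
\begin{document}
\EndPreamble
```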
10.2 Tutorial: How to Redefine Package Commands Used in the Document Preamble
The usual .4ht files are loaded only after the \begin{document} command. This
means that they cannot influence macros that are initialized in package options or in
other code executed in the document preamble. For these cases, TeX4ht provides a
special configuration file usepackage.4ht, which is read at the moment
when packages are being loaded. This allows you to insert hooks that block,
replace, or extend package definitions much earlier than would otherwise be
possible.
10.2.1 Blocking a Package from Loading
Sometimes it is necessary to completely prevent a package from being loaded,
because its behavior is incompatible with TeX4ht. The following example shows how
to block the unicode-math package:
% block unicode-math package
\:dontusepackage{unicode-math}
% provide dummy definition for \setmathfont command
\DeclareDocumentCommand \setmathfont { O{} m O{} }{}
This directive ensures that the package is skipped during the loading process.
10.2.2 Executing Code After a Package is Loaded
In other cases we want the package to load, but we need to restore or adjust
definitions it has changed. For instance, the titlesec package redefines all sectioning
commands. If we want to restore the original LaTeX definitions after titlesec is
fully loaded, we can save the old definitions beforehand and then restore them using
\:AtEndOfPackage:
\let\ttl:@makechapterhead\@makechapterhead
\let\ttl:@makeschapterhead\@makeschapterhead
\let\ttl:chapter\chapter
\let\ttl:section\section
\let\ttl:subsection\subsection
\let\ttl:subsubsection\subsubsection
\let\ttl:paragraph\paragraph
\let\ttl:subparagraph\subparagraph
\:AtEndOfPackage{
  \let\chapter\ttl:chapter
  \let\section\ttl:section
  \let\subsection\ttl:subsection
  \let\subsubsection\ttl:subsubsection
  \let\paragraph\ttl:paragraph
  \let\subparagraph\ttl:subparagraph
  \let\@makechapterhead\ttl:@makechapterhead
  \let\@makeschapterhead\ttl:@makeschapterhead
}
10.2.3 Available Commands
Two special commands are provided for these early hooks:
- \:dontusepackage{package name} – prevents the named package from loading.
- \:AtEndOfPackage{code} – executes the given code after the package has been fully loaded.
These hooks are inserted at the correct place during package processing, so they can safely modify or restore definitions without requiring manual patching in the document.
10.2.4 Using LaTeX Hooks Directly
It is also possible to use LaTeX’s native hook management system from within
usepackage.4ht. For example, to disable footnote superscripts only in the doc
package documentation, one can use:
\AddToHook{package/doc/before}{\SUPOff}
\AddToHook{package/doc/after}{\SUPOn}
This approach is especially useful for packages that provide their own well-defined hooks.
10.2.5 Local Modifications with usepackage-user.4ht
Another possibility for loading local definitions before a package is processed is to use
the file usepackage-user.4ht. This file works in the same way as usepackage.4ht,
but it is intended specifically for user-level customizations. By placing your
own hooks or redefinitions there, you can adapt the behavior of packages
locally, without modifying the official configuration files distributed with
TeX4ht.
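As an illustration, the following usepackage-user.4ht sketch combines LaTeX's package hooks with the save-and-restore idea from the titlesec example. The package name mypackage and the macro \MySavedSection are hypothetical, and the package/<name>/before and package/<name>/after hooks require a LaTeX release from 2020-10-01 or later.

```latex
% usepackage-user.4ht (hypothetical local customization)
% save \section just before the hypothetical mypackage is loaded ...
\AddToHook{package/mypackage/before}{\let\MySavedSection\section}
% ... and put the saved definition back once it has finished loading
\AddToHook{package/mypackage/after}{\let\section\MySavedSection}
```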
10.3 Commands Usable in the .4ht files
\NewConfigure{name}{number of defined hooks}
This command defines macros with an alphabetic prefix in the form of \a:name
…\i:name, depending on the number of defined hooks. The maximum number is
9.
\NewConfigure{try}{2}
\def\try#1{\a:try#1\b:try}
\Configure{try}{* }{}
\try{ho} % produces "* ho"
\NewConfigure{name}[number of parameters]{code}
Variant of \NewConfigure that doesn’t define hooks with alphabetic
prefixes; instead, it passes the arguments of \Configure as TeX parameters
to the code. See this example:
\NewConfigure{try}[2]{\def\hookI{#1}\def\hookII{#2}}
\def\try#1{\hookI#1\hookII}
\Configure{try}{* }{}
\try{ho} % produces "* ho"
When you use \Configure{try}, it defines the \hookI and \hookII commands.
They can then be used in the redefined \try command.
\HLet{Redefined command name}{new command}
Variant of \let that saves the original command under \o:<name>:. It can
detect the use of the redefined command inside a picture. In such a case,
it uses the original command to produce the correct visual result in the
picture.
\NewConfigure{note}{3}
\def\:tempa#1{\a:note note:\b:note~#1\c:note}
\HLet\note\:tempa
\Configure{note}{*}{*}{*}
\note{hello} % produces: "*note:* hello*"
\HRestore{command name}
Restores a command redefined using \HLet to its original definition.
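A short sketch of the \HLet and \HRestore round trip, written in the same style as the other examples in this section; the \greet command and its configuration are hypothetical.

```latex
\newcommand\greet[1]{Hello #1}
\NewConfigure{greet}{2}
% call the original \greet (saved by \HLet as \o:greet:) between the hooks
\def\:tempa#1{\a:greet\o:greet:{#1}\b:greet}
\HLet\greet\:tempa
\Configure{greet}{[}{]}
\greet{world} % produces "[Hello world]"
\HRestore\greet
\greet{world} % produces "Hello world"
```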
\pend:def<X>{redefined command}{code to be inserted at the begin}
\append:def<X>{redefined command}{code to be inserted at the end}
These two commands insert code before and after the contents of a redefined
command. There are several versions of these commands, depending on the number
of parameters that the redefined command expects: the number of parameters,
written as a Roman numeral, replaces the <X> placeholder.
Up to three parameters are supported.
\newcommand\baz{xxx}
\pend:def\baz{*}
\append:def\baz{*}
\baz % produces: "*xxx*"
\newcommand\foo[2]{#1, #2}
\pend:defII\foo{*}
\append:defII\foo{*}
\foo{a}{b} % produces "*a, b*"
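For a command with three parameters, the III variants would be used. A sketch with a hypothetical \triple command:

```latex
\newcommand\triple[3]{#1-#2-#3}
% the III suffix matches the three parameters of \triple
\pend:defIII\triple{<}
\append:defIII\triple{>}
\triple{a}{b}{c} % produces "<a-b-c>"
```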
\:CheckOption{option name}
\if:Option
Support for custom options. \:CheckOption checks whether the given option is
active, and the \if:Option conditional then runs the true or false branch.
\:CheckOption{info}\if:Option
  ...
\else
  ...
\fi
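As a sketch, a .4ht output format file could emit an extra CSS rule only when a custom option is active. The option name mycolors is hypothetical; it would have to be passed to tex4ht.sty together with the other options.

```latex
% emit extra CSS only when the (hypothetical) mycolors option is active
\:CheckOption{mycolors}\if:Option
  \Css{.mycommand{color:navy;}}
\fi
```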
10.4 Two types of .4ht files
The compilation starts by opening tex4ht.sty and loading a fraction of its code. The
main purpose of this phase is to request the loading of the system at a later time (for
instance, upon reaching \begin{document}). The motivation for the late loading is
to allow TeX4ht to collect as much information as possible about the environment
requested by the source file, and help the system reshape that environment with
minimal interference from elsewhere.
The system uses two kinds of (4ht) configuration files. The files of the first kind
mainly seed hooks into the macros loaded by the source file (for instance,
latex.4ht, fontmath.4ht, and article.4ht). The files of the second kind mainly
attach meaning to the hooks (for instance, html4.4ht, unicode.4ht, and
mathml.4ht).
Different source files may request the loading of different style files and in
different orders. The hook seeding files are loaded in response to the loading of the
style files, and in a compatible order. Since the different style files may redefine the
syntax and semantics of macros, TeX4ht follows a similar route of defining and
redefining the hooks and their meanings.
10.4.1 Custom output formats
The meaning attaching files are normally requested through option names introduced
in the tex4ht.4ht system file. It defines options for all output formats supported
by TeX4ht. For instance, html5, ooffice for the ODT output, tei, and so
on.
These options are passed to TeX4ht by make4ht according to the --format
command line parameter, but you can also pass them yourself.
The user may add option names, and redefine old ones, within a new file named
tex4ht.usr.
A new tex4ht.usr file should group references to *.4ht configuration files under
arbitrarily chosen option names. For that purpose, \Configure commands similar to
those provided in tex4ht.4ht should be employed. These are particularly useful if
you use custom packages that are not included in TeX distributions and thus aren’t
supported by TeX4ht.
You can place your custom .4ht files or tex4ht.usr in your local TEXMFHOME
tree, for instance in ~/texmf/tex/latex/my4htfiles.
The location of the TEXMFHOME directory can be found using the following command:
$ kpsewhich -var-value TEXMFHOME
Example
Let’s say that you have a custom package mypackage.sty:
\newcommand\mycommand[1]{Hello #1}
\endinput
This can be configured using the following configuration file, mypackage.4ht:
\NewConfigure{mycommand}{2}
\pend:defI\mycommand{\a:mycommand}
\append:defI\mycommand{\b:mycommand}
\Hinput{mypackage}
\endinput
The important command in this listing is \Hinput{mypackage}. \Hinput expects
the package name as its argument. It registers the package for later processing
in the output format files.
Here is a custom output format file sample.4ht:
\exit:ifnot{mypackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ConfigureHinput{mypackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Configure{mycommand}{\HCode{<span class="mycommand">}}{\HCode{</span>}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput\empty\empty\empty\empty\empty\empty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput
The \exit:ifnot command takes a comma-separated list of packages supported by
the output format file. It stops loading the file if the currently processed package
doesn’t have configurations in it.
The configuration for the package is placed between \ConfigureHinput and
\endinput\empty\empty\empty\empty\empty\empty.
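One output format file can hold configurations for several packages. In the following sketch, sample.4ht also configures the highlight hooks from the mynote tutorial, assuming mynote.4ht is installed and calls \Hinput{mynote}. Each package gets its own section, following the structure shown above.

```latex
\exit:ifnot{mypackage,mynote}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ConfigureHinput{mypackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Configure{mycommand}{\HCode{<span class="mycommand">}}{\HCode{</span>}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput\empty\empty\empty\empty\empty\empty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ConfigureHinput{mynote}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Configure{highlight}{\HCode{<span class="highlight">}\NoFonts}{\EndNoFonts\HCode{</span>}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput\empty\empty\empty\empty\empty\empty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput
```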
To request the custom output format file, we need to add it to tex4ht.usr. Here
is an example that adds a new option myhtml5. It is based on the code for the html5
option from tex4ht.4ht:
\Configure{myhtml5}{%
  \:CheckOption{info}\if:Option \Hinclude[*]{infoht4.4ht}\fi
  \:CheckOption{info}\if:Option \Hinclude[*]{infomml.4ht}\fi
  \Hinclude[*]{html4.4ht}%
  \Hinclude[*]{unicode.4ht}%
  \:CheckOption{mathml}\if:Option%
  \else\:CheckOption{mathml-}\fi%
  \if:Option%
    \Hinclude[*]{mathml.4ht}%
    \Hinclude[*]{html-mml.4ht}%
  \else
    \Hinclude[*]{html4-math.4ht}%
  \fi
  \:CheckOption{svg}%
  \if:Option \else\:CheckOption{svg-}\fi
  \if:Option \else\:CheckOption{svg-obj}\fi
  \if:Option \else\:CheckOption{svg-inline}\fi
  \if:Option
    \Hinclude[*]{svg-option.4ht}%
    \:CheckOption{info}\if:Option \Hinclude[*]{infosvg.4ht}\fi
  \fi
  \Hinclude[*]{html5.4ht}%
  \Hinclude[*]{sample.4ht}
}
It uses the \:CheckOption commands to detect additional options, which results
in conditional loading of various output format files using the \Hinclude command.
Our custom output file sample.4ht is placed at the end.
You can then require the custom output format using this command:
$ make4ht filename.tex "myhtml5"
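If the MathML, SVG, and info switches are not needed, the option can be much shorter. The following untested sketch keeps only the files that the full myhtml5 option loads when none of those extra options is given, in the same order; myhtml5lite is a hypothetical option name and would be requested in the same way as myhtml5.

```latex
% tex4ht.usr: a reduced variant of the myhtml5 option (hypothetical name)
\Configure{myhtml5lite}{%
  \Hinclude[*]{html4.4ht}%
  \Hinclude[*]{unicode.4ht}%
  \Hinclude[*]{html4-math.4ht}%
  \Hinclude[*]{html5.4ht}%
  \Hinclude[*]{sample.4ht}%
}
```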
10.5 TeX4ht literate sources
To add proper support for a new package, it is necessary to edit the TeX4ht
literate sources. All distributed TeX4ht files, including tex4ht.sty and all
.4ht files, are generated from these literate programming files. This is also
why the generated files contain few comments: the comments are in the
sources. If you want to understand how TeX4ht works, you need to read
them.
The source files are available in the TeX4ht source repository. You can retrieve
them using an SVN client:
$ svn checkout https://svn.gnu.org.ua/sources/tex4ht/
$ cd tex4ht/trunk/lit/
The configurable hooks for all packages are contained in the tex4ht-4ht.tex file.
Configurations of these hooks are placed in the output format configuration files. The
most common output format is HTML, which can be configured in tex4ht-html4.tex,
or tex4ht-html5.tex if HTML5 features are used. You can also update sources for
other output formats, for example tex4ht-ooffice.tex for the ODT format, or
tex4ht-tei.tex for TEI. The sources of the tex4ht.sty package are available in
tex4ht-sty.tex.
To compile all literate sources, run the make command. For this to succeed, you
will need basic UNIX utilities, as well as m4 and javac. You can also compile
particular source files. Most of them can be compiled using LaTeX, but
some of them, for example tex4ht-4ht.tex, need to be compiled using
etex.
10.5.1 How to add support for a package to the TeX4ht literate sources
Given the following package sample.sty:
\ProvidesPackage{sample}
\newcommand\hello{hello}
\endinput
This simple package defines command \hello, which simply prints the word
“hello” when used in a document.
Let’s say that we want to insert some HTML tags before and after the text content
printed by the command.
Basic template for tex4ht-4ht.tex:
\<sample.4ht\><<<
% sample.4ht (|version), generated from |jobname.tex
% Copyright 2017 TeX Users Group
|<TeX4ht license text|>
\NewConfigure{hello}{2}
\pend:def\hello{\a:hello}
\append:def\hello{\b:hello}
\Hinput{sample}
\endinput
>>>
\AddFile{9}{sample}
Configuration for each package must follow this basic template. TeX4ht uses
the ProTeX system for literate programming.
The \<name\><<<code>>> block defines a named code fragment, which can then be
included using |<name|>. The license text is included in this way in the example. The
instruction to generate the .4ht file is given by the command \AddFile{9}{sample}
after the block definition. The first argument to \AddFile is an arbitrary
number.
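Named fragments can also be used to split a package's configuration into smaller pieces. The following sketch restates the template above with the hooks moved into a separate fragment; the fragment name sample hello hooks is hypothetical, and the sketch assumes that a fragment may be referenced before it is defined, as is usual in literate programming.

```latex
\<sample.4ht\><<<
% sample.4ht (|version), generated from |jobname.tex
% Copyright 2017 TeX Users Group
|<TeX4ht license text|>
|<sample hello hooks|>
\Hinput{sample}
\endinput
>>>
\AddFile{9}{sample}

\<sample hello hooks\><<<
\NewConfigure{hello}{2}
\pend:def\hello{\a:hello}
\append:def\hello{\b:hello}
>>>
```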
Each package configuration must include \Hinput{packagename}, in order to
load the configurations for the package.
The command \NewConfigure{hello}{2} declares a new configuration, hello,
with two configurable hooks. These hooks are named \a:hello and \b:hello. The
hooks must be inserted into \hello, which can easily be done using the
\pend:def and \append:def commands. These commands insert code at the
beginning and at the end of the redefined command, respectively.
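If sample.sty also defined a command with a parameter, for instance a hypothetical \greet with one argument, its hooks would be seeded with the I variants inside the same sample.4ht fragment, before \Hinput{sample}. A sketch:

```latex
% hypothetical addition to sample.sty:
%   \newcommand\greet[1]{Hello #1}
% seeding code for it, placed in the sample.4ht fragment
\NewConfigure{greet}{2}
\pend:defI\greet{\a:greet}
\append:defI\greet{\b:greet}
```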
The package name must also be included in the mktex4ht-cnf.tex file. This file
is used in the generation of the
\AddFile{9}{sample}
You can place the HTML configuration in the tex4ht-html4.tex file:
\<configure html4 sample\><<<
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}}
\Css{.hello{color:red;}}
>>>
The \<configure html4 packagename\> block produces code that detects the
use of the package packagename and then loads the configurations for the
package.
The .4ht files can be generated simply using the make command.
The following sample TeX file:
\documentclass{article}
\usepackage{sample}
\begin{document}
\hello\ world.
\end{document}
produces the following HTML code:
<!--l. 4--><p class="noindent" > <span class="hello">hello</span> world. </p>
10.6 ProTeX
The literate programming system used in the previous section is called ProTeX. This section discusses the main ideas behind this system.
Literate programming is a discipline that promotes writing programs the way one explains them to human beings. ProTeX is a literate programming system implemented entirely in TeX, and it is compatible with LaTeX and other TeX-based systems. TeX4ht, and ProTeX itself, are examples of applications written in ProTeX.
\input ProTex.sty
\AlProTex{extension,<<<>>>,list,title,escape-character}
\<title\><<<
code fragment
>>>
|<title|>
\OutputCode\<...\>
Some explanation:
\input ProTex.sty
\AlProTex{extension,<<<>>>,list,title,escape-character}
The escape-character stands for `, @, |, or ?. If omitted, it defaults to
|.
\<title\><<< code fragment >>>
This structure assigns names to code fragments (the fragments should not be too large).
|<title|>
This command acts as a placeholder for the code fragment associated with the title
(| stands for the escape character).
\OutputCode\<...\>
This command creates a file for the code whose root node is specified.