Chapter 10
For Developers
This chapter deals with TeX4ht development. It starts with a basic tutorial on
supporting a new package, then shows commands useful in the process, the different
types of TeX4ht configuration files, and the syntax and structure of the literate
source files.
10.1 Tutorial: Basic Support For a New Package
In this tutorial, we will try to show how to provide TeX4ht
support for a simple
LaTeX package.
TeX4ht tries to load a special .4ht file for each package loaded by LaTeX. This
special file can contain modifications to commands provided by the package, such as
redefinitions of macros that cause clashes between the package and TeX4ht. Most
importantly, it inserts special macros, called hooks, that are then used to include
the output format tags.
Let’s say that you have a custom package, called mynote.sty:
\newcommand\notetitle{Note:~}
\newcommand\note[1]{\textbf{\notetitle}#1}
\newcommand\highlight[1]{\textbf{#1}}
\endinput
It defines two user commands, \note
and \highlight
. They can be used in the
following way:
\documentclass{article}
\usepackage{mynote}
\begin{document}
\note{This is a note}

Try to highlight \highlight{something}.
\end{document}
TeX4ht
produces usable output for both of these commands out of the box,
thanks to the support for TeX fonts. But you may want to use custom HTML tags
instead. To achieve that, you need to insert special commands, called hooks in
TeX4ht, into the package commands. These hooks can then be configured to insert tags in
the output format.
To introduce hooks, you need to create a hook seeding configuration file for the
package, called <name>.4ht
. For example, to seed hooks for the mynote.sty
package,
create file mynote.4ht
:
\NewConfigure{note}{3}
% Use \HLet when you want to completely redefine a command
\def\:tempa#1{\a:note\notetitle\b:note~#1\c:note}
\HLet\note\:tempa

\NewConfigure{highlight}{2}
\pend:defI\highlight{\a:highlight}
\append:defI\highlight{\b:highlight}

\Hinput{mynote}
\endinput
There are several things to note. The first is that the : character can be included as
part of a command name in .4ht files. This is similar to the use of the @ character in LaTeX
packages. It allows us to create command names that don’t clash with other
command names.
The hooks are created using the \NewConfigure command. They can later be
filled with the \Configure command. To have an effect, hooks must be
inserted into existing commands. There are two ways to do that. For
simpler commands, where we want to insert tags only before and after the
contents produced by the patched command, we can use the \pend:def<X> and
\append:def<X> commands, where <X> is the number of parameters that
the patched command expects, written as a Roman numeral. In this example,
\highlight expects one parameter, so we use the \pend:defI and \append:defI
commands. For commands without parameters, use \pend:def and \append:def.
Of course, you can also insert hooks using other mechanisms, for example using LaTeX’s hook system:
\AddToHook{cmd/highlight/before}{\a:highlight}
\AddToHook{cmd/highlight/after}{\b:highlight}
The second way to insert hooks, useful for commands where we also want to insert
tags inside their contents, is to use the \HLet command. It is a variant of
the \let command. In contrast to \let, it saves the original command as
\o:<command name>:. Commands redefined by \HLet also support the \Picture
command, where the original version of the command will be used. This way,
pictures will produce the same result as they would in PDF mode.
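As a minimal sketch of how the saved original can be reused, the following hypothetical variant of mynote.4ht wraps the original \note instead of rewriting its body. It assumes that \HLet stores the original definition as \o:note:, as described above; the two-hook layout is chosen only for illustration and is not part of the tutorial files.

```latex
% Hypothetical alternative for mynote.4ht: keep the original \note body
\NewConfigure{note}{2}
% \o:note: is the original \note saved by \HLet;
% it is only looked up when \note is actually used
\def\:tempa#1{\a:note\o:note:{#1}\b:note}
\HLet\note\:tempa
```

With this variant the title and the text cannot be styled separately, because no hook sits between them.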
In our example, we redefined the \note
command to use a hook between
note title and note text. This enables us to style both the title and the text
differently.
The configuration file for our hooks could look like this:
\Preamble{xhtml}
\Configure{note}
  {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="note"><span class="notetitle">}}
  {\HCode{</span><span class="notebody">}}
  {\HCode{</span></div>}}
\Css{.notetitle{font-weight: bold;}}
\Configure{highlight}{\HCode{<span class="highlight">}\NoFonts}{\EndNoFonts\HCode{</span>}}
\Css{.highlight{font-weight:bold;}}
\begin{document}
\EndPreamble
As the \note command should be used in its own paragraph, we need to fix
paragraph closing. See the Paragraph Handling section for more information about
this issue. More details about configuration files and configurations are in the
Private Configuration Files section.
The HTML code produced by our configuration looks like this:
<div class='note'><span class='notetitle'>Note: </span><span class='notebody'> This is a note</span></div>
<!-- l. 6 --><p class='indent'> Try to highlight <span class='highlight'>something</span>. </p>
10.2 Commands Usable in the .4ht files
\NewConfigure{name}{number of defined hooks}
This command defines macros with an alphabetic prefix in the form of \a:name…\i:name, depending on the number of defined hooks. The maximum number is 9.
\NewConfigure{try}{2}
\def\try#1{\a:try#1\b:try}
\Configure{try}{* }{}
\try{ho} % produces "* ho"
\NewConfigure{name}[number of parameters]{code}
A variant of \NewConfigure that doesn’t define hooks with alphabetic prefixes. Instead, it passes the arguments of \Configure to the code as TeX arguments. See this example:
\NewConfigure{try}[2]{\def\hookI{#1}\def\hookII{#2}}
\def\try#1{\hookI#1\hookII}
\Configure{try}{* }{}
\try{ho} % produces "* ho"
When you use \Configure{try}, it defines the \hookI and \hookII commands. They can then be used in the redefined \try command.
\HLet{redefined command name}{new command}
A variant of \let that saves the original command under the name \o:<name>:. It can detect use of the redefined command inside a picture. In that case, it uses the original command to produce the correct visual result in the picture.
\NewConfigure{note}{3}
\def\:tempa#1{\a:note note:\b:note~#1\c:note}
\HLet\note\:tempa
\Configure{note}{*}{*}{*}
\note{hello} % produces: "*note:* hello*"
\HRestore{command name}
Restores a command redefined using \HLet to its original content.
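A short sketch of how \HRestore might be paired with \HLet, reusing the \note command from the tutorial. The call form \HRestore\note is an assumption made by analogy with \HLet\note\:tempa.

```latex
\NewConfigure{note}{3}
\def\:tempa#1{\a:note\notetitle\b:note~#1\c:note}
\HLet\note\:tempa   % \note now uses the hooked definition
% ... code that relies on the hooked \note ...
\HRestore\note      % \note is back to the definition from mynote.sty
```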
\pend:def<X>{redefined command}{code to be inserted at the beginning}
\append:def<X>{redefined command}{code to be inserted at the end}
These two commands insert code before and after a redefined command. There are several versions of these commands, depending on the number of parameters that the redefined command expects. The number of parameters, written as a Roman numeral, replaces the <X> placeholder. Up to three parameters are supported.
\newcommand\baz{xxx}
\pend:def\baz{*}
\append:def\baz{*}
\baz % produces: "*xxx*"

\newcommand\foo[2]{#1, #2}
\pend:defII\foo{*}
\append:defII\foo{*}
\foo{a}{b} % produces "*a, b*"
\:CheckOption{option name}
\if:Option
Support for custom options. The \:CheckOption command checks whether the given option is active, and the \if:Option conditional then runs the true or false branch.
\:CheckOption{info}\if:Option ... \else ... \fi
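As an illustration, a .4ht file could switch between two configurations of the note hooks from the tutorial depending on an option. The option name note-plain is hypothetical, and the sketch assumes that the note configuration has already been declared with \NewConfigure{note}{3}.

```latex
% "note-plain" is a made-up option name used only for this sketch
\:CheckOption{note-plain}\if:Option
  % option given: emit no markup around notes
  \Configure{note}{}{}{}
\else
  % default: wrap the note in a <div> and its title in a <span>
  \Configure{note}
    {\HCode{<div class="note"><span class="notetitle">}}
    {\HCode{</span>}}
    {\HCode{</div>}}
\fi
```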
10.3 Two types of .4ht files
The compilation starts by opening tex4ht.sty and loading a fraction of its code. The
main purpose of this phase is to request the loading of the system at a later time (for
instance, upon reaching \begin{document}
). The motivation for the late loading is
to allow TeX4ht to collect as much information as possible about the environment
requested by the source file, and help the system reshape that environment with
minimal interference from elsewhere.
The system uses two kinds of (4ht) configuration files. The files of the first kind
mainly seed hooks into the macros loaded by the source file (for instance,
latex.4ht
, fontmath.4ht
, and article.4ht
). The files of the second kind mainly
attach meaning to the hooks (for instance, html4.4ht
, unicode.4ht
, and
mathml.4ht
).
Different source files may request the loading of different style files and in
different orders. The hook seeding files are loaded in response to the loading of the
style files, and in a compatible order. Since the different style files may redefine the
syntax and semantics of macros, TeX4ht
follows a similar route of defining and
redefining the hooks and their meanings.
10.3.1 Custom output formats
The meaning attaching files are normally requested through option names introduced
in the tex4ht.4ht
system file. It defines options for all output formats supported
by TeX4ht
. For instance, html5, ooffice for the ODT output, tei, and so
on.
These options are passed to TeX4ht by make4ht according to the --format
command line parameter, but you can also pass them yourself.
The user may add option names, and redefine old ones, within a new file named
tex4ht.usr
.
A new tex4ht.usr file should group references to *.4ht
configuration files under
arbitrarily chosen option names. For that purpose, \Configure
commands similar to
those provided in tex4ht.4ht
should be employed. These are particularly useful if
you use custom packages that are not included in TeX distributions and thus aren’t
supported by TeX4ht
.
You can place your custom .4ht
files or tex4ht.usr
in your local TEXMFHOME
tree, for instance in ~/texmf/tex/latex/my4htfiles
.
The location of the TEXMFHOME directory can be found using the following command:
$ kpsewhich -var-value TEXMFHOME
Example
Let’s say that you have a custom package mypackage.sty
:
\newcommand\mycommand[1]{Hello #1}
\endinput
This can be configured using the following configuration file, mypackage.4ht
:
\NewConfigure{mycommand}{2}
\pend:defI\mycommand{\a:mycommand}
\append:defI\mycommand{\b:mycommand}
\Hinput{mypackage}
\endinput
The important command in this listing is \Hinput{mypackage}. \Hinput expects
the package name as its argument. It registers the package for later processing in the
output format files.
Here is a custom output format file sample.4ht
:
\exit:ifnot{mypackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ConfigureHinput{mypackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Configure{mycommand}{\HCode{<span class="mycommand">}}{\HCode{</span>}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput\empty\empty\empty\empty\empty\empty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput
The \exit:ifnot command takes a comma-separated list of packages supported by
the output format file. It stops the loading of the file if the currently processed
package doesn’t have configurations in it.
The configuration for the package is placed between \ConfigureHinput
and
\endinput\empty\empty\empty\empty\empty\empty
.
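Following the structure just described, an output format file that supports more than one package could be sketched as below. The second package, otherpackage, and its \othercommand configuration are hypothetical; the sketch only assumes that each package gets its own block between \ConfigureHinput and the \endinput\empty… line.

```latex
% sample.4ht extended to two packages (otherpackage is hypothetical)
\exit:ifnot{mypackage,otherpackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ConfigureHinput{mypackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\Configure{mycommand}{\HCode{<span class="mycommand">}}{\HCode{</span>}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput\empty\empty\empty\empty\empty\empty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\ConfigureHinput{otherpackage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% assumes otherpackage.4ht declared \NewConfigure{othercommand}{2}
\Configure{othercommand}{\HCode{<em class="othercommand">}}{\HCode{</em>}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput\empty\empty\empty\empty\empty\empty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\endinput
```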
To request the custom output format file, we need to add it to tex4ht.usr
. Here
is an example that adds a new option myhtml5. It is based on the code for the html5
option from tex4ht.4ht
:
\Configure{myhtml5}{%
  \:CheckOption{info}\if:Option \Hinclude[*]{infoht4.4ht}\fi
  \:CheckOption{info}\if:Option \Hinclude[*]{infomml.4ht}\fi
  \Hinclude[*]{html4.4ht}%
  \Hinclude[*]{unicode.4ht}%
  \:CheckOption{mathml}\if:Option%
  \else\:CheckOption{mathml-}\fi%
  \if:Option%
    \Hinclude[*]{mathml.4ht}%
    \Hinclude[*]{html-mml.4ht}%
  \else
    \Hinclude[*]{html4-math.4ht}%
  \fi
  \:CheckOption{svg}%
  \if:Option \else\:CheckOption{svg-}\fi
  \if:Option \else\:CheckOption{svg-obj}\fi
  \if:Option \else\:CheckOption{svg-inline}\fi
  \if:Option
    \Hinclude[*]{svg-option.4ht}%
    \:CheckOption{info}\if:Option \Hinclude[*]{infosvg.4ht}\fi
  \fi
  \Hinclude[*]{html5.4ht}%
  \Hinclude[*]{sample.4ht}
}
It uses the \:CheckOption
commands to detect additional options, which results
in conditional loading of various output format files using the \Hinclude
command.
Our custom output file sample.4ht
is placed at the end.
You can then request the custom output format using this command:
$ make4ht filename.tex "myhtml5"
10.4 Early Hooks in usepackage.4ht
Normal .4ht files are loaded once the document preamble has been processed. This is
usually desirable, as some packages redefine other packages’ commands, and
late loading prevents possible clashes in such cases. However, sometimes we
need to fix package macros as soon as the package is loaded. In other cases, we need
to block the package from loading completely. This can be necessary when the
package causes a fatal error when used.
For these cases, TeX4ht uses a special file, usepackage.4ht, where you can
declare code to be executed before the package is loaded.
As it is loaded multiple times, it is best to keep it short and place longer pieces of code in a separate file. Sample code that loads such a file looks like this:
\Configure{PackageHooks}{foo.sty}{foo-hooks.4ht}
The <pkgname>-hooks.4ht name is usually used to distinguish this early hooks
file from the usual .4ht files. The general structure of the <pkgname>-hooks.4ht
file is as follows:
code to be executed before package loading
\:AtEndOfPackage{
  code to be executed after package loading
}
There are two useful commands available:
\:dontusepackage{package name} – prevents the package from loading. It can be used to
disable packages that cause a fatal error with TeX4ht.
\:AtEndOfPackage{code to be executed} – executes code after the package has been
loaded. Useful for redefining commands that can be used in the document
preamble.
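A minimal sketch of an early hooks file for the foo.sty package registered above. The command \foobanner is hypothetical, and so is the second package name, badpkg. Whether \:dontusepackage is best called from a hooks file like this or directly in usepackage.4ht is an assumption of the sketch.

```latex
% foo-hooks.4ht (hypothetical): read before foo.sty is loaded

% keep a package that is assumed to break the conversion from loading
\:dontusepackage{badpkg}

\:AtEndOfPackage{%
  % runs right after foo.sty has been read, so the redefinition
  % is already in effect in the rest of the document preamble
  \renewcommand\foobanner{}%
}
```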
10.4.1 Execute Code Directly in usepackage.4ht
You can also execute shorter pieces of code directly in usepackage.4ht thanks to
the new LaTeX package hooks. For example, the following code fixes catcode issues
with the ^ character in the doc package:
\AddToHook{package/doc/before}{\SUPOff} \AddToHook{package/doc/after}{\SUPOn}
\SUPOff disables the catcode changes to this character that TeX4ht uses in order
to insert markup for math superscripts, and \SUPOn enables them again once the package
has been processed.
10.5 TeX4ht
literate sources
To add proper support for a new package, it is necessary to edit the TeX4ht
literate sources. All distributed TeX4ht files, including tex4ht.sty and all
.4ht files, are generated from these literate programming files. This is also the
reason why the generated files don’t contain many comments; the comments are in the
sources. If you want to understand how TeX4ht works, it is necessary to read
them.
The source files are available in the TeX4ht source repository. You can retrieve
them using an SVN client:
$ svn checkout https://svn.gnu.org.ua/sources/tex4ht/
$ cd tex4ht/trunk/lit/
The configurable hooks for all packages are contained in the tex4ht-4ht.tex file.
The configurations of these hooks are placed in the output format configuration files. The
most common output format is HTML, which can be configured in tex4ht-html4.tex,
or in tex4ht-html5.tex if HTML5 features are used. You can also update the sources for
other output formats, for example tex4ht-ooffice.tex for the ODT format, or
tex4ht-tei.tex for TEI. The sources of the tex4ht.sty package are available in
tex4ht-sty.tex.
To compile all literate sources, run the make command. You will need basic UNIX
utilities for this to succeed, as well as m4 and javac. You can also compile
particular source files. Most of them can be compiled using LaTeX, but
some of them, for example tex4ht-4ht.tex, need to be compiled using etex.
10.5.1 How to add support for a package to the TeX4ht literate sources
Consider the following package, sample.sty:
\ProvidesPackage{sample}
\newcommand\hello{hello}
\endinput
This simple package defines the command \hello, which prints the word
“hello” when used in a document.
Let’s say that we want to insert some HTML tags before and after the text content
printed by the command.
A basic template for tex4ht-4ht.tex:
\<sample.4ht\><<<
% sample.4ht (|version), generated from |jobname.tex
% Copyright 2017 TeX Users Group
|<TeX4ht license text|>
\NewConfigure{hello}{2}
\pend:def\hello{\a:hello}
\append:def\hello{\b:hello}
\Hinput{sample}
\endinput
>>>
\AddFile{9}{sample}
The configuration for each package must follow this basic template. The ProTeX
system is used for literate programming.
The \<name\><<<code>>> block defines a new macro, which can then be called
using |<name|>. The license text is included in this way in the example. The
instruction to generate the .4ht file is given by the \AddFile{9}{sample} command
after the block definition. The first argument to \AddFile is an arbitrary
number.
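To illustrate how named blocks can be combined, the sketch below moves the hook declarations of the template into a block of their own and includes it with |<…|>, in the same way the license text is included. The block name "sample hooks" is hypothetical; everything else is taken from the template.

```latex
% a reusable fragment holding only the hook declarations
\<sample hooks\><<<
\NewConfigure{hello}{2}
\pend:def\hello{\a:hello}
\append:def\hello{\b:hello}
>>>

% the file block pulls the fragment in by name
\<sample.4ht\><<<
% sample.4ht (|version), generated from |jobname.tex
|<TeX4ht license text|>
|<sample hooks|>
\Hinput{sample}
\endinput
>>>
\AddFile{9}{sample}
```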
Each package configuration must include \Hinput{packagename} in order to
load the configurations for the package.
The command \NewConfigure{hello}{2} declares a new configuration, hello,
with two configurable hooks. These hooks are named \a:hello and \b:hello. The
hooks must be inserted into the \hello command, which can easily be done using the
\pend:def and \append:def commands. These commands insert code at the
beginning and at the end of the redefined command, respectively.
The package name must also be included in the mktex4ht-cnf.tex
file. This file
is used in the generation of the
\AddFile{9}{sample}
You can place the configuration for HTML in the tex4ht-html4.tex file:
\<configure html4 sample\><<<
\Configure{hello}{\HCode{<span class="hello">}}{\HCode{</span>}}
\Css{.hello{color:red;}}
>>>
The \<configure html4 packagename\>
block will produce code that
detects use of the package packagename
. It then loads configurations for the
package.
The .4ht
files can be generated simply using the make
command.
The following sample TeX file:
\documentclass{article}
\usepackage{sample}
\begin{document}
\hello\ world.
\end{document}
produces the following HTML code:
<!--l. 4--><p class="noindent" > <span class="hello">hello</span> world. </p>
10.6 ProTeX
The literate programming system used in the previous section is called ProTeX. We should discuss some main ideas behind this system.
Literate programming is a discipline that promotes the writing of programs the way one explains them to human beings. ProTeX is a literate programming system fully implemented in terms of TeX, and it is compatible with LaTeX and other TeX-based systems. TeX4ht, and ProTeX itself, are examples of applications written in ProTeX.
\input ProTex.sty
\AlProTex{extension,<<<>>>,list,title,escape-character}
\<title\><<<
code fragment
>>>
|<title|>
\OutputCode\<...\>
Some explanation:
\input ProTex.sty
\AlProTex{extension,<<<>>>,list,title,escape-character}
The escape-character stands for ‘, @, |, or ?. If omitted, it stands for |.
\<title\><<< code fragment >>>
This structure assigns names to code fragments (the fragments should not be too large).
|<title|>
This command acts as a placeholder for the code segment associated with the title
(| stands for the escape character).
\OutputCode\<...\>
This command creates a file for the code whose root node is specified.
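Putting the four pieces together, a complete source file might be sketched as follows. This is an illustration only: it assumes a plain TeX document (hence \bye), instantiates the \AlProTex call exactly as listed above with txt as the extension and | as the escape character, and uses two made-up fragment names, greeting and main.

```latex
\input ProTex.sty
% extension "txt" and escape character "|" are choices made for this sketch
\AlProTex{txt,<<<>>>,list,title,|}

% a small named fragment
\<greeting\><<<
Hello from ProTeX
>>>

% the root fragment refers to the first one through the escape character
\<main\><<<
|<greeting|>
>>>

% write out the code whose root node is "main"
\OutputCode\<main\>
\bye
```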