DocBook: semantic markup and single-source publishing
DocBook is the grandparent of structured technical documentation: a
vocabulary for books, articles, and manuals where you mark up meaning, not
appearance. You do not write "bold, monospaced" — you write
<command>, <filename>, <warning>. One source document then becomes HTML,
PDF, and EPUB, each rendering those meanings its own way. This page is about
DocBook the format; the stylesheets that render it are
studied separately as a large XSLT codebase.
A small specimen
- The namespace. Every element lives in http://docbook.org/ns/docbook, declared once as the default namespace on the root. This is the thing that changed between DocBook 4 and 5; see below.
- Structure carries identity. xml:id gives the section a stable handle; cross-references point at it, and it survives into the output as an HTML anchor or a PDF bookmark.
- Semantic, not visual. <warning>, <command>, <filename> say what something is. HTML output may render <warning> as a coloured callout box; print may render it as a framed note. The source does not care.
- Profiling. condition="cloud" is conditional text. The same source produces a "cloud" edition and an "on-prem" edition by filtering on this attribute at build time: single-source publishing in one attribute.
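The "structure carries identity" point can be checked mechanically. The sketch below is only an illustration in Python's standard library: the mini-document, the `install` id, and the `dangling_xrefs` helper are hypothetical stand-ins, not the page's own specimen or any DocBook tool. It collects every xml:id and reports any `<xref linkend="…"/>` that points at nothing.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
# The xml: prefix is predefined; parsers expand xml:id to this name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical mini-document, written for this sketch only.
SOURCE = f"""<article xmlns="{DB}" version="5.2">
  <title>Widget guide</title>
  <section xml:id="install">
    <title>Installation</title>
    <para>Run <command>widget init</command>, then see <xref linkend="recovery"/>.</para>
  </section>
  <section xml:id="recovery">
    <title>Recovery</title>
    <para>Restore from <filename>snapshot.toml</filename>.</para>
  </section>
</article>"""


def dangling_xrefs(xml_text):
    """Return linkend values that match no xml:id in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    targets = [el.get("linkend") for el in root.iter(f"{{{DB}}}xref")]
    return [t for t in targets if t not in ids]


print(dangling_xrefs(SOURCE))  # → []
```

Because the handle lives in the structure rather than in rendered text, the same check works whatever the section title says or how it is styled.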
A fuller page, as its skeleton
The specimen above is deliberately tiny. Real DocBook pages are deep — and
that depth is the whole point, but it also makes the raw markup tiring to read.
Here is a fuller, ~70-line chapter — the Recovery section is the one the
small specimen's <xref linkend="recovery"/> was pointing at — rendered with
unxml as its semantic skeleton:
chapter(
    version="5.2",
    xml:id="backups",
    xml:lang="en",
    xmlns="http://docbook.org/ns/docbook",
    xmlns:xlink="http://www.w3.org/1999/xlink")
  title = Backups and recovery
  para = The <command>widget</command> daemon keeps its state in a single database. This chapter explains how to snapshot that state and how to restore it after a mishap.
  section(xml:id="how-backups-work")
    title = How backups work
    para = A backup is a consistent, point-in-time copy written to the directory named by the <envar>WIDGET_BACKUP_DIR</envar> environment variable. Each run produces two artefacts:
    itemizedlist
      listitem
        para = a compressed data file, <filename>snapshot.wgz</filename>;
      listitem
        para = a manifest, <filename>snapshot.toml</filename>, recording the schema version and a checksum.
    note
      para = Backups are incremental by default. Pass <option>--full</option> to force a complete copy.
  section(xml:id="creating-a-backup")
    title = Creating a backup
    para = Run <command>widget backup</command> from any directory:
    screen =
      | $ widget backup --full
      | Writing snapshot.wgz ... done (4.2 MiB)
      | Manifest written to /var/backups/widget/snapshot.toml
    para =
      | To schedule it, point your scheduler at the same command. A minimal
      | configuration block looks like this:
    programlisting(language="toml") =
      | [backup]
      | dir = "/var/backups/widget"
      | schedule = "0 3 * * *" # 03:00 daily
      | keep = 7 # retain a week of snapshots
    tip
      para = Set <envar>WIDGET_BACKUP_DIR</envar> to a volume on a different disk than the live database.
  section(xml:id="recovery")
    title = Recovery
    para = To restore from a snapshot, stop the daemon and run <command>widget restore</command> with the path to a manifest:
    orderedlist
      listitem
        para = Stop the service: <command>widget stop</command>.
      listitem
        para = Restore: <command>widget restore</command> <replaceable>manifest.toml</replaceable>.
      listitem
        para = Verify and restart: <command>widget start</command>.
    warning
      para = Restoring overwrites the live database. Never run <command>widget restore</command> against a production instance without first taking a fresh backup.
    para = The restore command accepts these options:
    variablelist
      varlistentry
        term
          option = --dry-run
        listitem
          para =
            | Validate the manifest and report what would change,
            | without writing anything.
      varlistentry
        term
          option = --force
        listitem
          para =
            | Skip the schema-version check. Use only when migrating
            | between major versions.
Instances are normally shown as raw XML on this site, but DocBook is the case
that justifies the flattened view: what you are reading is the document — a
clean nested outline of sections, lists, and admonitions. Notice how unxml
draws the line. The block structure loses its angle brackets entirely; but the
inline prose — a <para> with a <command> or <filename> in it — is kept as
one flowing line of verbatim XML, because there the markup is the content and
the original reads best. Flatten the scaffolding, quote the prose. What remains
is exactly the structure the stylesheets walk to produce
HTML, PDF, and EPUB — which is why one source can feed many outputs.
The raw DocBook source (73 lines)
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.2" xml:id="backups" xml:lang="en">
<title>Backups and recovery</title>
<para>The <command>widget</command> daemon keeps its state in a single
database. This chapter explains how to snapshot that state and how to
restore it after a mishap.</para>
<section xml:id="how-backups-work">
<title>How backups work</title>
<para>A backup is a consistent, point-in-time copy written to the directory
named by the <envar>WIDGET_BACKUP_DIR</envar> environment variable. Each
run produces two artefacts:</para>
<itemizedlist>
<listitem><para>a compressed data file, <filename>snapshot.wgz</filename>;</para></listitem>
<listitem><para>a manifest, <filename>snapshot.toml</filename>, recording the
schema version and a checksum.</para></listitem>
</itemizedlist>
<note>
<para>Backups are incremental by default. Pass
<option>--full</option> to force a complete copy.</para>
</note>
</section>
<section xml:id="creating-a-backup">
<title>Creating a backup</title>
<para>Run <command>widget backup</command> from any directory:</para>
<screen>$ widget backup --full
Writing snapshot.wgz ... done (4.2 MiB)
Manifest written to /var/backups/widget/snapshot.toml</screen>
<para>To schedule it, point your scheduler at the same command. A minimal
configuration block looks like this:</para>
<programlisting language="toml">[backup]
dir = "/var/backups/widget"
schedule = "0 3 * * *" # 03:00 daily
keep = 7 # retain a week of snapshots</programlisting>
<tip>
<para>Set <envar>WIDGET_BACKUP_DIR</envar> to a volume on a different
disk than the live database.</para>
</tip>
</section>
<section xml:id="recovery">
<title>Recovery</title>
<para>To restore from a snapshot, stop the daemon and run
<command>widget restore</command> with the path to a manifest:</para>
<orderedlist>
<listitem><para>Stop the service: <command>widget stop</command>.</para></listitem>
<listitem><para>Restore: <command>widget restore</command>
<replaceable>manifest.toml</replaceable>.</para></listitem>
<listitem><para>Verify and restart: <command>widget start</command>.</para></listitem>
</orderedlist>
<warning>
<para>Restoring overwrites the live database. Never run
<command>widget restore</command> against a production instance without
first taking a fresh backup.</para>
</warning>
<para>The restore command accepts these options:</para>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem><para>Validate the manifest and report what would change,
without writing anything.</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--force</option></term>
<listitem><para>Skip the schema-version check. Use only when migrating
between major versions.</para></listitem>
</varlistentry>
</variablelist>
</section>
</chapter>
The namespace pattern it shows: a vocabulary growing up
DocBook is the textbook case of a vocabulary acquiring a namespace when it matured:
- DocBook 4 and earlier were defined by a DTD. DTDs predate XML Namespaces and have no concept of them, so a DocBook 4 <article> was in no namespace at all: just a bare element name validated against a DTD.
- DocBook 5 redefined the language as a RELAX NG schema and moved every element into http://docbook.org/ns/docbook. The version jump was, in large part, the addition of a namespace, which is why a DocBook 5 document fails against a DocBook 4 toolchain and vice versa, even when the elements look identical.
That is the same lifecycle you see across this section: a format starts informal, and the moment it needs to coexist with others — be embedded, be extended, be mixed — it claims a namespace to make its names globally unambiguous.
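To make "look identical, yet differ" concrete, here is a small Python sketch using only the standard library. The two one-line documents are hypothetical; the point is how a namespace-aware parser names their elements.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Hypothetical one-line documents: same element names, different identities.
db4_style = "<article><title>Guide</title></article>"
db5_style = (
    f'<article xmlns="{DOCBOOK_NS}" version="5.0">'
    "<title>Guide</title></article>"
)


def expanded_names(xml_text):
    """List each element's expanded name as the parser sees it."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


print(expanded_names(db4_style))
# → ['article', 'title']
print(expanded_names(db5_style))
# → ['{http://docbook.org/ns/docbook}article', '{http://docbook.org/ns/docbook}title']

# A lookup written for DocBook 5 names finds nothing in the bare document.
ns_title = f"{{{DOCBOOK_NS}}}title"
print(ET.fromstring(db4_style).find(ns_title) is None)  # → True
```

Any stylesheet or schema that matches on the namespaced name behaves like that last lookup, which is one way to picture why the two generations of toolchain do not interoperate.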
Validated with RELAX NG, not XSD
DocBook 5 is one of the most prominent vocabularies whose normative schema is RELAX NG, not XSD. RELAX NG's grammar model expresses DocBook's "this element allows this loose soup of inline children" patterns far more naturally than XSD's content models do. A W3C XML Schema and a DTD are generated from the RELAX NG for tools that need them, but the RELAX NG is the source of truth. If you have only met XSD, DocBook is the reason to know RELAX NG exists.
Single source, many outputs
The payoff of all that structure is one input and many renderings:
flowchart LR
src["guide.xml<br/>(DocBook 5)"] --> xslt["DocBook<br/>stylesheets"]
xslt --> html["HTML site"]
xslt --> pdf["PDF<br/>(via XSL-FO)"]
xslt --> epub["EPUB"]
src -.->|profile: cloud| cloud["cloud edition"]
src -.->|profile: on-prem| prem["on-prem edition"]
- The HTML and EPUB paths are XSLT straight to (X)HTML.
- The PDF path is XSLT to XSL-FO, then a formatter (such as Apache FOP) to PDF: exactly the "generated, not authored" vocabulary pattern that the XSL-FO page describes.
- Profiling (the condition attribute above) prunes the tree before rendering, so each edition is a real subset, not CSS hiding.
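The pruning step in the last bullet can be sketched in a few lines of Python. This is not how the DocBook stylesheets implement profiling (they do it in XSLT); the `profile` helper, the sample section, and its wording are hypothetical. It also assumes that several editions in one attribute are separated by semicolons.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"

# Hypothetical source with one paragraph per edition and one shared paragraph.
SOURCE = f"""<section xmlns="{DB}" xml:id="storage">
  <title>Storage</title>
  <para condition="cloud">Snapshots go to the managed bucket.</para>
  <para condition="on-prem">Snapshots go to <filename>/var/backups/widget</filename>.</para>
  <para>Both editions keep seven snapshots.</para>
</section>"""


def profile(xml_text, edition):
    """Return the tree with every element not meant for `edition` removed."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            cond = child.get("condition")
            # Unconditional elements stay; conditional ones must name the edition.
            if cond is not None and edition not in cond.split(";"):
                parent.remove(child)
    return root


def para_texts(root):
    """Flatten each remaining <para> to plain text."""
    return ["".join(p.itertext()) for p in root.iter(f"{{{DB}}}para")]


print(para_texts(profile(SOURCE, "cloud")))
# → ['Snapshots go to the managed bucket.', 'Both editions keep seven snapshots.']
```

The on-prem paragraph is absent from the returned tree, not merely hidden, so nothing downstream (HTML, PDF, search index) can leak it.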
This is why DocBook persists in toolchains decades on: the cost of rich semantic markup is paid once, and every output format — including ones that did not exist when the document was written — collects the dividend.
Where the engine is dissected
The stylesheets that perform these transforms are themselves one of the largest readable XSLT codebases anywhere. The Modern XSLT section walks the DocBook xslTNG stylesheets as a case study in XSLT at scale — modes, function libraries, and import-precedence customization layers in a real 50-file project.
The modern move: author light, generate DocBook
If you started a documentation project today, you would probably not type DocBook by hand. The field reached a clear conclusion: XML is an excellent machine format and an awkward human authoring format. Angle brackets are verbose, and writers resent them.
So the modern pattern inverts the roles. Authors write lightweight markup — most often AsciiDoc (via Asciidoctor) or Markdown — and a tool generates DocBook from it:
== Installation

Run `widget init` in your project root. It writes a `widget.toml` you can edit.

WARNING: Never run `widget reset` on a production database.
asciidoctor -b docbook guide.adoc turns that into the same kind of <section> /
<warning> tree shown above. The backtick-quoted spans come out as generic
<literal> elements rather than <command> or <filename>, because lightweight
markup does not carry that finer distinction. AsciiDoc was, in fact, designed as
a more writable front-end for DocBook: the XML never disappears, it just stops
being the thing humans touch. DocBook (or HTML, or PDF) becomes the durable
interchange and output layer underneath.
That is the same theme this whole section keeps hitting: XML earns its keep where interchange, validation, and longevity matter — less so where people type. Within XML proper there is no newer documentation vocabulary displacing DocBook; the one thriving modern XML doc format, JATS (the Journal Article Tag Suite), lives in exactly the niche where archival XML still wins outright: scholarly publishing. The XML stayed; the typing moved on.