Real-world XML applications¶
The e-invoicing section followed one XML vocabulary all the way down — UBL, its XSD, its Schematron, its code lists. This section does the opposite: it visits many real XML applications, briefly, to show how the format is actually used in the wild — the vocabularies people exchange, and the APIs programs use to read and write them.
Two strands run through it:
- The vocabularies — SVG, SOAP, Office documents, feeds, XBRL, and the rest.
The moment XML leaves the textbook, the interesting part is rarely the elements;
it is the namespaces: how a document mixes two of them, how a schema in one
namespace
imports another, how a base vocabulary is extended without being changed. Each vocabulary page is built around one of those patterns. - Working with XML in code — the parsing models (DOM, SAX, pull) and the APIs
built on them, where the same namespace ideas reappear as
NamespaceContext, prefix-to-URI maps, and the streaming-vs-tree trade-off every XML program makes.
Read the namespace chapters first
These pages assume you know what a namespace is and how a schema declares
one. If targetNamespace, elementFormDefault, or
import vs include are fuzzy, skim
XSD → Modular schemas and the
namespaces part of Anatomy of a UBL invoice
first — they are the prerequisites for everything below.
Reading XML with unxml¶
Real documents are verbose — and schemas worst of all. A 500-line
XSD buries its data model under xs:complexType /
xs:sequence / xs:element scaffolding. So whenever these pages show a schema,
they show it rendered with unxml --xsd,
a small tool that rewrites the xs:* vocabulary into a terse,
type-declaration-like pseudocode that reads like the data model it describes:
type wptType
ele : xsd:decimal ?
time : xsd:dateTime ?
@lat : latitudeType (required)
@lon : latitudeType (required)
type latitudeType : xsd:decimal [-90.0..90.0]
? is optional, * zero-or-more, : is "typed as", @ an attribute,
[m..n] a range — the whole XSD vocabulary compressed to a
handful of symbols. It makes a long schema browseable. (unxml also flattens
plain instance documents into the same indented form — attributes into ( … ),
text as name = value — but for the short instances on these pages the raw XML is
clearer, so we show that directly.)
Try it yourself
unxml installs from PyPI (uv tool install unxml-rs) or
GitHub releases. There is a
gallery of real documents rendered side-by-side at
vivainio.github.io/unxml-demos. The
full set of schema transformations is documented in the
unxml XSD docs.
The tour — vocabularies¶
Each page is self-contained — jump to whichever vocabulary you care about. They are ordered roughly from "you see it every day" to "you only see it in finance".
| # | Vocabulary | The namespace pattern it shows |
|---|---|---|
| 1 | SVG | A default namespace plus a borrowed xlink: prefix — and the same SVG embedded inside HTML, the canonical mixed-namespace document |
| 2 | SOAP & WSDL | Several namespaces as a contract: an envelope wrapping a payload, and a WSDL that imports its XSD types |
| 3 | Office: OOXML & ODF | A ZIP of XML parts, each a forest of namespaces, wired together by relationships |
| 4 | DocBook | A vocabulary growing up: DocBook 4's no-namespace DTD becomes DocBook 5's namespaced RELAX NG — semantic markup, one source to many outputs |
| 5 | Atom & feed extensions | Extending a base vocabulary — Dublin Core, iTunes podcast tags — without touching it |
| 6 | XSL-FO & Apache FOP | A vocabulary that is generated, not authored: XSLT → fo: → PDF |
| 7 | XML Signature | Why namespaces force canonicalization: signing bytes that mean the same thing |
| 8 | SAML | Signed assertions and SSO: four namespaces (saml:/samlp:/ds:/md:), one per layer, and XML-DSig doing real work |
| 9 | GPX & KML | Two geo vocabularies, two extension styles (<extensions> vs a gx: prefix) |
| 10 | XBRL | Namespacing taken to the limit: a taxonomy of concepts in your namespace, built on xbrli: |
The tour — working with XML in code¶
The vocabularies above are what gets exchanged; these pages are how programs process it. Start with the concepts page, then dip into whichever runtime you work in — they cover the same five tasks (parse, navigate with namespaces, validate, transform, write) so you can compare them directly.
| Page | What it covers |
|---|---|
| Working with XML in code | The three parsing models — DOM, SAX, pull/streaming — and the namespace API (prefix → URI) that every language reinvents. Read this first. |
| APIs: Java | JAXP (DOM/SAX/StAX), XPath with NamespaceContext, XSD validation, XSLT via Saxon, JAXB data binding |
| APIs: .NET | LINQ-to-XML (XDocument), XmlReader streaming, XPath with XmlNamespaceManager, XmlSchemaSet, XslCompiledTransform |
| APIs: Python | lxml and ElementTree, namespace maps, iterparse streaming, XPath / XSLT / XML Schema, xsdata binding |
| APIs: Rust | quick-xml streaming, roxmltree, XPath/XSD via libxml2 — and where Rust stops (no native XSLT ⅔). The language unxml itself is written in |
The data is illustrative
The documents here are hand-written specimens — real structure, namespaces and schema shapes, made-up content. Where a snippet is trimmed from a larger real file, it says so. The code snippets are idiomatic and correct against current library APIs, but kept short — they show the shape of each API, not a complete program.