Keys and indexed lookup
The external documents page joined a catalog to a code
list with an XPath predicate: code[@id = current()/@genre]. That works, but it
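re-scans the whole code list on every lookup. For three codes that cost is
invisible; for thousands of catalog entries joined against thousands of codes,
the total work grows with the product of the two sizes. xsl:key lets the
processor build an index once, so each lookup with key() is typically close to
constant-time. It is the scalable version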
of the same join, and it has been in the language since 1.0 — everything here
runs in any processor.
Declaring a key
xsl:key is a top-level element (a direct child of xsl:stylesheet). It names an
index and says what to index and by which value:
- Build an index called cd-by-genre over every cd element (match), keyed on each one's genre attribute (use).

- name: what you will pass to key().
- match: which nodes go into the index (the same pattern language as xsl:template match).
- use: an XPath expression, evaluated relative to each matched node, giving the value to file it under.
The processor walks the document once, building a map from key value → the nodes that have it.
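As a sketch, the declaration annotated above would look something like the following. The names cd-by-genre, cd, and genre come from the annotation; the surrounding stylesheet wrapper is an assumption, shown only to make the top-level position visible.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level declaration: index every cd element
       under the value of its genre attribute. -->
  <xsl:key name="cd-by-genre" match="cd" use="@genre"/>

  <!-- Templates would follow here. -->
</xsl:stylesheet>
```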
Looking things up with key()
key('name', value) returns all nodes in that index whose key equals
value — a node-set, just like a predicate would yield, but found by lookup
rather than by scanning:
Listing: by-genre.xsl

- Every cd whose @genre is ROK, fetched from the index. Equivalent to //cd[@genre = 'ROK'], but it does not re-scan the tree.
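A minimal sketch of a stylesheet along the lines of by-genre.xsl, assuming a catalog of cd elements that each carry a genre attribute. The title child element and the text output are hypothetical choices made for illustration, not taken from the original listing.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="cd-by-genre" match="cd" use="@genre"/>

  <xsl:template match="/">
    <!-- All cd elements filed under 'ROK', fetched from the index
         of the document that owns the context node (the source). -->
    <xsl:for-each select="key('cd-by-genre', 'ROK')">
      <!-- title is an assumed child element of cd -->
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Replacing the select with //cd[@genre = 'ROK'] should select the same nodes; the difference is only in how they are found.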
A key can map many nodes to one value, and one node to many keys
key() returns a node-set, so a key value shared by several nodes returns
all of them — that is exactly what makes it useful for grouping. Conversely,
if use evaluates to multiple values (e.g. use="tokenize(@tags, ' ')" in
2.0, or a node-set in 1.0), the node is indexed under each of them.
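To illustrate the multi-valued case in 1.0, here is a sketch that assumes each cd carries several tag child elements (a hypothetical input shape, as are the title element and the 'live' value). Because use="tag" selects a node-set, the cd is filed once under the string value of each tag.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed input shape:
       <cd><title>...</title><tag>live</tag><tag>acoustic</tag></cd>
       The cd is indexed under 'live' and under 'acoustic'. -->
  <xsl:key name="cd-by-tag" match="cd" use="tag"/>

  <xsl:template match="/">
    <!-- Every cd that has at least one tag equal to 'live'. -->
    <xsl:for-each select="key('cd-by-tag', 'live')">
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```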
The code-list join, re-done with a key
Here is the external-documents join again, but indexed. The code list lives in its own file:
Listing: codes.xml
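A plausible shape for the code list, given as a sketch only. The code element and its id attribute come from the text, as do the ROK code and the names Rock, Pop, and Country; the codes root element, the other two ids, and the choice to hold the name as text content are assumptions.

```xml
<?xml version="1.0"?>
<!-- Only ROK appears in the text; POP and CTY are placeholder ids. -->
<codes>
  <code id="ROK">Rock</code>
  <code id="POP">Pop</code>
  <code id="CTY">Country</code>
</codes>
```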
We index the code elements by their id, then look each catalog code up:
- The key indexes code elements wherever they occur, including inside the loaded codes.xml.
- Capture the catalog code into a variable before we change context (next line), because once we move into $codes the bare @genre would be gone.
- The crucial step: see the warning below.
Rock
Pop
Country
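The keyed join described by the annotations above can be sketched as follows. The key name code-by-id, the $codes variable, and the catalog/cd path come from the text; the variable name genre, the text output, and the assumption that each code element holds its display name as text content are illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Applies to code elements in any document, including codes.xml. -->
  <xsl:key name="code-by-id" match="code" use="@id"/>

  <!-- Root node of the loaded code list. -->
  <xsl:variable name="codes" select="document('codes.xml')"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/cd">
      <!-- Capture the code while the cd is still the context node. -->
      <xsl:variable name="genre" select="@genre"/>
      <!-- Switch the context node into codes.xml,
           so key() reads that document's index. -->
      <xsl:for-each select="$codes">
        <xsl:value-of select="key('code-by-id', $genre)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The inner for-each iterates over exactly one node, the root of codes.xml. It is there for the context switch, not for looping.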
key() only searches the current document
key() looks in the document that owns the context node at the moment you
call it. Straight inside for-each select="catalog/cd" the context document is
the source, so key('code-by-id', …) would search the catalog — and find
nothing. The inner for-each select="$codes" exists solely to switch the
context node into codes.xml, so the key resolves against the right
document. This is the standard 1.0 idiom for keyed lookups into a loaded file:
capture the value, switch context, call key().
Why the context switch is needed
The surprising part is that nothing connects the xsl:key declaration to
codes.xml directly. The link runs through the context node, in three steps:
1. xsl:key is a rule, not an index. <xsl:key name="code-by-id" match="code" use="@id"/> does not index anything on its own. It is a standing instruction: in any document the stylesheet touches, index its code elements by @id.
2. Each document gets its own index. The processor applies that rule separately to every document it parses. The source catalog has no code elements, so its code-by-id index is empty; codes.xml has them, so its index is full. Same key name, two independent indexes.
3. key() reads the index of the context node's document. In 1.0, key() takes no document argument: it consults the current document, meaning the one that owns the context node at that moment. (XSLT 2.0 added an optional third argument that names the node whose subtree is searched.)
So document('codes.xml') is what gets that file parsed and indexed;
for-each select="$codes" is what moves the context node into it; and only then
does key('code-by-id', …) read the right index. Drop the context switch and
key() would query the source catalog's empty index and return nothing.
Keys for grouping (Muenchian)
Before xsl:for-each-group (2.0), keys were also the engine of grouping. The
"Muenchian" technique keys every item by its grouping value, then keeps only the
first item of each value — the ones that are identical to the first node the key
returns for their own key:
- Keep a cd only if it is the first node its key returns: one representative per distinct genre. generate-id() compares node identity.
- For each representative, pull the whole group back out of the index.
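The two steps above can be sketched as follows, reusing the cd-by-genre key. The catalog/cd structure follows the earlier examples; the title child element and the genres, genre, and code output names are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="cd-by-genre" match="cd" use="@genre"/>

  <xsl:template match="/catalog">
    <genres>
      <!-- Step 1: keep a cd only if it is the first node the key
           returns for its own genre (one representative per genre). -->
      <xsl:for-each select="cd[generate-id() =
                               generate-id(key('cd-by-genre', @genre)[1])]">
        <genre code="{@genre}">
          <!-- Step 2: pull the whole group back out of the index. -->
          <xsl:for-each select="key('cd-by-genre', @genre)">
            <title><xsl:value-of select="title"/></title>
          </xsl:for-each>
        </genre>
      </xsl:for-each>
    </genres>
  </xsl:template>
</xsl:stylesheet>
```

Groups come out in the document order of their first member, because the outer for-each walks the representatives in document order.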
It works, but it is famously oblique. In 2.0/3.0 this entire pattern collapses to
xsl:for-each-group — see Grouping, which opens by retiring
exactly this trick.
How keys relate to maps
Keys and the 3.0 map type both give indexed, near-constant-time
lookup. They are not the same tool:
| | xsl:key / key() | map (3.0) |
|---|---|---|
| Version | 1.0+ | 3.0 |
| Indexes | nodes in a document | any values |
| Returns | a node-set | anything: strings, sequences, nested maps |
| Lives | bound to a document; key() searches the context document | a free value: passed to functions, returned, nested |
| Build | declarative, automatic, one top-level element | you construct it (often from loaded nodes) |
| Best when | indexing into a source tree and you want nodes back | you want a portable side table, typed keys, or non-node values |
A rough rule: reach for key() when the data you are indexing is the XML
you are already processing and you want the matching nodes; reach for a
map when you want a standalone lookup structure you can pass around, key by
typed values, or fill with records rather than nodes. See the
codelist-as-a-map discussion for the
3.0 alternative to the join above.
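For comparison, here is a rough sketch of the same join done with a 3.0 map instead of a key. It assumes an XSLT 3.0 processor and a codes.xml whose code elements hold the display name as text content; the variable name code-names is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs map">
  <xsl:output method="text"/>

  <!-- Build the side table once: code id (string) to display name (string). -->
  <xsl:variable name="code-names" as="map(xs:string, xs:string)"
      select="map:merge(
                for $c in doc('codes.xml')//code
                return map:entry(string($c/@id), string($c)))"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/cd">
      <!-- A map is a free value, so no context switch is needed. -->
      <xsl:value-of select="$code-names(string(@genre))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The lookup returns strings here rather than nodes, which is the trade-off the table describes: the map is portable, but it no longer hands back the code elements themselves.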
Next
That completes the XSLT 1.0 toolkit, indexed lookups included. The next section, Moving to XSLT 2.0 and 3.0, picks up where 1.0 leaves off — sequences and types, real functions, native grouping, and regular expressions that retire most of the 1.0 workarounds.