External documents
So far every example has read a single source document. But a stylesheet often
needs to pull in another file — a shared code list, a configuration, a second
data set to join against. The document() function does exactly that: it loads
an XML file and hands you back its nodes, which you then navigate with XPath
just like the main input.
Loading another file
document('uri') parses the file at uri and returns its node(s) — typically
the document root. From there, ordinary location paths reach inside it:
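A minimal sketch of such a call, assuming catalog.xml has a catalog root element containing cd elements that each have a title child (that layout is an assumption, not something this page specifies):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- document() returns the root node of catalog.xml; the rest is an
         ordinary location path. The names catalog/cd/title are assumed. -->
    <xsl:value-of select="document('catalog.xml')/catalog/cd[1]/title"/>
  </xsl:template>
</xsl:stylesheet>
```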
- Load catalog.xml, then walk into it: the title of its first CD.
A relative URI passed as a string is resolved against the location of the
stylesheet, not the source document and not the current working directory. So
document('codes.xml') finds a codes.xml sitting beside the .xsl file.
The argument is an XPath expression
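document(...) takes an expression, not just a literal. document('codes.xml')
uses a string literal, but document(@href) would load whatever file the
current node's href attribute names, which is handy for following references.
When the argument is a node-set like this, a relative URI is resolved against
the base URI of the node that supplied it (normally the source document), not
the stylesheet.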
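As a sketch of following references this way, the stylesheet below copies the source document through unchanged but replaces each reference element with the content of the file it points to. The element name include is hypothetical; only the href attribute comes from the text above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "include" is an assumed element name. The argument is the href
       attribute node, so the file it names is loaded and its document
       element is copied into the output in place of the reference. -->
  <xsl:template match="include">
    <xsl:copy-of select="document(@href)/*"/>
  </xsl:template>
</xsl:stylesheet>
```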
The lookup-table join
The classic use is a code-list join: the main document stores terse codes,
and a small external file maps each code to a human-readable label. Suppose the
catalog tags each CD with a genre attribute holding a code:
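A minimal sketch of such a catalog. The three titles are the ones in the sample output on this page; the catalog root element and the single-letter codes R, P and C are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.xml: root element name and code values are assumed -->
<catalog>
  <cd genre="R"><title>Empire Burlesque</title></cd>
  <cd genre="P"><title>Hide your heart</title></cd>
  <cd genre="C"><title>Greatest Hits</title></cd>
</catalog>
```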
The labels live in their own file, codes.xml:
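One plausible shape for that file, given that each code element carries an id and a label. Whether the label is an attribute or a child element is not stated here, so the attribute form, the codes root element and the code values R, P and C are all assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- codes.xml: one code element per genre; id is the terse code,
     label the human-readable name (attribute form assumed) -->
<codes>
  <code id="R" label="Rock"/>
  <code id="P" label="Pop"/>
  <code id="C" label="Country"/>
</codes>
```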
Bind the loaded document to a variable once, then look codes up in it with an XPath predicate:
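A sketch of that join, assuming the source has catalog/cd elements with a title child and a genre attribute, and that codes.xml holds codes/code elements with id and label attributes (the element layout and the attribute form of label are assumptions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Load codes.xml once and name the result. -->
  <xsl:variable name="codes" select="document('codes.xml')"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/cd">
      <xsl:value-of select="title"/>
      <!-- &#8212; is the dash separator used in the sample output -->
      <xsl:text> &#8212; </xsl:text>
      <!-- Pick the code whose id equals this cd's genre; read its label. -->
      <xsl:value-of
          select="$codes/codes/code[@id = current()/@genre]/@label"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```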
- Load and parse codes.xml once; $codes now holds its node-set.
- Inside the loop, current() is the cd being processed. Find the code whose id matches this CD's genre, and read its label.
Empire Burlesque — Rock
Hide your heart — Pop
Greatest Hits — Country
Why current() here?
Inside the predicate [@id = ...], the context node is each code element,
so a bare @genre would (wrongly) mean the code's genre. current()
steps back out to the cd the for-each is iterating, so current()/@genre
is the CD's code. At the outermost level of an expression the two coincide;
inside a predicate they do not.
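The difference can be seen side by side in a small sketch. It assumes the same hypothetical layout as before (catalog/cd with a genre attribute, codes/code with id and label attributes) and also shows a common alternative: capturing the value in a variable before entering the predicate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:variable name="codes" select="document('codes.xml')"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/cd">
      <!-- Pitfall: here @genre is evaluated on each code element. A code
           has no genre attribute, so the comparison is always false and
           nothing is selected; the brackets stay empty. -->
      <xsl:text>[</xsl:text>
      <xsl:value-of select="$codes/codes/code[@id = @genre]/@label"/>
      <xsl:text>] </xsl:text>

      <!-- Alternative to current(): read the attribute while the cd is
           still the context node, then use the variable in the predicate. -->
      <xsl:variable name="genre" select="@genre"/>
      <xsl:value-of select="$codes/codes/code[@id = $genre]/@label"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each output line starts with empty brackets from the bare @genre form, followed by the label found through the variable.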
document('') — the stylesheet itself
An empty-string argument is special: document('') refers to the stylesheet
document itself. This is the standard XSLT 1.0 idiom for keeping a small
lookup table inside the stylesheet, declared under your own namespace so the
processor does not treat it as instructions, and then reading it back as data.
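A sketch of the idiom follows. The namespace URI http://example.org/codes is a made-up placeholder chosen for this example, and the catalog/cd layout, the code values and the attribute form of label are assumptions as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:c="http://example.org/codes">
  <!-- xmlns:c above is a hypothetical namespace for the embedded data -->
  <xsl:output method="text"/>

  <!-- The lookup table: a top-level element in the c: namespace. -->
  <c:codes>
    <c:code id="R" label="Rock"/>
    <c:code id="P" label="Pop"/>
    <c:code id="C" label="Country"/>
  </c:codes>

  <!-- document('') is the stylesheet itself; step into its root element
       and select the embedded table. -->
  <xsl:variable name="codes"
      select="document('')/xsl:stylesheet/c:codes"/>

  <xsl:template match="/">
    <xsl:for-each select="catalog/cd">
      <xsl:value-of select="title"/>
      <xsl:text> &#8212; </xsl:text>
      <!-- id and label are unprefixed attributes, so they are in no
           namespace; only the element name needs the c: prefix. -->
      <xsl:value-of select="$codes/c:code[@id = current()/@genre]/@label"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```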
- A custom namespace for the embedded data. http://example.org/... is a safe, made-up URI that will never clash with XSLT's own.
- The lookup table, sitting as a top-level child of xsl:stylesheet. Because it is in the c: namespace, the processor treats it as foreign data, not as an instruction to execute.
- document('') reparses this stylesheet; the path then dives into its xsl:stylesheet root and selects the embedded c:codes element.
Empire Burlesque — Rock
Hide your heart — Pop
Greatest Hits — Country
Same result as the external file, but with the table and the logic shipped together in one file — convenient for short, stable code lists.
Top-level foreign elements need a namespace
A top-level element in an XSLT 1.0 stylesheet that has no namespace is an
error. The custom namespace (c: here) is what makes the embedded table
legal — and it is also what your XPath must match (c:code, not code).
Repeated calls and caching
Calling document() with the same URI more than once returns nodes from the
same parsed document — node identity is preserved, so comparisons behave
sensibly. Processors typically parse each distinct URI once and cache the
result, so binding the document to a variable up front (as above) is mostly a
matter of readability rather than performance. Even so, doing the load once and
naming it keeps the lookups tidy.
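The identity guarantee can be checked directly with generate-id(), which returns the same identifier for the same node. A small sketch, assuming a codes.xml file exists beside the stylesheet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Two separate calls with the same URI yield the same root node,
         so the two generated ids are equal and this prints "true". -->
    <xsl:value-of
        select="generate-id(document('codes.xml'))
              = generate-id(document('codes.xml'))"/>
  </xsl:template>
</xsl:stylesheet>
```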
Next
The join above re-scans the code list on every lookup. For large
cross-references that does not scale, and the next page fixes it:
Keys and indexed lookup: xsl:key and key() let the processor build an index
once, turning the same document() join into an indexed lookup that is
typically close to constant time.
That rounds out the core of XSLT 1.0: templates and apply-templates,
named templates and parameters, variables, control flow, XPath with predicates,
string handling, output control, sorting, and joining in external data with
document(). With these — plus keys next — you can express the great majority of
everyday transformations.