
External documents

So far every example has read a single source document. But a stylesheet often needs to pull in another file — a shared code list, a configuration, a second data set to join against. The document() function does exactly that: it loads an XML file and hands you back its nodes, which you then navigate with XPath just like the main input.

Loading another file

document('uri') parses the file at uri and returns its root node, the same kind of node that / selects in the main input. From there, ordinary location paths reach inside it:

<xsl:value-of select="document('catalog.xml')/catalog/cd[1]/title"/>   <!-- (1)! -->
  1. Load catalog.xml, then walk into it: the title of its first CD.

A relative URI is resolved against the location of the stylesheet, not the source document and not the current working directory. So document('codes.xml') finds a codes.xml sitting beside the .xsl file.
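If the file sits beside the source document instead, XSLT 1.0 accepts a second argument to document(): a node-set whose base URI is used to resolve the relative URI. The following is a minimal sketch, assuming for illustration that a codes.xml exists in both places; it prints the name of each file's root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
  <!-- One argument: 'codes.xml' is resolved against the stylesheet's location. -->
  <xsl:value-of select="name(document('codes.xml')/*)"/>
  <xsl:text>&#10;</xsl:text>
  <!-- Two arguments: / is the root of the source document, so 'codes.xml'
       is resolved against the source document's location instead. -->
  <xsl:value-of select="name(document('codes.xml', /)/*)"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```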

The argument is an XPath expression

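document(...) takes an expression, not just a literal. document('codes.xml') uses a string literal, but document(@href) would load whatever file the current node's href attribute names, which is handy for following references. One difference to keep in mind: when the argument is a node-set such as @href, a relative URI is resolved against the base URI of that node (normally the source document), not against the stylesheet.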
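As a sketch of following references, assume a hypothetical input of the form `<index><part href="a.xml"/><part href="b.xml"/></index>`; the element names and file names are invented for illustration. The stylesheet below loads each referenced file and prints the name of its root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
  <!-- index and part are hypothetical element names for this sketch. -->
  <xsl:for-each select="index/part">
    <!-- @href is a node-set, so a relative URI in it is resolved against
         the base URI of the attribute's own document (the input file). -->
    <xsl:value-of select="name(document(@href)/*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```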

The lookup-table join

The classic use is a code-list join: the main document stores terse codes, and a small external file maps each code to a human-readable label. Suppose the catalog tags each CD with a genre attribute holding a code:

catalog.xml
<catalog>
  <cd genre="ROK"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd genre="POP"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd genre="CNT"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>

The labels live in their own file:

codes.xml
<?xml version="1.0" encoding="UTF-8"?>
<codes>
  <code id="ROK" label="Rock"/>
  <code id="POP" label="Pop"/>
  <code id="CNT" label="Country"/>
</codes>

Bind the loaded document to a variable once, then look codes up in it with an XPath predicate:

genre-labels.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <xsl:variable name="codes" select="document('codes.xml')"/>   <!-- (1)! -->
  <html><body>
    <xsl:for-each select="catalog/cd">
      <p>
        <xsl:value-of select="title"/>
        <xsl:text> &#8212; </xsl:text>
        <xsl:value-of select="$codes/codes/code[@id = current()/@genre]/@label"/>  <!-- (2)! -->
      </p>
    </xsl:for-each>
  </body></html>
</xsl:template>
</xsl:stylesheet>
  1. Load and parse codes.xml once; $codes now holds the root node of that document.
  2. Inside the loop, current() is the cd being processed. Find the code whose id matches this CD's genre, and read its label.

Empire Burlesque — Rock

Hide your heart — Pop

Greatest Hits — Country
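If a CD carries a code that codes.xml does not list, the predicate selects nothing and xsl:value-of writes an empty string. One way to handle that, sketched here as an option rather than a required pattern, is to bind the lookup result to a variable and fall back to the raw code when it is empty.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <xsl:variable name="codes" select="document('codes.xml')"/>
  <html><body>
    <xsl:for-each select="catalog/cd">
      <!-- The label attribute for this CD's code, or an empty node-set. -->
      <xsl:variable name="label"
                    select="$codes/codes/code[@id = current()/@genre]/@label"/>
      <p>
        <xsl:value-of select="title"/>
        <xsl:text> &#8212; </xsl:text>
        <xsl:choose>
          <!-- A non-empty node-set tests as true. -->
          <xsl:when test="$label"><xsl:value-of select="$label"/></xsl:when>
          <!-- No match in codes.xml: show the raw code instead. -->
          <xsl:otherwise><xsl:value-of select="@genre"/></xsl:otherwise>
        </xsl:choose>
      </p>
    </xsl:for-each>
  </body></html>
</xsl:template>
</xsl:stylesheet>
```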

Why current() here?

Inside the predicate [@id = ...], the context node is each code element, so a bare @genre would (wrongly) mean the code's genre. current() steps back out to the cd the for-each is iterating, so current()/@genre is the CD's code. Outside a predicate the two coincide; inside one they do not.
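An alternative to current() is to capture the value in a variable before entering the predicate. The sketch below reuses catalog.xml and codes.xml from above; the variable name genre is an arbitrary choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <xsl:variable name="codes" select="document('codes.xml')"/>
  <html><body>
    <xsl:for-each select="catalog/cd">
      <!-- Capture the CD's code while the cd is still the context node. -->
      <xsl:variable name="genre" select="@genre"/>
      <p>
        <xsl:value-of select="title"/>
        <xsl:text> &#8212; </xsl:text>
        <!-- Inside the predicate the context node is a code element,
             but $genre still refers to the cd's attribute. -->
        <xsl:value-of select="$codes/codes/code[@id = $genre]/@label"/>
      </p>
    </xsl:for-each>
  </body></html>
</xsl:template>
</xsl:stylesheet>
```

Both forms select the same label. The variable costs an extra line but keeps the predicate free of context switching.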

document('') — the stylesheet itself

An empty-string argument is special: document('') refers to the stylesheet document itself. This is the standard XSLT 1.0 idiom for keeping a small lookup table inside the stylesheet: declare it under your own namespace, so the processor does not try to interpret it as XSLT, and then read it back as data.

embedded-codes.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:c="http://example.org/codes">       <!-- (1)! -->

<c:codes>                                                  <!-- (2)! -->
  <c:code id="ROK" label="Rock"/>
  <c:code id="POP" label="Pop"/>
  <c:code id="CNT" label="Country"/>
</c:codes>

<xsl:template match="/">
  <xsl:variable name="codes" select="document('')/xsl:stylesheet/c:codes"/>  <!-- (3)! -->
  <html><body>
    <xsl:for-each select="catalog/cd">
      <p>
        <xsl:value-of select="title"/>
        <xsl:text> &#8212; </xsl:text>
        <xsl:value-of select="$codes/c:code[@id = current()/@genre]/@label"/>
      </p>
    </xsl:for-each>
  </body></html>
</xsl:template>

</xsl:stylesheet>
  1. A custom namespace for the embedded data. http://example.org/... is a safe, made-up URI that will never clash with XSLT's own.
  2. The lookup table, sitting as a top-level child of xsl:stylesheet. Because it is in the c: namespace, the processor treats it as foreign data, not as an instruction to execute.
  3. document('') reparses this stylesheet; the path then dives into its xsl:stylesheet root and selects the embedded c:codes element.

Empire Burlesque — Rock

Hide your heart — Pop

Greatest Hits — Country

Same result as the external file, but with the table and the logic shipped together in one file — convenient for short, stable code lists.

Top-level foreign elements need a namespace

A top-level element in an XSLT 1.0 stylesheet that has no namespace is an error. The custom namespace (c: here) is what makes the embedded table legal — and it is also what your XPath must match (c:code, not code).
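One side effect of the custom namespace: because it is declared on xsl:stylesheet, it is in scope for the literal html element, and a processor will typically copy the xmlns:c declaration into the output. The standard exclude-result-prefixes attribute suppresses that. Here is a sketch of the same stylesheet with the attribute added:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:c="http://example.org/codes"
                exclude-result-prefixes="c">  <!-- keep xmlns:c out of the result -->

<c:codes>
  <c:code id="ROK" label="Rock"/>
  <c:code id="POP" label="Pop"/>
  <c:code id="CNT" label="Country"/>
</c:codes>

<xsl:template match="/">
  <xsl:variable name="codes" select="document('')/xsl:stylesheet/c:codes"/>
  <html><body>
    <xsl:for-each select="catalog/cd">
      <p>
        <xsl:value-of select="title"/>
        <xsl:text> &#8212; </xsl:text>
        <xsl:value-of select="$codes/c:code[@id = current()/@genre]/@label"/>
      </p>
    </xsl:for-each>
  </body></html>
</xsl:template>

</xsl:stylesheet>
```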

Repeated calls and caching

Calling document() with the same URI more than once returns nodes from the same parsed document — node identity is preserved, so comparisons behave sensibly. Processors typically parse each distinct URI once and cache the result, so binding the document to a variable up front (as above) is mostly a matter of readability rather than performance. Even so, doing the load once and naming it keeps the lookups tidy.
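The identity guarantee can be checked directly with generate-id(), which returns the same string for the same node. This is a minimal sketch, assuming codes.xml sits beside the stylesheet; a conforming XSLT 1.0 processor should print true.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
  <!-- Two separate calls with the same URI yield the same root node,
       so the generated ids compare equal. -->
  <xsl:value-of select="generate-id(document('codes.xml'))
                        = generate-id(document('codes.xml'))"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```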

Next

The join above re-scans the code list on every lookup. For large cross-references that does not scale, and the next page fixes it. Keys and indexed lookup: xsl:key and key() build the index once, turning the same document() join into an indexed lookup that processors typically answer in near-constant time.

That rounds out the core of XSLT 1.0: templates and apply-templates, named templates and parameters, variables, control flow, XPath with predicates, string handling, output control, sorting, and joining in external data with document(). With these — plus keys next — you can express the great majority of everyday transformations.