Reading and writing non-XML text¶
A persistent myth is that XSLT only reads and writes XML. Since 2.0 it has been a capable text-processing language too: it can pull in a plain-text file, a CSV, or a JSON document, turn a string into a navigable tree, and write a tree back out as a string or a text file. Combined with regular expressions, this makes XSLT a real option for format-conversion jobs that have nothing to do with XML at either end.
This page covers the four corners:
| Direction | Resource (a file/URI) | A string in memory |
|---|---|---|
| In | unparsed-text / unparsed-text-lines |
parse-xml / parse-json |
| Out | xsl:result-document method="text" |
serialize |
Reading text: unparsed-text¶
unparsed-text($href) returns the entire contents of a resource as a single
xs:string — no XML parsing, no markup interpretation. unparsed-text-lines
returns it pre-split into a sequence of lines, which is almost always what you
want:
<xsl:variable name="raw" select="unparsed-text('notes.txt')"/>
<xsl:variable name="lines" select="unparsed-text-lines('data.csv')"/>
<xsl:value-of select="count($lines)"/> <!-- number of lines -->
The href is resolved against the stylesheet's base URI; an optional second
argument names the encoding (unparsed-text('x.txt', 'ISO-8859-1')) when it is
not UTF-8 and not declared by the resource.
Guard with unparsed-text-available
Reading a missing file is a recoverable error. Test first when the input is optional:
Worked example: CSV → XML¶
The canonical text job. Read the lines, drop the header, tokenize
each row on commas, and build elements — the catalog you have been transforming
all along, but arriving as CSV:
genre,title,artist,price
rock,Empire Burlesque,Bob Dylan,10.90
pop,Hide your heart,Bonnie Tyler,9.90
<xsl:template name="xsl:initial-template">
<catalog>
<xsl:for-each select="tail(unparsed-text-lines('catalog.csv'))"> <!-- (1)! -->
<xsl:variable name="f" select="tokenize(., ',')"/> <!-- (2)! -->
<cd genre="{$f[1]}">
<title>{$f[2]}</title>
<artist>{$f[3]}</artist>
<price>{$f[4]}</price>
</cd>
</xsl:for-each>
</catalog>
</xsl:template>
taildrops the header line;xsl:initial-templateis the 3.0 entry point when there is no source document to match.tokenizesplits each line into fields. Real CSV with quoted commas needsxsl:analyze-stringinstead — but for clean data this is enough.
(This assumes text value templates are on with
expand-text="yes", hence the bare {$f[2]}.)
Parsing a string into a tree: parse-xml¶
When the XML arrives as a string — embedded in a database column, returned by
an API, or read with unparsed-text — parse-xml($string) (3.0) turns it into a
document node you can navigate normally:
<xsl:variable name="doc" select="parse-xml('<cd><title>X</title></cd>')"/>
<xsl:value-of select="$doc/cd/title"/> <!-- X -->
parse-xml-fragment is the variant for a fragment — content without a single
document element, such as <a/><b/> or a run of mixed text and elements.
For JSON the equivalent is parse-json (string → maps/arrays) and
json-doc($href), which reads JSON straight from a file in one step — the JSON
sibling of unparsed-text + parse-json.
Writing a tree as a string: serialize¶
The inverse of parse-xml. serialize($node) (3.0) renders a node — or any
sequence — back into markup as a string, so you can embed it in text output,
store it, or post-process it:
<xsl:value-of select="serialize(/catalog/cd[1])"/>
<!-- <cd genre="rock"><title>Empire Burlesque</title>… -->
A second argument controls the output — either a map or an
output:serialization-parameters element:
This is how you produce escaped markup (XML-inside-text), or serialise to JSON
with map { 'method': 'json' }.
Writing text files: method="text"¶
To emit a non-XML file, combine xsl:result-document
with method="text" — every node is flattened to its string value, no tags, no
escaping of < and & beyond what text needs.
Worked example: XML → CSV¶
The round trip back out. Note xsl:text and to place the commas and
newlines exactly:
<xsl:template match="/catalog">
<xsl:result-document href="out.csv" method="text">
<xsl:text>genre,title,artist,price </xsl:text>
<xsl:for-each select="cd">
<xsl:value-of select="(@genre, title, artist, price)" separator=","/>
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:result-document>
</xsl:template>
xsl:value-of with separator="," joins the four fields in one step — the
string-join idiom built into the instruction.
Character maps and escaping
method="text" does no markup escaping, so the output is literally the
string values. If you need to substitute particular characters on the way
out (say, replace a delimiter that appears in the data), an
xsl:character-map referenced from xsl:output does it at serialization
time. For structural CSV quoting, escape inside the transform with
replace before output.
Where to go next¶
- Regular expressions and strings —
tokenize/analyze-string, the parsing half of every text job. - Reading and writing JSON —
parse-json,json-doc, andserializewithmethod="json". - Multiple result documents — writing many text or XML files in one run.
- New functions —
head/tail,string-join, and the rest used above.