Functions and data types
A location path always evaluates to a node-set, but XPath also computes. Every expression produces a value of one of four types, and the core function library turns those values into one another — counting nodes, slicing strings, adding prices, asking yes/no questions. This page is the catalogue of those types, their conversion rules, and the functions every 1.0 processor provides.
The running document is the familiar catalog, where each cd carries a genre
attribute and the first keeps an optional year:
The four data types
XPath 1.0 has exactly four types. Everything an expression returns is one of them:
| Type | What it holds | Example expression |
|---|---|---|
| node-set | a set of nodes | catalog/cd, //title |
| string | a sequence of characters | name(/catalog), 'rock' |
| number | an IEEE 754 double | count(//cd), 3.14, price * 2 |
| boolean | true or false | @genre = 'rock', not(year) |
No integer type
A 1.0 number is always a double-precision float — there is no separate
integer type. count(//cd) is the double 3, and division never truncates:
7 div 2 is 3.5, not 3. The special values NaN (not-a-number, e.g.
number('rock')) and positive/negative infinity are numbers too.
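The string-to-number rule behind number('rock') can be sketched in Python, whose float is also an IEEE 754 double. The helper name xpath_number is hypothetical, and the pattern encodes the XPath 1.0 rule as understood here (optional whitespace, an optional minus sign, plain decimal digits, no exponent and no plus sign). Treat it as an illustration, not a conforming implementation.

```python
import math
import re

# Optional whitespace, optional '-', then digits with an optional fraction
# (or a bare fraction such as ".5"), then optional whitespace.
_XPATH_NUMBER = re.compile(
    r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*"
)


def xpath_number(text: str) -> float:
    """Hypothetical helper: convert a string the way number() is described."""
    if _XPATH_NUMBER.fullmatch(text):
        return float(text)
    return math.nan  # anything else, including "1e3" and "+5", becomes NaN


print(xpath_number("10.90"))  # 10.9
print(xpath_number("rock"))   # nan
print(xpath_number("1e3"))    # nan (Python's float() alone would accept it)
print(7 / 2)                  # 3.5, the same value as 7 div 2
```

The point of the regular expression is that Python's own float() is more permissive than XPath, so calling it directly would hide the NaN cases.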
The interesting part is what happens when a value of one type is used where another is expected. XPath converts implicitly, and the rules — especially for node-sets — are where most surprises live.
A node-set as a string
When a node-set is used where a string is wanted, XPath takes the string value
of the first node in document order and ignores the rest. So string(//title)
is not all three titles — it is just the first:
Empire Burlesque
The string value of an element is all of its text descendants concatenated;
the string value of an attribute is its value. An empty node-set converts to the
empty string "".
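The first-node rule can be modelled in a few lines of Python. This is a sketch under assumptions: xml.etree.ElementTree supports only a small XPath subset and has no string() function, so the conversion is written by hand, and the helper names are hypothetical. The CATALOG string is a stand-in for the page's document; the first and last titles come from this page, while the middle title is a placeholder.

```python
import xml.etree.ElementTree as ET

# Stand-in document: first and last titles are quoted on this page,
# the middle one is a placeholder.
CATALOG = """
<catalog>
  <cd><title>Empire Burlesque</title></cd>
  <cd><title>Placeholder Album</title></cd>
  <cd><title>Greatest Hits</title></cd>
</catalog>
"""


def string_value(node: ET.Element) -> str:
    """String value of an element: all descendant text, in document order."""
    return "".join(node.itertext())


def node_set_to_string(nodes: list) -> str:
    """First node's string value, or the empty string for an empty set."""
    return string_value(nodes[0]) if nodes else ""


root = ET.fromstring(CATALOG)
titles = root.findall(".//title")  # plays the role of //title
print(len(titles))                 # 3
print(node_set_to_string(titles))  # Empire Burlesque
print(repr(node_set_to_string(root.findall(".//isbn"))))  # ''
```

findall returns elements in document order, which is what makes taking index 0 equivalent to "the first node in document order" here.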
A value in boolean context
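As an operand of and/or, inside boolean(), or in a predicate, a value is coerced to
true/false. The one exception is a predicate whose value is a number: that is a
position test, so cd[1] means cd[position() = 1]. The rules: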
- A node-set is true when it is non-empty, false when empty.
- A string is true when it is non-empty ("" is false).
- A number is true unless it is 0 or NaN.
- A boolean is itself.
So cd[year] keeps the CDs that have a year child (the node-set is
non-empty), and cd[@genre] keeps those carrying a genre attribute:
Empire Burlesque … (only the first CD has a <year>)
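The four coercion rules fit in one small function. The sketch below models them in Python under stated assumptions: a node-set is represented as a plain list, and xpath_boolean is a hypothetical name. It is worth writing out because Python's own truthiness disagrees on NaN.

```python
import math


def xpath_boolean(value) -> bool:
    """Hypothetical helper: coerce a modelled XPath value to a boolean."""
    if isinstance(value, bool):          # check bool first: it is also an int
        return value
    if isinstance(value, (int, float)):  # number: false only for 0 and NaN
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):           # string: false only when empty
        return len(value) > 0
    return len(value) > 0                # node-set (a list here): non-empty


print(xpath_boolean([]))            # False: empty node-set
print(xpath_boolean(["<year>"]))    # True: non-empty node-set
print(xpath_boolean(""))            # False
print(xpath_boolean("false"))       # True: any non-empty string
print(xpath_boolean(0.0))           # False
print(xpath_boolean(float("nan")))  # False, although bool(nan) is True in Python
```

The string "false" being true is the case most likely to surprise: only the empty string is false.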
Comparing a node-set with =
Comparison is where the node-set rules bite hardest. When either side of a comparison is a node-set, the operator is existential: it is true if any node in the set satisfies the comparison.
true — because at least one price equals 9.90 (two of them do)
Node-set = means 'any node matches'
catalog/cd/price = '9.90' does not ask "are all prices 9.90?" — it asks
"is there some price equal to 9.90?". It is true here even though one CD
costs 10.90.
The consequence that catches everyone: != is not the negation of =.
catalog/cd/price != '9.90' is also true, because some price (10.90)
differs from 9.90. To negate a = test, wrap it: not(price = '9.90').
When one side is a node-set, the value is compared against each node, and a
single match anywhere makes the whole expression true.
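The existential rule is easiest to see when written as any(). The sketch below models the three price nodes by their string values (10.90, 9.90, 9.90, as quoted on this page); the function names are hypothetical and only the node-set versus string-literal case is covered.

```python
# String values of catalog/cd/price, as quoted on this page.
prices = ["10.90", "9.90", "9.90"]


def node_set_eq(values: list, literal: str) -> bool:
    """node-set = 'literal': true if ANY node's string value equals it."""
    return any(v == literal for v in values)


def node_set_ne(values: list, literal: str) -> bool:
    """node-set != 'literal': true if ANY node's string value differs."""
    return any(v != literal for v in values)


print(node_set_eq(prices, "9.90"))      # True: some price is 9.90
print(node_set_ne(prices, "9.90"))      # True: some price (10.90) is not
print(not node_set_eq(prices, "9.90"))  # False: not(price = '9.90')
print(node_set_eq([], "9.90"), node_set_ne([], "9.90"))  # False False
```

The last line shows a further consequence of the same rule: against an empty node-set both = and != are false, because there is no node to satisfy either.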
Operators
Comparison
=, !=, <, <=, >, >=. Numeric comparisons (< and friends) convert
both sides to numbers first, so price > 10 works on the text content directly.
Escape < inside XML attributes
Because XPath usually lives inside an XML attribute (an XSLT select,
match, or test), a literal < would break the XML. Write it as &lt;:
test="price &lt; 10". > is legal but &gt; is conventional. See
XSLT predicates.
Arithmetic
+, -, *, and — note the spelling — div and mod, not / and %.
The slash is reserved for location-path steps and % has no meaning in XPath,
so division and remainder use word operators:
- The first matched price doubled.
- 2.5: division is floating-point.
- 1: the remainder.
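For readers coming from Python, div and mod map onto floating-point division and a truncating remainder, not onto / and %. The sketch below is a model under assumptions: the helper names are hypothetical, and the operands 5 and 2 were picked only because they reproduce the 2.5 and 1 quoted above.

```python
import math


def xpath_div(a: float, b: float) -> float:
    """Hypothetical helper: IEEE 754 division, so a zero divisor never raises."""
    a, b = float(a), float(b)
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # The infinity's sign follows the signs of both operands.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def xpath_mod(a: float, b: float) -> float:
    """Hypothetical helper: remainder of a truncating division."""
    a, b = float(a), float(b)
    if b == 0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    return math.fmod(a, b)  # result takes the sign of the dividend


print(xpath_div(5, 2))   # 2.5
print(xpath_mod(5, 2))   # 1.0
print(xpath_mod(-5, 2))  # -1.0
print(-5 % 2)            # 1, Python's % follows the divisor's sign instead
print(xpath_div(1, 0))   # inf, where Python's 1 / 0 raises ZeroDivisionError
```

The negative-operand case is the one to watch when porting expressions: the remainder keeps the sign of the left operand.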
Boolean
and and or convert their operands to booleans and combine them (there is no
&& / ||). Both short-circuit: the right operand is not evaluated once the left
one decides the result. or has lower precedence than and:
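Python's and/or have the same relative precedence, so the grouping can be checked there. The sketch below is an illustration only: the predicate (genre is rock, or genre is pop and price below 10) is a hypothetical expression, not one from this page, and the pop and country genre values are placeholders. The prices are the ones quoted on this page.

```python
# Modelled CDs: prices are quoted on this page, non-rock genres are placeholders.
cds = [
    {"genre": "rock", "price": 10.90},
    {"genre": "pop", "price": 9.90},
    {"genre": "country", "price": 9.90},
]


def ungrouped(cd: dict) -> bool:
    # Reads as: rock or (pop and price < 10), because and binds tighter.
    return cd["genre"] == "rock" or cd["genre"] == "pop" and cd["price"] < 10


def grouped(cd: dict) -> bool:
    # Parentheses force: (rock or pop) and price < 10.
    return (cd["genre"] == "rock" or cd["genre"] == "pop") and cd["price"] < 10


print([ungrouped(cd) for cd in cds])  # [True, True, False]
print([grouped(cd) for cd in cds])    # [False, True, False]
```

The rock CD is the one that changes: without parentheses it passes on genre alone, with them its 10.90 price excludes it.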
The core function library
XPath 1.0 ships a fixed set of functions, conventionally grouped by the type
they work on. Signatures below use node-set? for an optional argument that
defaults to the context node.
Node-set functions
| Function | Returns |
|---|---|
| last() | size of the current node-set (number) |
| position() | position of the context node (number) |
| count(node-set) | number of nodes in the set |
| name(node-set?) | the qualified name (e.g. cd) of the first node |
| local-name(node-set?) | the local part of that name, without prefix |
| namespace-uri(node-set?) | the namespace URI of the first node |
| id(object) | element(s) with the given ID-typed attribute |
count(catalog/cd) (1)
count(catalog/cd[@genre = 'rock']) (2)
name(/catalog/cd) (3)
catalog/cd[position() = last()]/title (4)
(1) 3: every CD.
(2) 1: only the rock CD.
(3) cd: the name of the first matched element.
(4) Greatest Hits: the title of the last CD.
count(catalog/cd) → 3
count(catalog/cd[@genre = 'rock']) → 1
name(/catalog/cd) → cd
catalog/cd[position() = last()]/title → Greatest Hits
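These four results can be reproduced with Python's standard library, with one caveat: xml.etree.ElementTree implements only a subset of XPath and has no count() or name(), so len() and .tag stand in for them. The CATALOG string is a stand-in reconstructed from values quoted on this page; the middle title and the pop and country genres are placeholders, and putting rock on the first CD is an assumption.

```python
import xml.etree.ElementTree as ET

# Stand-in document; placeholder values are noted in the lead-in.
CATALOG = """
<catalog>
  <cd genre="rock"><title>Empire Burlesque</title></cd>
  <cd genre="pop"><title>Placeholder Album</title></cd>
  <cd genre="country"><title>Greatest Hits</title></cd>
</catalog>
"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

all_cds = root.findall("cd")                  # catalog/cd
rock_cds = root.findall("cd[@genre='rock']")  # catalog/cd[@genre = 'rock']
last_title = root.find("cd[last()]/title")    # cd[position() = last()]/title

print(len(all_cds))     # 3, like count(catalog/cd)
print(len(rock_cds))    # 1
print(all_cds[0].tag)   # cd, like name(/catalog/cd)
print(last_title.text)  # Greatest Hits
```

A full XPath 1.0 engine would evaluate the original expressions directly; the translation here only mirrors their meaning.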
String functions
| Function | Returns |
|---|---|
| string(object?) | the object converted to a string |
| concat(s1, s2, …) | the arguments joined (two or more) |
| contains(haystack, needle) | boolean: does haystack contain needle? |
| starts-with(s, prefix) | boolean: does s begin with prefix? |
| substring(s, start, len?) | a slice; positions are 1-based |
| substring-before(s, sep) | the part of s before the first sep |
| substring-after(s, sep) | the part of s after the first sep |
| string-length(s?) | number of characters |
| normalize-space(s?) | trims ends and collapses internal whitespace |
| translate(s, from, to) | per-character replacement |
- Empire Burlesque — Bob Dylan (string value of the first CD's children).
- 10: the whole-currency part of the first price.
- true for the rock CD.
concat(title, ' — ', artist) → Empire Burlesque — Bob Dylan
substring-before(price, '.') → 10
contains(@genre, 'rock') → true
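The slicing functions differ from Python's string methods in small ways: positions are 1-based, and a missing separator yields the empty string. The sketch below models three of them under assumptions: the function names are hypothetical Python spellings, and substring handles only finite numeric arguments.

```python
import math


def substring_before(s: str, sep: str) -> str:
    """Part of s before the first sep; empty string if sep is absent."""
    index = s.find(sep)
    return s[:index] if index != -1 else ""


def substring_after(s: str, sep: str) -> str:
    """Part of s after the first sep; empty string if sep is absent."""
    index = s.find(sep)
    return s[index + len(sep):] if index != -1 else ""


def substring(s: str, start: float, length: float = None) -> str:
    """1-based slice; start and length are rounded first (finite values only)."""
    first = math.floor(start + 0.5)
    end = len(s) + 1 if length is None else first + math.floor(length + 0.5)
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < end)


print(substring_before("10.90", "."))       # 10
print(substring_after("10.90", "."))        # 90
print(repr(substring_before("1090", ".")))  # ''
print(substring("Empire Burlesque", 1, 6))  # Empire
print(substring("12345", 0, 3))             # 12
print("rock" in "rock")                     # True, like contains(@genre, 'rock')
```

The substring("12345", 0, 3) case shows why the 1-based rule matters: position 0 does not exist, so only two characters fall inside the requested window.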
These are taught in depth on the XSLT side
The XSLT Tutorial works through the same string functions with fuller examples. See String functions rather than re-learning them here.
Number functions
| Function | Returns |
|---|---|
| number(object?) | the object converted to a number |
| sum(node-set) | the sum of the string-values-as-numbers of every node |
| round(x) | x rounded to the nearest integer (.5 rounds up) |
| floor(x) | the largest integer not greater than x |
| ceiling(x) | the smallest integer not less than x |
sum is the one that surprises: it walks the whole node-set (unlike the
first-node string rule), converting each node's string value to a number:
- 30.7: 10.90 + 9.90 + 9.90.
- 10: the average price, rounded.
sum(catalog/cd/price) → 30.7
round(sum(catalog/cd/price) div count(catalog/cd)) → 10
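The arithmetic can be checked in Python, with two cautions. Python's built-in round() rounds halves to the nearest even integer, so the ".5 rounds up" rule needs its own helper, and a sum of doubles may carry floating-point noise, so the total is rounded to two places before printing. The sketch assumes the three price strings quoted on this page; xpath_round is a hypothetical name and ignores edge cases such as negative zero and values a hair below .5.

```python
import math

# String values of catalog/cd/price, as quoted on this page.
prices = ["10.90", "9.90", "9.90"]


def xpath_round(x: float) -> float:
    """Hypothetical helper: nearest integer, with .5 going toward +infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))


total = sum(float(p) for p in prices)  # every node contributes, unlike string()
average = total / len(prices)

print(round(total, 2))       # 30.7
print(xpath_round(average))  # 10.0
print(xpath_round(2.5))      # 3.0
print(round(2.5))            # 2, Python rounds halves to even
```

Unlike this sketch, a sum over a node whose text is not numeric yields NaN in XPath; float() would raise instead.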
Boolean functions
| Function | Returns |
|---|---|
| boolean(object) | the object converted to a boolean |
| not(boolean) | the logical negation |
| true() | the constant true |
| false() | the constant false |
| lang(string) | true if the context node's xml:lang matches |
There are no boolean literals in XPath 1.0 — you call true() and false().
not() is the correct way to negate a node-set test, sidestepping the !=
trap above:
- false: there is a 9.90 price, so the negation is false.
- true: at least one CD has a year, so the node-set is non-empty.
not(catalog/cd/price = '9.90') → false
boolean(catalog/cd[year]) → true
At a glance
| Group | Functions |
|---|---|
| Node-set | last, position, count, name, local-name, namespace-uri, id |
| String | string, concat, contains, starts-with, substring, substring-before, substring-after, string-length, normalize-space, translate |
| Number | number, sum, round, floor, ceiling |
| Boolean | boolean, not, true, false, lang |
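This is the entire XPath 1.0 core library. A host language may add functions of its own (XSLT 1.0 adds document(), key(), format-number(), and others), but beyond those, a function that is not in this table means you are reaching for 2.0.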
A note on 2.0 and 3.0
XPath 2.0 and 3.0 rework the foundations:
- The four types give way to a rich type system over sequences, tied to XML Schema. A sequence is ordered and may mix atomic values and nodes, and a single item is just a sequence of length one.
- The function library grows enormously: regular expressions (matches, replace, tokenize), full date and time arithmetic, string-join, upper-case/lower-case, min, max, avg, distinct-values, and many more.
- You can define your own functions and use for, if, some, and every expressions inline.
These need a 2.0+ processor — the same one behind XSLT 2.0/3.0. The XSLT Tutorial covers the territory: Moving to XSLT 2.0 and 3.0, Regular expressions and strings, and User-defined functions.
Where next
You now know XPath 1.0 end to end: the tree of nodes, how location paths and axes walk it, how node tests and predicates filter each step, and — on this page — the four data types, their conversion rules, and the core function library that computes over them.
XPath is an addressing language; to do something with what it selects you need
a host. The XSLT Tutorial is where this same XPath is put to
work transforming documents — every select, match, and test there is an
expression of exactly the kind you have just learned. When you outgrow the four
types, Moving to XSLT 2.0 and 3.0 opens the sequence
model and the larger function library.
Back to the section Overview.