Atom and feed extensions: extending a base vocabulary
The previous pages showed big vocabularies. Atom is the opposite: a small,
fixed core for syndication (feed, entry, title, link, …) that is designed
to be extended. This is the namespace pattern that powers podcasts, blog
metadata, and most of the structured web's "and also attach this" — adding
foreign elements into a document without changing the base vocabulary or
breaking readers that do not understand them.
A feed with two extensions
- Atom's link is typed by rel: alternate is the human page, self is the feed's own URL. The whole feed is in the default Atom namespace, so there is no prefix on feed, title, or entry.
- Every Atom id must be a permanent, globally unique IRI, here a urn:uuid. Readers de-duplicate entries by this, not by title.
- itunes:author is from Apple's podcast namespace. An Atom reader that has never heard of podcasts simply ignores it, because it is in a namespace the reader does not process. That graceful ignoring is the entire point.
- dc:creator is Dublin Core, a tiny, decades-old metadata vocabulary (creator, date, subject, rights, …) that shows up everywhere: feeds, Office documents, repositories. It is the canonical "borrowed" namespace.
- content type="html" carries escaped HTML as text. (It could instead be type="xhtml" and contain real XHTML elements, which would be yet another namespace nested inside.)
The layering is the whole story: bare names (title, entry, link) are core
Atom; the dc: and itunes: prefixes are bolt-ons. A reader walks the tree and
skips any element whose namespace it does not recognize.
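The walk-and-skip behaviour can be sketched with Python's standard library. This is a minimal illustration, not how any particular feed reader is implemented: the sample feed is hypothetical (its titles, URLs, UUIDs, and names are invented), and walk_core is a helper name made up for this sketch.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ATOM_PREFIX = "{" + ATOM + "}"  # ElementTree spells qualified names as {uri}local

# Hypothetical sample feed: every title, URL, UUID, and name is invented.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <title>Example Feed</title>
  <id>urn:uuid:60a76c80-d399-11d9-b93c-0003939e0af6</id>
  <updated>2024-01-01T00:00:00Z</updated>
  <link rel="alternate" href="https://example.org/"/>
  <link rel="self" href="https://example.org/feed.atom"/>
  <itunes:author>Example Host</itunes:author>
  <entry>
    <title>First post</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2024-01-01T00:00:00Z</updated>
    <dc:creator>A. Writer</dc:creator>
    <content type="html">&lt;p&gt;Hello&lt;/p&gt;</content>
  </entry>
</feed>"""


def walk_core(el, depth=0, out=None):
    """Collect (depth, local name) for Atom elements only."""
    out = [] if out is None else out
    if not el.tag.startswith(ATOM_PREFIX):
        return out  # foreign namespace: skip the element and its subtree
    out.append((depth, el.tag[len(ATOM_PREFIX):]))
    for child in el:
        walk_core(child, depth + 1, out)
    return out


root = ET.fromstring(FEED)
core = walk_core(root)
for depth, name in core:
    print("  " * depth + name)
```

The check compares namespace URIs, never prefixes: a feed that bound Dublin Core to some other prefix would be handled identically.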
Why this is "extension", not "a bigger schema"
The crucial property: the Atom schema never changed. Apple did not get to edit the Atom spec to add podcast tags; they minted their own namespace and Atom's content model allows foreign-namespaced children. In XSD terms, the extension point is a wildcard:
##other means "elements from any namespace except the target (Atom) one"; * is zero-or-more; lax says "validate them if you have a schema for that namespace, otherwise let them pass". This single wildcard is what turns a closed vocabulary into an open, extensible one. Compare it with the validation pipeline: same idea, opposite policy. EN16931 forbids extensions where Atom invites them.
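To make the namespace rule concrete, here is a rough sketch of the ##other test in Python. The xs:any fragment is reconstructed for illustration from the three properties described above and is not quoted from any published Atom schema; allowed_by_wildcard and namespace_of are hypothetical helper names. The sketch covers only the namespace test, not lax validation, and it assumes (as XSD's ##other is generally understood) that unqualified elements are excluded as well.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Illustrative wildcard, assembled from the description in the text.
WILDCARD = (
    '<xs:any xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'namespace="##other" processContents="lax" '
    'minOccurs="0" maxOccurs="unbounded"/>'
)


def namespace_of(tag):
    """Split ElementTree's '{uri}local' form; None means no namespace."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def allowed_by_wildcard(tag, wildcard, target_ns):
    """Namespace test only: does this element name pass the wildcard?"""
    if wildcard.get("namespace") != "##other":
        raise NotImplementedError("this sketch only covers ##other")
    ns = namespace_of(tag)
    # Assumption: ##other rejects the target namespace and unqualified names.
    return ns is not None and ns != target_ns


wildcard = ET.fromstring(WILDCARD)
dc_ok = allowed_by_wildcard("{http://purl.org/dc/elements/1.1/}creator", wildcard, ATOM)
atom_ok = allowed_by_wildcard("{" + ATOM + "}title", wildcard, ATOM)
bare_ok = allowed_by_wildcard("title", wildcard, ATOM)
print(dc_ok, atom_ok, bare_ok)
```

Only the Dublin Core element passes: the wildcard admits foreign vocabularies while leaving the core content model in charge of Atom's own names.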
RSS: the same job, no namespace on the core
Atom's older cousin, RSS 2.0, makes an instructive contrast: its core
elements (channel, item, title) are in no namespace at all — a design
from before namespaces were universal. Yet RSS still grew the same extension
ecosystem, purely through namespaced add-ons:
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <title>No namespace here</title>  <!-- core RSS: unqualified -->
    <item>
      <title>An episode</title>
      <content:encoded><![CDATA[<p>Full HTML body</p>]]></content:encoded>
      <podcast:transcript url="t.vtt" type="text/vtt"/>
    </item>
  </channel>
</rss>
So a single RSS document mixes unqualified core elements with qualified extension elements — a living reminder that "no namespace" is itself a namespace state, and that you can layer qualified vocabularies on top of an unqualified base.
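One way to see the two namespace states side by side is to parse the document and print each element's expanded name. The sketch below uses Python's standard xml.etree.ElementTree, which writes qualified names as {uri}local and unqualified names as the bare local name; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

RSS = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <title>No namespace here</title>
    <item>
      <title>An episode</title>
      <content:encoded><![CDATA[<p>Full HTML body</p>]]></content:encoded>
      <podcast:transcript url="t.vtt" type="text/vtt"/>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(RSS)
item = root.find("channel/item")

# Core RSS elements come back bare; extension elements carry their URI.
tags = [child.tag for child in item]
for tag in tags:
    print(tag)

# The CDATA section arrives as ordinary text content.
encoded_text = item[1].text
```

The prefixes content: and podcast: do not appear in the parsed names at all; only the URIs they were bound to survive.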
Atom vs RSS in one line
Atom puts its whole core in a namespace and versions by URI (like SOAP); RSS leaves its core unqualified and bolts extensions on. Both end up extensible — the route differs.
Querying a feed: the default-namespace trap
That "whole core in a namespace" choice has a sharp practical edge, and it is the
single thing that trips people parsing Atom for the first time. The feed declares
its namespace with no prefix (xmlns="http://www.w3.org/2005/Atom"), so it is
tempting to read <title> as if it were unqualified. It is not — it is in the Atom
namespace. The naïve query finds nothing:
//title -> (no matches) # "title in NO namespace" — there are none
//entry/title -> (no matches) # same trap, one level down
This is the exact XPath 1.0 rule met on the SVG and Office pages: an unprefixed
name in a query means no namespace, and a default xmlns is still a namespace.
You must bind a prefix of your own (the document has none to copy) and use it:
your prefix map: a -> http://www.w3.org/2005/Atom
your query: //a:entry/a:title -> matches every entry title
The prefix a is yours; only the URI matters. RSS 2.0 inverts the surprise: its
core is unqualified, so //item/title works directly — but the moment you reach
for <content:encoded> you are back to binding content to its namespace URI. The
rule is uniform; which elements it bites just depends on where each format drew the
namespace line.
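The trap and its fix can be reproduced in a few lines. This sketch uses Python's xml.etree.ElementTree, which supports only a limited XPath subset (hence .// relative to the root rather than a leading //) but, when no default-namespace mapping is supplied, treats unprefixed names the same way: as names in no namespace. Both sample documents are hypothetical and trimmed to the minimum.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

root = ET.fromstring(ATOM_FEED)

# Unprefixed names mean "no namespace", so this finds nothing.
naive = root.findall(".//entry/title")

# Bind a prefix of our own to the Atom URI; the letter "a" is arbitrary.
ns = {"a": "http://www.w3.org/2005/Atom"}
titles = [t.text for t in root.findall(".//a:entry/a:title", ns)]

# A different prefix bound to the same URI matches the same elements.
same = [t.text for t in root.findall(".//x:entry/x:title", {"x": ns["a"]})]

RSS = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel><item>
    <title>An episode</title>
    <content:encoded>body</content:encoded>
  </item></channel>
</rss>"""

rss = ET.fromstring(RSS)

# RSS core is unqualified, so the bare query works directly...
rss_titles = [t.text for t in rss.findall(".//item/title")]

# ...but the extension element still needs a prefix bound to its URI.
content_ns = {"c": "http://purl.org/rss/1.0/modules/content/"}
encoded = rss.find(".//c:encoded", content_ns)

print(naive, titles, rss_titles, encoded.text)
```

Keeping the prefix map next to the queries, rather than relying on whatever prefixes a given document happens to declare, keeps the code working when a feed publisher changes or drops its prefixes.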
Things to note
- A small, stable core plus a wildcard extension point beats one ever-growing schema: third parties extend it in their own namespaces and on their own timelines.
- Unknown namespaces are safely ignored, which is what makes the web's feeds forward-compatible.
- Dublin Core is the archetypal borrowed vocabulary, and a common one to meet.
- "No namespace" (RSS core) and "everything namespaced" (Atom core) are both workable starting points.
Next: XSL-FO and Apache FOP — a vocabulary you almost never type by hand, because XSLT generates it.