home: http://starling.us/gus_netbsd
copyright 2007 by Ĝan Ŭesli Starling
This is a really simple how-to. It may look like a lot, but not to worry. You won’t be needing most of it, not unless you want to be sort of fancy. About half details extra frills which you may not much care about. And some of what remains describes what not to do in very exacting detail. Feel free to either skip or skim through those. Let the Table of Contents be your guide.
My history on this topic is as follows: I first got interested in the idea of news feeds late in 2003. It seemed a really cool idea, very modern and state-of-the-art for Internet trends. The hot protocol then was RSS. And my interest is evidenced by the third photo in Link where I am happily displaying one of my gifts of the season...the O’Reilly manual for RSS, wherein I learnt of the then-warring camps among newsfeed proponents...all those competing protocols. So I put it off for some years.
Now is the time, and I have elected to go with the Atom 1.0 protocol on account of its being supported by the IETF.
Here for your casual study is the format of my own Atom feed. It is pretty simple, as you can see.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="http://your_own_domain.net/your_xslt.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://your_own_domain.net/atom.xml</id>
  <title>Title of Your Atom Feed</title>
  <updated>2007-01-01T00:00:02Z</updated>
  <link rel="self" href="http://your_own_domain.net/atom.xml" type="application/atom+xml" />
  <author>
    <name>Your Own Name</name>
    <uri>http://your_own_domain.net</uri>
    <email>you@wherever.com</email>
  </author>
  <entry>
    <title>Some Title</title>
    <category term="Some Category"/>
    <id>http://your_own_domain.net/some/path/whatever.html</id>
    <published>2007-01-01T00:00:00Z</published>
    <updated>2007-01-01T00:00:00Z</updated>
    <link href="http://your_own_domain.net/some/path/whatever.html"/>
    <summary>Some few details here.</summary>
    <content>A somewhat more detailed, but still none-too-lengthy excerpt from the linked-to document here.</content>
  </entry>
</feed>
Note in particular the entry node, that is to say the <entry> and </entry> tags plus everything between. A real feed has more than just one set of those, usually quite a few in fact. Entries are the main component of any feed.
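For those who would rather generate the feed than type it, here is a minimal sketch in Python of the same structure: one feed node holding any number of entry nodes. It is only an illustration under my own assumptions. The function name build_feed and the dictionary keys are hypothetical, and the author and self-link nodes are left out for brevity.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom as the default namespace, so tags carry no prefix.
ET.register_namespace("", ATOM_NS)


def atom(tag):
    """Return the namespace-qualified tag name that ElementTree expects."""
    return "{%s}%s" % (ATOM_NS, tag)


def build_feed(feed_id, title, updated, entries):
    """Build a feed as a string.

    `entries` is a list of dicts with the (hypothetical) keys
    title, id, updated, link and summary.
    """
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    for item in entries:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("title")).text = item["title"]
        ET.SubElement(entry, atom("id")).text = item["id"]
        ET.SubElement(entry, atom("updated")).text = item["updated"]
        ET.SubElement(entry, atom("link"), href=item["link"])
        ET.SubElement(entry, atom("summary")).text = item["summary"]
    return ET.tostring(feed, encoding="unicode")
```

Each dictionary in the list becomes one entry node, so a feed with ten articles is just a list of ten dictionaries.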
Now take note of the colored parts. Some parts are required: without them your Atom feed will be invalid. Some of the others are optional. Those parts which you fill in yourself must be Unicode, unless you change the encoding attribute in the very top-most node. That is what the encoding="UTF-8" stands for: Unicode Transformation Format, 8-bit. Not to worry, though. Know that Unicode is a super-set of plain ASCII. So if you edit your Atom feed in a plain text editor you will generally be okay. Avoid word-processors like the plague. Your average word-processor will embed all kinds of invisible markers which are pure gobbledygook to any other program except itself. So on UNIX use Gedit or VI or some such. On Win32 use TextPad or Notepad or suchlike.
The <id> node must be both unique and permanent. That goes for the <id> of the feed itself and the <id>s of each individual <entry> node.
The simplest choice for an <id> is a URL, since that will be unique.
Never change an <entry>’s individually embedded <id>, ever. Not even if later you change the <link> which is pointed to.
My own URLs are about as permanent as I am myself. And beyond that I shall not worry.
There are alternatives to using a URL for all your <id> nodes. They are, however, more complicated, requiring detailed explanation. So those I’ll go into separately further down this same page.
The <updated> node must likewise be unique. Each must represent a unique (to this Atom document) date and time.
Otherwise an Atom validator will rightly refuse to believe that you are capable of updating two documents at exactly the same time, down to the very second.
Dates are written as CCYY-MM-DD, such that the birthday of the USA would be 1776-07-04, the attack on Pearl Harbor 1941-12-07, and Y2K at midnight 2000-01-01. You get the idea.
The T delimits between date and time. The Z stands for Zulu time in US military parlance...and Zed time to a Brit or an Aussie. Short-wave radio listeners and ham radio operators know it as GMT or UTC. In XML we must denote it with the Z.
The Atom protocol does allow for time offsets. But why add to the confusion?
On a UNIX system, just do date -u on the CLI to get UTC time. I expect that Win32 must have a similar function.
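The timestamp format described above (date, then T, then time, then Z) can also be produced by a script. Here is a small sketch in Python. The function name atom_timestamp is my own invention, not part of any library.

```python
from datetime import datetime, timezone


def atom_timestamp(moment=None):
    """Format a moment as CCYY-MM-DDThh:mm:ssZ, always in UTC.

    With no argument, the current time is used. A datetime carrying
    some other time zone is converted to UTC first. A naive datetime
    (no tzinfo) is assumed by astimezone() to be local time.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Calling atom_timestamp() once per changed entry, a second or more apart, is one way to keep each <updated> value distinct.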
The <?xml-stylesheet ... ?> node is optional. You only need it if you have an XSLT or CSS stylesheet for governing how your Atom feed should appear if viewed directly using a web browser such as Firefox or MSIE. Stylesheets are optional because Atom/RSS feeds are usually viewed with feed-readers, not web browsers. But I like to cover all bases. So I wrote my own XSLT stylesheet.
The <link ... > node should contain a link URL to the file you are linking to. Both summary and content nodes are obvious enough.
Here is a case of what can happen when you elect to do a job too completely. Below I give you a more-than-valid Atom feed. Compare it to the one above and you will find but a single difference. This one is better, too good, in fact. The feed below is too exacting. How so? It is over-compliant with the published XML standards for having made proper use of XML namespaces: every node carries an explicit atom: prefix.
Know that such use of namespaces is not only perfectly valid but otherwise generally encouraged. This being the case, all Atom protocol validating engines will proclaim the Atom feed shown below to be perfectly valid. At least one other Atom tutorial will exemplify this practice.
But, alas, there exists a big problem. The authors of both the Firefox 2.0 and MSIE 7 browsers seem not to have read the XML standards (of which Atom and RSS are minor subsets). No Atom tutorial that I have seen mentions this quirk. So it took me a couple of days to puzzle it out on my own.
In short, however perfectly correct the example which follows might otherwise be... Do not do it this way.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="http://your_own_domain.net/your_xslt.xsl"?>
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:id>http://your_own_domain.net/atom.xml</atom:id>
  <atom:title>Title of Your Atom Feed</atom:title>
  <atom:published>2007-01-01T00:00:02Z</atom:published>
  <atom:updated>2007-01-01T00:00:02Z</atom:updated>
  <atom:link rel="self" href="http://your_own_domain.net/atom.xml" type="application/atom+xml" />
  <atom:author>
    <atom:name>Your Own Name</atom:name>
    <atom:uri>http://your_own_domain.net</atom:uri>
    <atom:email>you@wherever.com</atom:email>
  </atom:author>
  <atom:entry>
    <atom:title>Some Title</atom:title>
    <atom:category term="Some Category"/>
    <atom:id>http://your_own_domain.net/some/path/whatever.html</atom:id>
    <atom:updated>2007-01-01T00:00:00Z</atom:updated>
    <atom:link href="http://your_own_domain.net/some/path/whatever.html"/>
    <atom:summary>Some few details here.</atom:summary>
    <atom:content>A somewhat more detailed, but still none-too-lengthy excerpt from the linked-to document here.</atom:content>
  </atom:entry>
</atom:feed>
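To see why validators accept the prefixed form, here is a short sketch in Python. It parses a tiny prefixed feed and a tiny unprefixed one with a namespace-aware parser and lists the fully expanded node names. The two sample strings are cut-down stand-ins of my own, not the full examples above.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-ins for the two styles of feed shown above.
PLAIN = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>T</title></feed>"
)
PREFIXED = (
    '<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">'
    "<atom:title>T</atom:title></atom:feed>"
)


def expanded_names(xml_text):
    """Return every element name in {namespace}local form."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]
```

Both strings yield the same list of names, since a namespace-aware parser resolves the prefix away. Software which instead looks for the literal text <feed near the top of the file will treat the two differently, which is consistent with the browser quirk described here.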
In the top-most Atom feed example here presented, I employed permanent URLs as <id> nodes. That could easily work for me because I own my domains with every intention of maintaining them in perpetuity. But what if I did not? How then to ensure that my <id> tags shall always remain unique? This can be done by following a different procedure.
Suppose, for instance, I wanted to compose a guaranteed-unique <id> node for a link pointing to the anchor tag <a id="sublink_3"> somewhere within an index.html file. I would begin with said linked-to URL and modify it slightly like so...
Drop the leading http://
  http://your_own_domain.net/whatever/index.html#sublink_3
  becomes your_own_domain.net/whatever/index.html#sublink_3
Change the # to a /
  your_own_domain.net/whatever/index.html#sublink_3
  becomes your_own_domain.net/whatever/index.html/sublink_3
Insert a comma, a date and a colon after the domain name
  your_own_domain.net/whatever/index.html/sublink_3
  becomes your_own_domain.net,2007-02-19:/whatever/index.html/sublink_3
Prefix the whole with tag:
  your_own_domain.net,2007-02-19:/whatever/index.html/sublink_3
  becomes tag:your_own_domain.net,2007-02-19:/whatever/index.html/sublink_3
And that would be that. The Atom feed would then have had <id> nodes which looked like these...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="http://your_own_domain.net/your_xslt.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:your_own_domain.net,2007-01-01:/atom.xml</id>
  <title>Title of Your Atom Feed</title>
  <updated>2007-01-01T00:00:02Z</updated>
  <link rel="self" href="http://your_own_domain.net/atom.xml" type="application/atom+xml" />
  <author>
    <name>Your Own Name</name>
    <uri>http://your_own_domain.net</uri>
    <email>you@wherever.com</email>
  </author>
  <entry>
    <title>Some Title</title>
    <category term="Some Category"/>
    <id>tag:your_own_domain.net,2007-01-01:/some/path/index.html/item_link</id>
    <published>2007-01-01T00:00:00Z</published>
    <updated>2007-01-01T00:00:00Z</updated>
    <link href="http://your_own_domain.net/some/path/index.html#item_link"/>
    <summary>Some few details here.</summary>
    <content>A somewhat more detailed, but still none-too-lengthy excerpt from the linked-to document here.</content>
  </entry>
</feed>
Unless the article you link to is nested within a much larger web page, there will not likely be an inside-the-page item link such as #foo or #bar. But just possibly there might be. So I elected to show how to deal with that.
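The four rewriting steps given earlier for turning a URL into a tag: style <id> can be sketched as one small Python function. The name tag_uri is hypothetical, and the date is simply passed in as text in CCYY-MM-DD form.

```python
from urllib.parse import urlsplit


def tag_uri(url, date):
    """Turn a URL plus a CCYY-MM-DD date into a tag: identifier.

    Mirrors the steps in the text: drop the scheme, change # to /,
    insert ",date:" after the domain name, then prefix with "tag:".
    """
    parts = urlsplit(url)
    specific = parts.path
    if parts.fragment:
        # The in-page anchor becomes one more path segment.
        specific += "/" + parts.fragment
    return "tag:%s,%s:%s" % (parts.netloc, date, specific)
```

For example, tag_uri("http://your_own_domain.net/whatever/index.html#sublink_3", "2007-02-19") gives tag:your_own_domain.net,2007-02-19:/whatever/index.html/sublink_3, the same result as the hand-worked steps. Remember to compute it once and store it, since an <id> must never change afterward.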
As with XHTML and all the XML family, it is a good idea to validate your Atom document. You can do that, sort-of, by using the File Open feature of any modern web browser. An even better, more informative way, is to employ an on-line validator.
Feed Validator:
Link
For Atom and older RSS feeds alike.
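Before troubling the on-line validator, a quick local sanity check can catch the grosser mistakes. The sketch below, in Python, only tests well-formedness plus the uniqueness rules discussed on this page. It is no substitute for a real validator, and the function name quick_check is my own.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def quick_check(xml_text):
    """Return a list of problems spotted; an empty list means none."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]
    if root.tag != ATOM + "feed":
        return ["root element is not an Atom <feed>"]
    problems = []
    for name in ("id", "title", "updated"):
        if root.find(ATOM + name) is None:
            problems.append("feed is missing <%s>" % name)
    seen = set()
    for entry in root.findall(ATOM + "entry"):
        entry_id = (entry.findtext(ATOM + "id") or "").strip()
        if not entry_id:
            problems.append("an entry is missing <id>")
        elif entry_id in seen:
            problems.append("duplicate entry <id>: %s" % entry_id)
        seen.add(entry_id)
    return problems
```

An empty list back means only that none of these few checks failed.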
Why bother with an admittedly optional XSLT stylesheet? To save yourself time and effort is one reason. To aid your readers is another. By way of simple demonstration know that you are viewing an XML/XSLT document now. This little howto was very easy to write. It will be even easier to maintain. It is easy because the Table of Contents and all of the button links wrote themselves. Likewise all of the horizontal rules, colorizing, etc. I have a howto for that process also: Link.
Know also that the Atom protocol is based on XML so both XSLT and CSS stylesheets will work with an Atom feed. No way at all is a stylesheet required, though. Most people read Atom documents in a feed-reader, either a stand-alone utility, or one built into their favorite browser. So stylesheets are just a frill, some eye-candy for the benefit of whoever might (for whatever reason) view your feed with an ordinary web browser...at least the way I write them. You could also target PDF readers, or whatever other format, using an XSLT stylesheet.
In my own XSLT and CSS stylesheets I only ever target web browsers. It is really quite simple to use both together, embedding a little bit of CSS within any given XSLT stylesheet. I have two of those: one for plain, browser-destined XML; and yet another for Atom feeds. See mine below. Feel free to steal from either of them if it please you. If you use them mostly whole, please retain my copyright.
Below are links to my XSLT stylesheets and Atom feeds employing such to facilitate browser-viewing.
You will note that I named this Atom feed using the *.xml extension rather than the *.atom extension. I did this because some experimentation revealed that Firefox would not display an *.atom file no matter what MIME type Apache served it as. I also read elsewhere that this is a known Firefox quirk.
Enhancements à la XSLT:
Link
XSL: For index.xml docs generally.
Link
XSL: For atom.xml docs specifically.
Link
Atomic Starling: An XSLT-enhanced Atom feed
As handy as Atom and RSS feeds can be when viewed by a custom reader, when viewed in an ordinary web browser their static quality soon pales. Rather than just let them lie there, all static and rather dead looking, let us make them re-sortable.
I knew that this would be very easy to do using Perl/CGI. Nevertheless I still beat my head against the wall for most of a week trying to make a go of it without resorting to CGI. First I tried to do it in pure XSLT. Utter futility. Browsers do not yet support xsl:import in their XSLT engines. Yet another annoying quirk. Alas, and alack.
So next I tried to marry XSLT and JavaScript. Googling around I dug up one post where someone else managed this, after more than a month of their own trial-and-error. But it required browser-sniffing and accommodating competing DOMs...to which I am fiercely allergic.
After some few days of my own discouraging trial-and-error, I gave up in renewed utter disgust for JavaScript and fell back to trusty Perl/CGI. Here is the result. As always, I release it under the Perl artistic license. Do whatever you like with it.
Enhancements à la Perl/CGI:
Link
Perl/CGI: My /cgi-bin/gus_atom_xsl.pl script.
Link
Atomic Starling: Same Atom feed, but provided with a re-sort button.
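For the curious, the core of server-side re-sorting is small. What follows is not the Perl script linked above, only a rough Python sketch of the general idea under my own assumptions: read the feed, reorder its entry nodes by one child node, and write it back out. The name resort_entries and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Keep Atom as the default namespace when writing the feed back out.
ET.register_namespace("", ATOM_NS)
ATOM = "{%s}" % ATOM_NS


def resort_entries(xml_text, field="updated", descending=True):
    """Return the feed with its entries reordered by one child node.

    Sorting is plain text comparison. That orders timestamps correctly
    only while all of them use the same UTC form ending in Z.
    """
    root = ET.fromstring(xml_text)
    entries = root.findall(ATOM + "entry")
    for entry in entries:
        root.remove(entry)
    entries.sort(key=lambda e: e.findtext(ATOM + field) or "",
                 reverse=descending)
    root.extend(entries)  # entries go back after the feed-level nodes
    return ET.tostring(root, encoding="unicode")
```

A CGI wrapper would take the sort field from the query string and print the result with the proper content type.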
The browsers too have gotten into displaying news feeds. But the way they have done it is both artless and heavy handed. Prior to MSIE 7 and Firefox 2.0 most things XSLT-ish worked as they ought. But in those latter versions that pair have turned artless, with respect to both Atom and RSS, in failing to check whether said feed may already have a stylesheet, and heavy handed in over-riding it with internal defaults.
Complaints have been issued. And I expect they will be addressed. But we shall have to await future updates before that should happen. In the meantime the best recourse involves a cheap trick to bypass this annoying misfeature. Both MSIE 7 and Firefox 2.0 give up trying to sniff out a feed protocol after reading the first 512 bytes of an XML file.
Being now informed of this trip-limit, feed authors employing XSLT are editing their XML to begin with a 512-byte comment. This will often consist of an anti-Firefox/MSIE rant on the problem. I employ an excerpt in Esperanto translation from the first chapter of a Polish novel about ancient Egypt.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="/gus_atom.xsl"?>
<!--
La Faraono verkis Boleslaw Prus tadukis Kabe
Ĉapitro Unu
En la tridektria jaro de la feliĉa regado de Ramses XII, Egipto festis du solenojn, kiuj plenigis ĝiajn ortodoksajn loĝantojn per fiero kaj feliĉo. En la monato Meĥir, en decembro, revenis Tebojn, plen kovrita per multekostaj donacoj, la dio Ĥongu, kiu dum tri jaroj kaj naŭ monatoj vojaĝis en la lando Buĥten, resanigis tie la reĝan filinon Bent-res kaj forpelis la malbonan spiriton ne nur el la reĝa familio, sed eĉ el la citadelo de Buĥten.
Nu, jen! Sufiĉas por 512 bitoj. Lasu ke oni vidu mian XSLT-on.
-->
<feed xmlns="http://www.w3.org/2005/Atom">
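If counting bytes by hand does not appeal, the padding can be generated. Here is a hedged sketch in Python which repeats a filler phrase inside a comment until prolog plus comment reach 512 bytes as UTF-8. The 512 figure is the one quoted above. The function name padded_prolog and the default filler are my own inventions.

```python
SNIFF_LIMIT = 512  # byte count quoted in the text above


def padded_prolog(prolog, filler="Padding so that my XSLT gets used. "):
    """Return prolog plus a comment, at least SNIFF_LIMIT bytes in all.

    Bytes are counted after UTF-8 encoding, since accented letters
    take more than one byte each. XML forbids "--" inside a comment,
    so such fillers are refused.
    """
    if not filler or "--" in filler + filler:
        raise ValueError("filler must be non-empty and must not yield '--'")
    body = filler
    while len((prolog + "<!-- " + body + " -->\n").encode("utf-8")) < SNIFF_LIMIT:
        body += filler
    return prolog + "<!-- " + body + " -->\n"
```

Write the returned text first and the <feed ...> node directly after it, so that the feed node begins at or beyond byte 512.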
As soon as your feed will validate, you’ll want for folks to start reading it. The way to make that easy for them is called auto-discovery. What this means is that should someone visit your regular web page, their browser will automatically discover that an Atom feed exists in relation to that ordinary web page. A little orange feed-icon will appear in said browser’s URL window. Your visitor clicks on said icon and is granted opportunity to subscribe to your feed. It is all very easy.
For auto-discovery to work, you must embed a new tag in the <head> of the HTML code for that ordinary web page. Such a tag will look like this...
<head>
  <link
    rel="alternate"
    type="application/atom+xml"
    href="http://your_own_domain.net/atom.xml"
    title="Name of Your Feed" />
</head>
I broke that tag into separate lines by attribute for clarity. Make yours into a single line if you prefer. The href and title values shown are placeholders; substitute your own.
In fact, with the type="application/atom+xml" attribute in your auto-discovery tag, it won’t even matter whether you have an AddType for *.atom at all in the httpd.conf of your Apache 2 webserver. I found that out while running experiments to puzzle out that infuriating XML namespace browser quirk.
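To confirm that a page really does advertise its feed, one can look for the tag much as a browser would. This is a minimal sketch in Python using only the standard library. The class name FeedLinkFinder is hypothetical, and real browsers surely apply more rules than this.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect (href, title) pairs from Atom auto-discovery tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags arrive here too.
        if tag != "link":
            return
        found = {name: (value or "").strip() for name, value in attrs}
        is_alternate = "alternate" in found.get("rel", "").lower().split()
        is_atom = found.get("type", "").lower() == "application/atom+xml"
        if is_alternate and is_atom:
            self.feeds.append((found.get("href", ""), found.get("title", "")))


def find_feeds(html_text):
    """Return every Atom feed advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds
```

Feeding it the HTML of your own home page should give back the one href you put there.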
Note that for a stylesheet to have effect, your web server must be prepared to serve both the doc and the stylesheet properly. For that to happen, on an Apache web-server at any rate, you need to have entries like these in your httpd.conf configuration file. I just tack them on at the very end.
# Custom changes by me
DirectoryIndex index.html index.xml
AddType text/xml .xml
AddType text/xml .xsl
AddType application/atom+xml .atom
Those are from my own Apache 2.0 web-server. A commercial outfit should already have those set up. But you may, just possibly, have need to remind them. They serve several purposes...
The DirectoryIndex line says to serve index.xml if index.html is absent in a directory. That allows you to use a URL like just plain http://foobar instead of http://foobar/index.xml in links. It is neater.
The AddType lines allow the server to inform browsers what to do with certain kinds of files. Without them browsers may not always recognize what they are getting. Viewers might see a query from their browser instead of the page. Such a query would ask "Open with... or Download?" when trying to view the feed, XML doc, or whatever.
Here are some further informational web-links to help get you started publishing your own Atom feeds and/or reading the feeds of others.
General info on Atom 1.0 feeds:
Link
AtomEnabled.org
Link
Google spurns RSS
More on authoring Atom 1.0 feeds:
Link
Another authoring howto.
Link
Yet another authoring howto.
Link
Howto for JavaScript & XSLT
Readers for Atom 1.0 feeds
Link
SimplePie — An on-line feed reader.
Link
Mozilla Thunderbird — Subscribing to feeds.
Link
Google on feed readers.
Link
Yahoo on feed readers.
This page is composed in XML format using a plain ASCII text editor. I last revised it on 2007-02-25 at 13:37:35 hours UTC, testing in the Firefox 1.5.0.1 browser. Please email to report any problems (other than MSIE’s CSS shortcomings).