Table of Contents
useHTTPClient: Forcing DOMIT! RSS to use an HTTP Client
setRSSTimeout: Setting a timeout for obtaining feed data
setConnection: Manually specifying HTTP connection parameters
setAuthorization: Using basic HTTP authorization with your connection
setProxyConnection: Retrieving XML data through a proxy server
setProxyAuthorization: Using basic HTTP authorization with your proxy
RSS -- variously known as Really Simple Syndication, RDF Site Summary, or Rich Site Summary -- is an XML-based web syndication format originally developed by Netscape. It allows you to:
create lists of online content
describe the content
link to the content
RSS files, also known as feeds, are placed at a static URL where users can subscribe using an application called an RSS Reader or RSS Aggregator. These applications periodically query the URL for updated content and present it to the user in a readable format.
RSS is widely used by news organizations, which use it to publish daily lists of articles. Blogger articles and web site updates are also commonly summarized in RSS format.
There have been a number of versions of RSS over its lifetime. All versions, however, share a common set of core features.
The following is a sample feed from the BBC news site, posted at the URL http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/technology/rss091.xml
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0">
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<docs>http://www.bbc.co.uk/syndication/</docs>
<ttl>15</ttl>
<image>
<title>BBC News</title>
<url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
<link>http://news.bbc.co.uk</link>
</image>
<item>
<title>GTA sex scandal hits Australia</title>
<description>Grand Theft Auto: San Andreas has effectively been banned in Australia because of secret sex scenes.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4728261.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4728261.stm</guid>
<pubDate>Fri, 29 Jul 05 13:20:35 GMT</pubDate>
</item>
<item>
<title>FBI holds eight on piracy charge</title>
<description>The US authorities have charged eight people with the illegal trading of copyrighted material over the net.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4727919.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4727919.stm</guid>
<pubDate>Fri, 29 Jul 05 12:14:24 GMT</pubDate>
</item>
<item>
<title>Spacewalk to test shuttle repair</title>
<description>Astronauts on space shuttle Discovery are getting ready to carry out the mission's first spacewalk.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/sci/tech/4730129.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/sci/tech/4730129.stm</guid>
<pubDate>Sat, 30 Jul 05 03:26:17 GMT</pubDate>
</item>
<item>
<title>Cisco curbs security researcher</title>
<description>A security researcher has agreed never to talk about flaws in Cisco software that controls internet routers.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4727021.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4727021.stm</guid>
<pubDate>Fri, 29 Jul 05 09:14:00 GMT</pubDate>
</item>
<item>
<title>Net addresses come to Earth</title>
<description>Net addresses are starting to reveal how they are linked to the real world.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4665351.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4665351.stm</guid>
<pubDate>Fri, 29 Jul 05 08:07:17 GMT</pubDate>
</item>
<item>
<title>Tiny customers 'won't get money'</title>
<description>Customers who paid for undelivered orders of Tiny and Time PCs are "unlikely" to get their money back.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4727143.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4727143.stm</guid>
<pubDate>Fri, 29 Jul 05 12:04:34 GMT</pubDate>
</item>
<item>
<title>Teens spurn e-mail for messaging</title>
<description>Instant messaging, rather than e-mail, is the preferred way for US teenagers to stay in touch, research shows.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4719083.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4719083.stm</guid>
<pubDate>Thu, 28 Jul 05 10:29:13 GMT</pubDate>
</item>
<item>
<title>Fake Tube safety e-mail spreads</title>
<description>Mobile users are warned about an e-mail which claims to have safety information about calling from the Tube.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4724101.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4724101.stm</guid>
<pubDate>Thu, 28 Jul 05 11:26:48 GMT</pubDate>
</item>
<item>
<title>Digital rights group gets going</title>
<description>Net veterans plan to create a UK group that campaigns to protect digital rights and freedoms.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4724089.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4724089.stm</guid>
<pubDate>Thu, 28 Jul 05 12:05:20 GMT</pubDate>
</item>
<item>
<title>Hollywood hails digital film deal</title>
<description>Movie studios reach a "milestone" deal to allow digital projectors to replace reels of film.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/film/4724335.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/entertainment/film/4724335.stm</guid>
<pubDate>Thu, 28 Jul 05 12:40:29 GMT</pubDate>
</item>
<item>
<title>Price falls push Sony into loss</title>
<description>Sony falls into the red for the three months to June, hit by fall in prices for televisions and DVD recorders.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4723567.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4723567.stm</guid>
<pubDate>Thu, 28 Jul 05 12:52:36 GMT</pubDate>
</item>
<item>
<title>Profits tumble at gamer Nintendo</title>
<description>Nintendo's first quarter profits drop as its new DS console fails to plug the gap left by waning GameCube sales.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4724083.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4724083.stm</guid>
<pubDate>Thu, 28 Jul 05 11:09:54 GMT</pubDate>
</item>
<item>
<title>Awards to applaud women in tech</title>
<description>Top women in technology are to be recognised in the first Blackberry Women and Technology awards.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4718703.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4718703.stm</guid>
<pubDate>Wed, 27 Jul 05 08:11:31 GMT</pubDate>
</item>
<item>
<title>HP decides to stop selling iPods</title>
<description>Hewlett-Packard announces that it is to stop selling HP-branded iPods in line with a change in strategy.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4729907.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4729907.stm</guid>
<pubDate>Fri, 29 Jul 05 21:07:03 GMT</pubDate>
</item>
<item>
<title>New services boost profits at BT</title>
<description>Telecoms giant BT Group dials up a 21% rise in quarterly pre-tax profits thanks to a "new wave" of revenues.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4723343.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4723343.stm</guid>
<pubDate>Thu, 28 Jul 05 08:13:05 GMT</pubDate>
</item>
<item>
<title>Napster launches radio song sales</title>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/entertainment/music/4724287.stm</guid>
<pubDate>Thu, 28 Jul 05 11:23:13 GMT</pubDate>
</item>
<item>
<title>Downloading 'myths' challenged</title>
<description>People who illegally download music spend much more on legal downloads than average fans, a study shows.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4718249.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4718249.stm</guid>
<pubDate>Wed, 27 Jul 05 08:10:56 GMT</pubDate>
</item>
<item>
<title>Game over for Tapwave's Zodiac</title>
<description>Catch up with the latest news from the world of video gaming.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/2207229.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/2207229.stm</guid>
<pubDate>Fri, 29 Jul 05 16:50:25 GMT</pubDate>
</item>
<item>
<title>Animated capers aim to please</title>
<description>Reviews of two of the latest games aimed at children.</description>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4696167.stm</link>
<guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4696167.stm</guid>
<pubDate>Tue, 19 Jul 05 11:06:54 GMT</pubDate>
</item>
</channel>
</rss>
In the next sections we will discuss some of the main elements of an RSS feed.
Since an RSS document is also an XML document, each RSS document should begin with an XML declaration:
<?xml version="1.0" encoding="ISO-8859-1" ?>
In common practice, this statement is often omitted.
The root element of an RSS document is named rss. The rss element contains a single, mandatory attribute named version, which specifies the version of RSS that the document conforms to.
<rss version='0.94'> ...rss content continues here </rss>
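As a minimal sketch of how a program might read the version attribute, the following uses Python's standard xml.etree.ElementTree module rather than DOMIT! RSS itself. The helper name rss_version is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

def rss_version(xml_text):
    """Return the version attribute of the root rss element, or None."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        return None  # the root element is not named rss
    return root.get("version")

doc = "<rss version='0.94'><channel/></rss>"
print(rss_version(doc))
```

A reader could use the returned string to decide which optional elements to expect in the rest of the document.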
An RSS document is required to contain a single channel element, which is a container for the publication data:
<rss version='2.0'>
<channel>
...rss content continues here
</channel>
</rss>
Note: Occasionally, you may see (non-standard) use of multiple channels.
Each channel is required to include three elements: title, link, and description. They may appear in any order.
The title element contains a short title for the channel.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
...rss content continues here
</channel>
</rss>
The link element contains the URL of the website hosting the feed.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
...rss content continues here
</channel>
</rss>
The description element contains a description of the channel.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
...rss content continues here
</channel>
</rss>
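The three required elements can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name missing_channel_elements and the REQUIRED tuple are illustrative names, not part of any RSS library.

```python
import xml.etree.ElementTree as ET

# The three subelements every channel is required to include.
REQUIRED = ("title", "link", "description")

def missing_channel_elements(xml_text):
    """List the required channel subelements that are absent."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return list(REQUIRED)
    # Order of appearance does not matter, only presence.
    return [name for name in REQUIRED if channel.find(name) is None]

feed = """<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
</channel>
</rss>"""
print(missing_channel_elements(feed))
```

Here the sample channel deliberately omits description, so that is the only name reported.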
There are also a number of optional elements available for a channel. They may appear in any order.
The language element describes the language of the feed.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
...rss content continues here
</channel>
</rss>
Permissible values for the language element are those defined by the W3C, or those in the following list:
Afrikaans: af
Albanian: sq
Basque: eu
Belarusian: be
Bulgarian: bg
Catalan: ca
Chinese (Simplified): zh-cn
Chinese (Traditional): zh-tw
Croatian: hr
Czech: cs
Danish: da
Dutch: nl
Dutch (Belgium): nl-be
Dutch (Netherlands): nl-nl
English: en
English (Australia): en-au
English (Belize): en-bz
English (Canada): en-ca
English (Ireland): en-ie
English (Jamaica): en-jm
English (New Zealand): en-nz
English (Philippines): en-ph
English (South Africa): en-za
English (Trinidad): en-tt
English (United Kingdom): en-gb
English (United States): en-us
English (Zimbabwe): en-zw
Estonian: et
Faeroese: fo
Finnish: fi
French: fr
French (Belgium): fr-be
French (Canada): fr-ca
French (France): fr-fr
French (Luxembourg): fr-lu
French (Monaco): fr-mc
French (Switzerland): fr-ch
Galician: gl
Gaelic: gd
German: de
German (Austria): de-at
German (Germany): de-de
German (Liechtenstein): de-li
German (Luxembourg): de-lu
German (Switzerland): de-ch
Greek: el
Hawaiian: haw
Hungarian: hu
Icelandic: is
Indonesian: in
Irish: ga
Italian: it
Italian (Italy): it-it
Italian (Switzerland): it-ch
Japanese: ja
Korean: ko
Macedonian: mk
Norwegian: no
Polish: pl
Portuguese: pt
Portuguese (Brazil): pt-br
Portuguese (Portugal): pt-pt
Romanian: ro
Romanian (Moldova): ro-mo
Romanian (Romania): ro-ro
Russian: ru
Russian (Moldova): ru-mo
Russian (Russia): ru-ru
Serbian: sr
Slovak: sk
Slovenian: sl
Spanish: es
Spanish (Argentina): es-ar
Spanish (Bolivia): es-bo
Spanish (Chile): es-cl
Spanish (Colombia): es-co
Spanish (Costa Rica): es-cr
Spanish (Dominican Republic): es-do
Spanish (Ecuador): es-ec
Spanish (El Salvador): es-sv
Spanish (Guatemala): es-gt
Spanish (Honduras): es-hn
Spanish (Mexico): es-mx
Spanish (Nicaragua): es-ni
Spanish (Panama): es-pa
Spanish (Paraguay): es-py
Spanish (Peru): es-pe
Spanish (Puerto Rico): es-pr
Spanish (Spain): es-es
Spanish (Uruguay): es-uy
Spanish (Venezuela): es-ve
Swedish: sv
Swedish (Finland): sv-fi
Swedish (Sweden): sv-se
Turkish: tr
Ukrainian: uk
The copyright element contains a copyright statement for the RSS content.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
...rss content continues here
</channel>
</rss>
The managingEditor element contains the email address of the person responsible for editorial content.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
...rss content continues here
</channel>
</rss>
The webMaster element contains the email address of the person responsible for maintaining the channel technically.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
...rss content continues here
</channel>
</rss>
The pubDate element contains the publication date/time for the channel. This must conform to the RFC 822 Date and Time Specification (although the year may be expressed in either two or four characters).
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
...rss content continues here
</channel>
</rss>
The lastBuildDate element contains the last time that the content in the channel changed. This must also conform to the RFC 822 Date and Time Specification.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
...rss content continues here
</channel>
</rss>
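Dates in this format can be parsed with an ordinary email-date parser. The following is a sketch that assumes Python's standard email.utils module; the wrapper name parse_rss_date is hypothetical. Python's parser maps a two-digit year such as 05 to 2005.

```python
from email.utils import parsedate_to_datetime

def parse_rss_date(text):
    """Parse an RFC 822 style date string into a timezone-aware datetime."""
    return parsedate_to_datetime(text.strip())

published = parse_rss_date("Sat, 30 Jul 05 09:00:00 GMT")
built = parse_rss_date("Sat, 30 Jul 05 09:28:38 GMT")

print(built.isoformat())
# Difference between the last build time and the publication time.
print(built - published)
```

Converting both values to datetime objects makes it straightforward to compare pubDate with lastBuildDate.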
The category element contains a forward-slash separated string describing:
a category that the feed belongs to
the position of that category within a taxonomy
An optional attribute, domain, contains a URL pointing to a description of the taxonomy.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
...rss content continues here
</channel>
</rss>
In the above example, the feed belongs to a category named technology, which is a subcategory of science, itself a subcategory of news. A description of the taxonomy can be found at http://www.superopendirectory.com/.
You can include as many category elements as you like.
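To illustrate how the forward-slash separated string maps onto taxonomy levels, here is a small sketch using Python's standard xml.etree.ElementTree module. The helper name category_path is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def category_path(element):
    """Split a category element's text into its taxonomy levels."""
    return [part for part in (element.text or "").split("/") if part]

cat = ET.fromstring(
    '<category domain="http://www.superopendirectory.com/">'
    "news/science/technology</category>"
)
levels = category_path(cat)
# The last level is the category itself; earlier levels are its ancestors.
print(levels, cat.get("domain"))
```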
The generator element contains the name of the program used to generate the RSS feed:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
...rss content continues here
</channel>
</rss>
The docs element contains the URL for the documentation of the RSS format of the feed:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
...rss content continues here
</channel>
</rss>
The cloud element allows users to register for a web service which will send a notification when a feed has been updated.
The web service can be implemented in HTTP-POST, XML-RPC or SOAP 1.1.
The cloud element contains five attributes containing parameters required for querying of the web service:
domain - the domain where the web service resides
port - the TCP port the web service is listening on
path - the path, relative to the domain, where the web service resides
registerProcedure - the name of the web service method to be called
protocol - the protocol of the web service: xml-rpc, soap, or http-post
The web service returns true if the subscription is successful. By convention, registrations expire after 25 hours. Users should reregister every 24 hours for each subscription.
The following is an example of the cloud syntax:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
...rss content continues here
</channel>
</rss>
Above, an xml-rpc query, calling the method xmlStorageSystem.rssPleaseNotify, would be sent to xml-rpc.bbc.co.uk/RPC2 on port 80. The user's application would be responsible for understanding the specific detail of how the web service method is to be called.
A good description of how the cloud interface is implemented can be found at Dave Winer's Radio Userland site at: http://backend.userland.com/publishSubscribeWalkthrough
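As a rough sketch of how a subscriber might assemble the address of the web service from the cloud attributes, consider the following. It uses Python's standard xml.etree.ElementTree module, the function name cloud_endpoint is hypothetical, and the http scheme is an assumption, since the cloud element does not state one.

```python
import xml.etree.ElementTree as ET

def cloud_endpoint(cloud):
    """Combine domain, port and path into a URL (http scheme assumed)."""
    return "http://{domain}:{port}{path}".format(
        domain=cloud.get("domain"),
        port=cloud.get("port"),
        path=cloud.get("path"),
    )

cloud = ET.fromstring(
    '<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" '
    'registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />'
)
endpoint = cloud_endpoint(cloud)
# The method to call and the protocol to use come from the other attributes.
print(endpoint, cloud.get("registerProcedure"), cloud.get("protocol"))
```

The actual registration call is left out, because its parameters depend on the protocol the publisher has chosen.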
The ttl (time to live) element describes the maximum number of minutes that a feed reader should cache the feed contents before refreshing:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>15</ttl>
...rss content continues here
</channel>
</rss>
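A feed reader could apply the ttl value along these lines. This is only a sketch in Python; the function name needs_refresh and the idea of passing the fetch time explicitly are assumptions made for the illustration.

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(fetched_at, ttl_minutes, now):
    """True when the cached copy is at least ttl_minutes old."""
    return now - fetched_at >= timedelta(minutes=ttl_minutes)

fetched = datetime(2005, 7, 30, 9, 0, tzinfo=timezone.utc)
# With <ttl>15</ttl>, a copy fetched 10 minutes ago is still usable...
print(needs_refresh(fetched, 15, fetched + timedelta(minutes=10)))
# ...while one fetched 20 minutes ago should be refreshed.
print(needs_refresh(fetched, 15, fetched + timedelta(minutes=20)))
```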
The image element specifies the URL of a GIF, JPEG or PNG image that can be displayed with the channel.
It contains three required elements:
<url> - the url of the image
<title> - the title of the image (used as the value of the alt attribute when the image is displayed in HTML format)
<link> - the URL of the channel (when displayed in HTML format, clicking on the image will navigate to this location)
The image element also contains three optional elements:
<width> - the width of the image
<height> - the height of the image
<description> - a description of the image/channel (when displayed in HTML format, it is used as the title attribute of the image link)
For example:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>15</ttl>
<image>
<url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
<title>BBC Technology News</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<width>200</width>
<height>200</height>
<description>Today's BBC technology news</description>
</image>
...rss content continues here
</channel>
</rss>
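The mapping from the image subelements to HTML described above can be sketched as follows. The example assumes Python's standard xml.etree.ElementTree and html modules; the function name image_to_html and the exact markup it produces are illustrative choices, not a prescribed rendering.

```python
from html import escape
import xml.etree.ElementTree as ET

def image_to_html(image):
    """Render an RSS image element as an HTML link wrapping an img tag."""
    def text(name):
        # Missing optional elements become empty strings.
        return escape(image.findtext(name, "").strip())
    return '<a href="{0}" title="{1}"><img src="{2}" alt="{3}"></a>'.format(
        text("link"), text("description"), text("url"), text("title")
    )

image = ET.fromstring("""<image>
<url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
<title>BBC Technology News</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Today's BBC technology news</description>
</image>""")
markup = image_to_html(image)
print(markup)
```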
The rating element describes the PICS rating for the channel. PICS is a W3C specification for associating metadata with internet content. It is rarely used in RSS feeds, but here is an example:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>15</ttl>
<image>
<url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
<title>BBC Technology News</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<width>200</width>
<height>200</height>
<description>Today's BBC technology news</description>
</image>
<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
...rss content continues here
</channel>
</rss>
The textInput element describes a text input box that can be associated with the channel. Its purpose is somewhat obscure, but you can use it to define items like a search or user feedback field.
The textInput element contains four required elements:
<title> - The label of the submit button for the text input field
<description> - A description of the text input field
<name> - The name of the text object in the text input area
<link> - The URL of the script that is called by clicking on the submit button
Below is an example of defining a search field for the RSS channel:
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>15</ttl>
<image>
<url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
<title>BBC Technology News</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<width>200</width>
<height>200</height>
<description>Today's BBC technology news</description>
</image>
<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
<textInput>
<title>Search</title>
<description>Search the BBC technology site</description>
<name>searchform</name>
<link>http://www.google.com/search</link>
</textInput>
...rss content continues here
</channel>
</rss>
Note: Most feed readers ignore the textInput data.
The skipHours element contains up to 24 <hour> subelements, each marking an hour (in GMT) during which queries of the feed can be skipped. Allowable values range from 0 to 23.
The following example instructs readers not to query the feed between 1:00 and 3:00 AM.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>15</ttl>
<image>
<url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
<title>BBC Technology News</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<width>200</width>
<height>200</height>
<description>Today's BBC technology news</description>
</image>
<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
<textInput>
<title>Search</title>
<description>Search the BBC technology site</description>
<name>searchform</name>
<link>http://www.google.com/search</link>
</textInput>
<skipHours>
<hour>1</hour>
<hour>2</hour>
</skipHours>
...rss content continues here
</channel>
</rss>
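A reader might honor skipHours roughly as in the sketch below, which assumes Python's standard xml.etree.ElementTree module. The function name skipped_hours is hypothetical.

```python
import xml.etree.ElementTree as ET

def skipped_hours(channel):
    """Collect the hours (0-23) listed under skipHours as integers."""
    return {int(hour.text) for hour in channel.findall("skipHours/hour")}

channel = ET.fromstring(
    "<channel><skipHours><hour>1</hour><hour>2</hour></skipHours></channel>"
)
hours = skipped_hours(channel)
# Hour 1 is listed, so a query at that hour can be skipped; hour 3 is not.
print(1 in hours, 3 in hours)
```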
The skipDays element contains up to 7 <day> subelements, each marking a day of the week on which queries of the feed can be skipped. Allowable values are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday or Sunday.
The following example instructs readers not to query the feed on Sunday.
<rss version='2.0'>
<channel>
<title>BBC News | Technology | UK Edition</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<description>Updated every minute of every day</description>
<language>en-gb</language>
<copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
<managingEditor>john.doe@bbc.co.uk</managingEditor>
<webMaster>jane.doe@bbc.co.uk</webMaster>
<pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
<category domain="http://www.superopendirectory.com/">news/science/technology</category>
<generator>RSSKY Feed Generator</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>15</ttl>
<image>
<url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
<title>BBC Technology News</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
<width>200</width>
<height>200</height>
<description>Today's BBC technology news</description>
</image>
<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
<textinput>
<title>Search</title>
<description>Search the BBC technology site</description>
<name>searchform</name>
<link>http://www.google.com/search</link>
</textinput>
<skipHours>
<hour>1</hour>
<hour>2</hour>
</skipHours>
<skipDays>
<day>Sunday</day>
</skipDays>
...rss content continues here
</channel>
</rss>
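As a rough sketch of how an aggregator might honor these hints, the function below decides whether a scheduled query should be skipped. The name shouldSkipQuery and its parameters are hypothetical and not part of DOMIT! RSS; the sketch assumes the day and hour values have already been read from the feed, and that hours run from 0 to 23 in GMT as in RSS 2.0.

```php
<?php
// Hypothetical helper (not part of DOMIT! RSS): returns true when a feed
// query at the given Unix timestamp falls on a skipped day or hour.
// $skipDays holds day names such as 'Sunday'; $skipHours holds hours 0-23.
function shouldSkipQuery($timestamp, $skipDays, $skipHours) {
    $day  = gmdate('l', $timestamp);        // full day name in GMT
    $hour = (int) gmdate('G', $timestamp);  // hour in GMT, no leading zero

    if (in_array($day, $skipDays)) {
        return true;
    }
    // Compare as integers so that '1' read from XML matches hour 1.
    return in_array($hour, array_map('intval', $skipHours));
}

// Sunday, 31 Jul 2005 10:00:00 GMT, checked against the example feed above
$when = gmmktime(10, 0, 0, 7, 31, 2005);
var_dump(shouldSkipQuery($when, array('Sunday'), array('1', '2')));
```

Passing the timestamp in, rather than calling time() inside the function, keeps the check easy to test against fixed dates.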
A channel can contain any number of item elements. An item describes a single item of syndicated content, such as a news article or blog entry.
<rss version ='2.0'> <channel> <title>BBC News | Technology | UK Edition</title> <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link> <description>Updated every minute of every day</description> <language>en-gb</language> <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate> <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright> <docs>http://www.bbc.co.uk/syndication/</docs> <ttl>15</ttl> <image> <title>BBC News</title> <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url> <link>http://news.bbc.co.uk</link> </image> <item> ...item content here </item> <item> ...item content here </item> <item> ...item content here </item> </channel> </rss>
All subelements of item are optional; however, at least one title or description element must be present. An item's content can also be self-contained -- that is, the entire article content is included in the description element (as entity-encoded HTML, if necessary).
The following sections describe the various elements permitted in an item element.
The title element contains a short title for the item:
<item>
<title>Napster launches radio song sales</title>
</item>
Note: since RSS is an XML-based format, you must ensure that characters with special meaning in XML, such as ampersands (&), are either properly escaped or the element text is enclosed in a CDATA section.
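The two options in the note above can be sketched in plain PHP as follows. The helper names escapeXmlText and wrapInCdata are hypothetical and are not part of DOMIT! RSS; they only illustrate what escaped and CDATA-wrapped element text look like.

```php
<?php
// Hypothetical helpers for writing element text safely into an RSS feed.

// Escape the characters that have special meaning in XML text.
function escapeXmlText($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Wrap text in a CDATA section. A literal "]]>" inside the text is split
// across two sections, since it would otherwise end the section early.
function wrapInCdata($text) {
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<title>' . escapeXmlText('Tom & Jerry') . "</title>\n";
echo '<title>' . wrapInCdata('Tom & Jerry') . "</title>\n";
```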
The link element contains the URL where the article can be found:
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
</item>
The description element contains a description of the contents of the article:
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
</item>
The author element contains the email address of the author of the article:
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<author>shaunfanning@napster.com</author>
</item>
The category element is identical in format to the category element for a channel. Please refer to section 2.5.7.
The comments element contains a URL for comments pertaining to the article:
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<author>shaunfanning@napster.com</author>
<comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
</item>
The enclosure element describes a media object -- such as an audio or video file -- associated with the article.
It has three required attributes:
url - the URL of the media object
length - the length in bytes of the media object
type - the MIME type of the media object
To describe an mp3 file associated with the article, for instance:
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<author>shaunfanning@napster.com</author>
<comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
<enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
</item>
The guid element assigns a globally unique identifier to the article -- an ID that differentiates it from all other articles. It generally takes the form of an HTTP URL.
It has one optional attribute, isPermaLink. If isPermaLink is set to true, the guid is an actual HTTP URL that the user can visit to view the article. For instance:
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<author>shaunfanning@napster.com</author>
<comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
<enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</guid>
</item>
The pubDate element for an item is identical in format to the pubDate element for a channel. Please see the description of the channel pubDate element for more information.
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<author>shaunfanning@napster.com</author>
<comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
<enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</guid>
<pubDate>Wed, 27 Jul 05 08:10:56 GMT</pubDate>
</item>
The source element for an item indicates that the article was derived from another news feed.
Its value is the title of the source feed. It also has a single required attribute, url, which specifies the URL of the source feed.
<item>
<title>Napster launches radio song sales</title>
<link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
<description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
<author>shaunfanning@napster.com</author>
<comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
<enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</guid>
<pubDate>Wed, 27 Jul 05 08:10:56 GMT</pubDate>
<source url='http://www.wired.com/rss.xml'>Wired Online</source>
</item>
RSS 2.0 adds the capability to extend the RSS specification. An RSS feed may contain non-standard elements if they are defined in a namespace.
The following example adds metadata items from the Dublin Core specification:
<?xml version="1.0" encoding="ISO-8859-1" ?> <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"> <channel> <title>BBC News | Technology | UK Edition</title> <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link> <description>Updated every minute of every day</description> <dc:language>en-us</dc:language> <dc:creator/> <dc:rights>Copyright 2004</dc:rights> ...
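As one way to see how namespaced elements are read back out of a feed, the following sketch uses PHP's built-in SimpleXML extension (PHP 5 and later) rather than DOMIT! RSS. The shortened feed is a stand-in assumed for illustration only.

```php
<?php
// A small feed using Dublin Core elements, mirroring the example above.
$xml = <<<XML
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>BBC News | Technology | UK Edition</title>
<dc:language>en-us</dc:language>
<dc:rights>Copyright 2004</dc:rights>
</channel>
</rss>
XML;

$rss = simplexml_load_string($xml);

// children() with a namespace URI returns only elements in that namespace.
$dc = $rss->channel->children('http://purl.org/dc/elements/1.1/');

$language = (string) $dc->language;
$rights   = (string) $dc->rights;

echo $language . "\n";
echo $rights . "\n";
```

Selecting by namespace URI rather than by the dc prefix means the code still works if a feed binds the same namespace to a different prefix.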
There are five main flavors of the RSS specification -- 0.9, 0.91, 0.92, 1.0, and 2.0 -- and understanding the subtleties of the variations between each version can be daunting. To confuse matters even more, improperly formed RSS documents are a common occurrence.
The aim of DOMIT! RSS is to consolidate these variances under a single API, and thus allow you to programmatically pull data from any RSS feed, no matter what the version, with complete consistency.
Another advantage of using DOMIT! RSS is that it piggybacks on top of the DOMIT! XML parser. You therefore have available, in addition to the DOMIT! RSS API, the standard methods and properties of the Document Object Model.
DOMIT! RSS also includes such features as a caching system that stores feeds locally, only refreshing them from the source URL at specified intervals.
DOMIT! RSS comes in two versions:
The main DOMIT! RSS library, which exposes the full power of the API but is weightier than you may require in some circumstances.
The DOMIT! RSS Lite library, which exposes only a subset of the API (that pertaining to title, link, and description elements) and is consequently lighter and faster.
DOMIT! RSS is written in pure PHP, so it should work identically from PHP 4.0 and up without the need to install any PHP extensions.
Since DOMIT! RSS is not an extension, it requires no special setup on your web server. You will, however, need to have the following files present on your server filesystem:
xml_domit_rss_shared.php - shared code for DOMIT! RSS and DOMIT! RSS Lite
xml_domit_rss.php - the main DOMIT! RSS code
xml_domit_rss_lite.php - the main DOMIT! RSS Lite code
php_text_cache.php - required if you want to render your XML as a normalized (whitespace formatted) string or if you want to use the parseXML method of DOMIT_Document
php_file_utilities.php - generic file input / output utilities
php_http_client_generic.php - generic HTTP client class
php_http_client_include.php - include file for the HTTP client class
php_http_connector.php - helper class for php_http_client
php_http_exceptions.php - HTTP exceptions class
php_http_proxy.php - HTTP proxy class
php_http_status_codes.php - HTTP status codes for the HTTP proxy class
You will also need to download the latest version of the DOMIT! XML parser and install it in the same directory as DOMIT! RSS. You must use a version of DOMIT! no earlier than 1.0.
In DOMIT! RSS, an RSS Document is represented by the xml_domit_rss_document (or xml_domit_rss_document_lite) class.
You create an instance of the xml_domit_rss_document class in the same way as any other PHP class, using the new keyword.
The easiest way to instantiate an RSS document and parse it at the same time is to pass in the URL or filename of the feed as the first parameter:
//instantiate RSS document, and parse feed at http://www.somesite.com/rss.xml
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
A DOMIT! RSS Lite document is instantiated similarly:
//instantiate RSS Lite document, and parse feed at http://www.somesite.com/rss.xml
$rssdoc =& new xml_domit_rss_document_lite('http://www.somesite.com/rss.xml');
Processing an RSS feed from a URL can also be broken into two steps:
a DOMIT! RSS Document is first created
the loadRSS method is called
For example:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document();
//parse feed at http://www.somesite.com/rss.xml
$success = $rssdoc->loadRSS('http://www.somesite.com/rss.xml');
If the document is successfully parsed, loadRSS returns true.
The parseRSS method works in exactly the same way as loadRSS; however, it takes an RSS string as a parameter, rather than a URL.
For example:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document();
//string containing the text of an RSS feed
$myRSS = "<rss version='0.95'>\n\t <channel>\n\t\t <title>My Feed</title>\n\t\t <link>http://www.myfeed.com/rss.xml</link>\n\t\t <description>This is my silly RSS feed</description>\n\t\t <item>\n\t\t\t <title>Thoughts for July 29, 2005</title>\n\t\t\t <link>http://www.myfeed.com/20050729.html</link>\n\t\t\t <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>\n\t\t </item>\n\t </channel>\n </rss>";
//parse RSS string
$success = $rssdoc->parseRSS($myRSS);
DOMIT! RSS, by default, will cache a copy of your feed on the local filesystem and draw its data from this cached copy, rather than continually accessing a remote URL.
There are two parameters available for configuring the cache:
the cache location, which by default is set to './' (the same directory as DOMIT! RSS)
the cache duration, which by default is set to 3600 seconds (one hour)
If you would like to use cache values other than the defaults provided, you can pass in a new cache location and duration when instantiating a DOMIT! RSS document:
//instantiate RSS document
//also set cache directory to ../cachefiles/ and cache duration to 2 hours
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml', '../cachefiles/', 7200);
The same can be done using the loadRSS method:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document();
//parse feed at http://www.somesite.com/rss.xml
//also set cache directory to ../cachefiles/ and cache duration to 2 hours
$success = $rssdoc->loadRSS('http://www.somesite.com/rss.xml', '../cachefiles/', 7200);
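To make the idea of a cache duration concrete, here is a minimal sketch of a time-based freshness check in plain PHP. It is not DOMIT! RSS's own caching code; the function name isCacheFresh and the temporary file are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of a time-based cache check; not DOMIT! RSS's own code.
// Returns true when $cacheFile exists and is younger than $duration seconds.
function isCacheFresh($cacheFile, $duration, $now = null) {
    if ($now === null) {
        $now = time();
    }
    if (!file_exists($cacheFile)) {
        return false;
    }
    return ($now - filemtime($cacheFile)) < $duration;
}

// Write a temporary "cached feed" and test it against a 2 hour window.
$cacheFile = tempnam(sys_get_temp_dir(), 'rss');
file_put_contents($cacheFile, '<rss version="2.0"></rss>');

var_dump(isCacheFresh($cacheFile, 7200));                 // file was just written
var_dump(isCacheFresh($cacheFile, 7200, time() + 10000)); // pretend 10000 seconds have passed

unlink($cacheFile);
```

A stale or missing cache file would be the signal to fetch the feed from the source URL again and rewrite the local copy.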
Sometimes the default approach to populating a DOMIT! RSS Document is insufficient. At times more flexibility is required.
By default, DOMIT! RSS uses the PHP function file_get_contents or standard PHP file input streams to retrieve the contents of an RSS feed. However, under certain conditions, both of these approaches can fail when passed a remote URL.
A number of additional options exist to deal with these possibilities.
As of version 0.5, DOMIT! RSS comes bundled with the php_http_client library, written by Engage Interactive. The useHTTPClient method allows you to force DOMIT! RSS to establish a standard HTTP connection to the web server hosting the XML file:
//instantiate DOMIT! RSS document
$rssdoc =& new xml_domit_rss_document();
//specify that an HTTP client should be used to retrieve XML
$rssdoc->useHTTPClient(true);
//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
The HTTP connection will be attempted on port 80.
Sometimes when you attempt to obtain RSS data from a remote site, the server is slow or unavailable. Either you are unable to establish a connection, or the connection is so slow that your own site appears to be hanging.
The setRSSTimeout method allows you to set a timeout value for obtaining RSS data, beyond which loadRSS returns false.
The following example times out if data cannot be retrieved from the remote URL within 10 seconds:
//instantiate DOMIT! RSS document
$rssdoc =& new xml_domit_rss_document();
//set a timeout value of 10 seconds
$rssdoc->setRSSTimeout(10);
//call loadRSS method
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
if ($success) {
//process RSS
}
else {
//no RSS to process; possibly a timeout
}
If you need to establish an HTTP connection to retrieve your RSS data, but the useHTTPClient method does not provide enough flexibility, the setConnection method of a DOMIT! RSS document can be used to manually set the parameters of the connection.
$rssdoc =& new xml_domit_rss_document();
//establish HTTP connection on port 955
$rssdoc->setConnection('http://www.engageinteractive.com', '/', '955');
//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
In the above example, an HTTP connection will be established on port 955 of host http://www.engageinteractive.com. You can also use a raw IP address for the host, such as http://198.162.0.10.
Note that you can also pass in a user name and password to the setConnection method if you must use HTTP Authorization to establish your connection. For more about HTTP Authorization, please see the entry on the setAuthorization method.
The HTTP specification allows for a basic (i.e., not particularly secure) type of authorization called HTTP Authorization. If the RSS file that you require is protected by this sort of authentication, you can use the setAuthorization method of DOMIT! RSS.
setAuthorization is used in conjunction with the setConnection method, and requires that you provide a plain text username and password:
$rssdoc =& new xml_domit_rss_document();
//establish HTTP connection on port 955
$rssdoc->setConnection('http://www.engageinteractive.com', '/', '955');
//set user name and password for authorization
$rssdoc->setAuthorization('johnheinstein', 'mypassword');
//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
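To illustrate why basic authorization is not particularly secure, the sketch below builds the request header that this scheme sends over the wire. The function name basicAuthHeader is hypothetical and not part of the DOMIT! RSS API; it only shows the general shape of the header.

```php
<?php
// Builds the HTTP "Authorization" header line for basic authorization:
// the word "Basic" followed by base64("username:password").
// Hypothetical helper for illustration; not part of DOMIT! RSS.
function basicAuthHeader($username, $password) {
    return 'Authorization: Basic ' . base64_encode($username . ':' . $password);
}

echo basicAuthHeader('johnheinstein', 'mypassword') . "\n";
```

Because base64 is an encoding rather than encryption, anyone who can observe the request can recover the user name and password.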
An HTTP proxy is a server that acts as an intermediary between an HTTP client (such as a user's browser) and the Internet. It is used to enforce security, administrative control, and caching services.
If you are behind a firewall, for instance, and must connect to a proxy server to access web-based resources, then the setProxyConnection method will allow you to access such data.
The setProxyConnection method works in exactly the same way as setConnection:
$rssdoc =& new xml_domit_rss_document();
//establish proxy connection at http://www.myproxyconnection.com on port 1060
$rssdoc->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');
//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
The setProxyAuthorization method is called in exactly the same way as setAuthorization. Just provide a valid user name and password:
$rssdoc =& new xml_domit_rss_document();
//establish proxy connection at http://www.myproxyconnection.com on port 1060
$rssdoc->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');
//set user name and password for authorization
$rssdoc->setProxyAuthorization('johnheinstein', 'mypassword');
//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
When an exception occurs in DOMIT! RSS -- perhaps as a result of a remote server being down or malformed XML -- you have a number of options available for handling these errors.
The xml_domit_rss_exception::setErrorMode method allows you to define the behavior of DOMIT! RSS when an exception occurs. It takes a single parameter -- an integer or integer constant representing the error mode:
DOMIT_RSS_ONERROR_CONTINUE (1) - specifies that DOMIT! RSS should continue processing after an exception occurs. This is the default behavior.
DOMIT_RSS_ONERROR_DIE (2) - specifies that DOMIT! RSS should die and display the error message after an exception occurs.
For example:
$rssdoc =& new xml_domit_rss_document();
//sets DOMIT! RSS to die on an exception
xml_domit_rss_exception::setErrorMode(DOMIT_RSS_ONERROR_DIE);
The xml_domit_rss_exception::setErrorLog method allows you to specify a file to which error messages are logged and timestamped. This is a useful feature for debugging RSS feed problems.
It takes two parameters:
a boolean specifying whether logging should be turned on (true) or off (false)
a string containing the absolute or relative path of the error log file.
The following example specifies that errors are to be logged to the file 'rssErrorLog.txt':
$rssdoc =& new xml_domit_rss_document();
//specifies that error logging is to be enabled and the error log filename
xml_domit_rss_exception::setErrorLog(true, 'rssErrorLog.txt');
If you would like to set a custom error handler for DOMIT! RSS, you can use the xml_domit_rss_exception::setErrorHandler method.
It takes a single parameter -- the function or method that will handle the error.
The custom error handler must have the following signature...
function myCustomErrorHandler($errorNum, $errorString)
...where $errorNum is an integer signifying the number of the error, and $errorString is a string giving a description of the error.
For example, if you wrote a function to handle your DOMIT! RSS errors that looked like this:
function myErrorHandler($errorNum, $errorString) {
echo "The error number is " . $errorNum . " and the error string is " . $errorString;
}
You could invoke it like this:
xml_domit_rss_exception::setErrorHandler("myErrorHandler");
If myErrorHandler were a method of a class named ErrorHandlers rather than a standalone function, you could invoke setErrorHandler like this:
xml_domit_rss_exception::setErrorHandler(array("ErrorHandlers", "myErrorHandler"));
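The two callback forms shown above, a function name and a class/method array, are standard PHP callables. The sketch below shows how a stored callback of either form can be invoked with call_user_func. The dispatchError function is a hypothetical stand-in, not the library's internal dispatch code, and the static method uses PHP 5+ syntax.

```php
<?php
// Standalone function with the ($errorNum, $errorString) signature.
function myErrorHandler($errorNum, $errorString) {
    return "Function handler: [" . $errorNum . "] " . $errorString;
}

// The same handler written as a static method of a class.
class ErrorHandlers {
    public static function myErrorHandler($errorNum, $errorString) {
        return "Class handler: [" . $errorNum . "] " . $errorString;
    }
}

// Hypothetical dispatcher showing how a stored callback can be invoked.
function dispatchError($handler, $errorNum, $errorString) {
    return call_user_func($handler, $errorNum, $errorString);
}

echo dispatchError('myErrorHandler', 1, 'Feed not found') . "\n";
echo dispatchError(array('ErrorHandlers', 'myErrorHandler'), 1, 'Feed not found') . "\n";
```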
Once you have successfully loaded a DOMIT! RSS document, you are ready to begin extracting feed data.
We will use this RSS document for the following examples:
<?xml version="1.0"?>
<rss version='0.95'>
<channel>
<title>My Feed</title>
<link>http://www.myfeed.com/rss.xml</link>
<description>This is my silly RSS feed</description>
<language>en-ca</language>
<copyright>2005 John Heinstein</copyright>
<managingEditor>johnkarl@nbnet.nb.ca</managingEditor>
<webMaster>johnkarl@nbnet.nb.ca</webMaster>
<pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate>
<generator>RSSky Feed Generator</generator>
<docs>http://www.myfeed.com/rss/docs.html</docs>
<cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>20</ttl>
<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
<image>
<title>My Feed</title>
<url>http://www.myfeed.com/rss/myfeed.jpg</url>
<link>http://www.myfeed.com/</link>
<width>100</width>
<height>70</height>
<description>Picture of John</description>
</image>
<textinput>
<title>Search</title>
<description>Search the My Feed site</description>
<name>searchform</name>
<link>http://www.google.com/search</link>
</textinput>
<skipDays>
<day>Friday</day>
<day>Saturday</day>
<day>Sunday</day>
</skipDays>
<skipHours>
<hour>16</hour>
</skipHours>
<category domain="http://www.superopendirectory.com/">philosophy/humor</category>
<category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
<item>
<title>Thoughts for July 29, 2005</title>
<link>http://www.myfeed.com/20050729.html</link>
<description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>
<author>johnkarl@nbnet.nb.ca</author>
<comments>http://www.myfeed.com/comments/20050729.html</comments>
<enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid>
<pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate>
<source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source>
</item>
<item>
<title>Thoughts for July 30, 2005</title>
<link>http://www.myfeed.com/20050730.html</link>
<description>What if the earth were round not flat?</description>
<author>johnkarl@nbnet.nb.ca</author>
<comments>http://www.myfeed.com/comments/20050730.html</comments>
<enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid>
<pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
<source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source>
</item>
</channel>
</rss>
There are several methods available to you for obtaining document level information: parsedBy, getVersion, and getRSSVersion.
To determine whether DOMIT! RSS or DOMIT! RSS Lite was used to parse your document, you can use the parsedBy method:
$rssParser = $rssdoc->parsedBy();
The parsedBy method returns a string with a value of either DOMIT_RSS or DOMIT_RSS_LITE.
The getVersion method returns the version number of the current install of DOMIT! RSS.
$myVersion = $rssdoc->getVersion();
A text representation of an RSS document or any of its elements can be displayed using the toString and toNormalizedString methods.
We can display an unformatted string representation of an RSS document using the toString method:
//instantiate RSS document, and parse feed at http://www.somesite.com/rss.xml
require_once('xml_domit_rss.php');
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//echo document to browser
echo $rssdoc->toString(true);
The following string will be echoed to the browser window:
<rss version="0.95"><channel><title>My Feed</title><link>http://www.myfeed.com/rss.xml</link><description>This is my silly RSS feed</description><language>en-ca</language><copyright>2005 John Heinstein</copyright><managingEditor>johnkarl@nbnet.nb.ca</managingEditor><webMaster>johnkarl@nbnet.nb.ca</webMaster><pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate><lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate><generator>RSSky Feed Generator</generator><docs>http://www.myfeed.com/rss/docs.html</docs><cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" /><ttl>20</ttl><rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating><image><title>My Feed</title><url>http://www.myfeed.com/rss/myfeed.jpg</url><link>http://www.myfeed.com/</link><width>100</width><height>70</height><description>Picture of John</description></image><textinput><title>Search</title><description>Search the My Feed site</description><name>searchform</name><link>http://www.google.com/search</link></textinput><skipDays><day>Friday</day><day>Saturday</day><day>Sunday</day></skipDays><skipHours><hour>16</hour></skipHours><category domain="http://www.superopendirectory.com/">philosophy/humor</category><category domain="http://www.superopendirectory.com/">philosophy/hogwash</category><item><title>Thoughts for July 29, 2005</title><link>http://www.myfeed.com/20050729.html</link><description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description><author>johnkarl@nbnet.nb.ca</author><comments>http://www.myfeed.com/comments/20050729.html</comments><enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" /><guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid><pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate><source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source></item><item><title>Thoughts for July 30, 2005</title><link>http://www.myfeed.com/20050730.html</link><description>What if the earth were round not flat?</description><author>johnkarl@nbnet.nb.ca</author><comments>http://www.myfeed.com/comments/20050730.html</comments><enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" /><guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid><pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate><source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source></item></channel></rss>
The first parameter of toString, if set to true, converts special HTML characters into their encoded version (e.g. & into &amp;) so that they will display properly in a browser.
If you would like unconverted raw text to be output (for instance, when echoing to a command line interface) substitute a value of false:
echo $rssdoc->toString(false);
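The kind of conversion involved can be approximated with PHP's built-in htmlspecialchars function. This is only an illustration of encoded versus raw output; whether DOMIT! RSS performs the conversion in exactly this way is an assumption.

```php
<?php
$raw = '<title>Tom & Jerry</title>';

// Encoded form: safe to echo into an HTML page, where the markup is
// displayed as text instead of being interpreted by the browser.
$encoded = htmlspecialchars($raw);

echo $encoded . "\n"; // suitable for a browser
echo $raw . "\n";     // suitable for a command line interface
```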
One drawback of the toString output is that it is not particularly readable, since all text of the node is compressed into one line. The toNormalizedString method will output text that is much more nicely formatted:
//instantiate RSS document, and parse feed at http://www.somesite.com/rss.xml
require_once('xml_domit_rss.php');
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//echo document to browser
echo $rssdoc->toNormalizedString(true);
The following string will be echoed to the browser window:
<rss version="0.95">
<channel>
<title>My Feed</title>
<link>http://www.myfeed.com/rss.xml</link>
<description>This is my silly RSS feed</description>
<language>en-ca</language>
<copyright>2005 John Heinstein</copyright>
<managingEditor>johnkarl@nbnet.nb.ca</managingEditor>
<webMaster>johnkarl@nbnet.nb.ca</webMaster>
<pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
<lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate>
<generator>RSSky Feed Generator</generator>
<docs>http://www.myfeed.com/rss/docs.html</docs>
<cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" />
<ttl>20</ttl>
<rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
<image>
<title>My Feed</title>
<url>http://www.myfeed.com/rss/myfeed.jpg</url>
<link>http://www.myfeed.com/</link>
<width>100</width>
<height>70</height>
<description>Picture of John</description>
</image>
<textinput>
<title>Search</title>
<description>Search the My Feed site</description>
<name>searchform</name>
<link>http://www.google.com/search</link>
</textinput>
<skipDays>
<day>Friday</day>
<day>Saturday</day>
<day>Sunday</day>
</skipDays>
<skipHours>
<hour>16</hour>
</skipHours>
<category domain="http://www.superopendirectory.com/">philosophy/humor</category>
<category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
<item>
<title>Thoughts for July 29, 2005</title>
<link>http://www.myfeed.com/20050729.html</link>
<description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>
<author>johnkarl@nbnet.nb.ca</author>
<comments>http://www.myfeed.com/comments/20050729.html</comments>
<enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid>
<pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate>
<source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source>
</item>
<item>
<title>Thoughts for July 30, 2005</title>
<link>http://www.myfeed.com/20050730.html</link>
<description>What if the earth were round not flat?</description>
<author>johnkarl@nbnet.nb.ca</author>
<comments>http://www.myfeed.com/comments/20050730.html</comments>
<enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" />
<guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid>
<pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
<source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source>
</item>
</channel>
</rss>
As with the toString method, passing a value of false into toNormalizedString outputs text that is not formatted for HTML display.
Once you have instantiated and populated a DOMIT! RSS Document from an RSS feed, you are able to traverse the hierarchy of the document and access the element data. The first element that you must access is the channel element.
Although officially, only a single channel is allowed in an RSS document, in common practice you will occasionally encounter more than one channel.
The getChannelCount method determines how many channels exist in an RSS document, allowing you to programmatically loop through each channel and extract information:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//echo channel count to browser
echo "Number of channels is: " . $numChannels;
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//process current channel...
}
The result:
Number of channels is: 1
Once you have determined the number of channels that exist in an RSS document, you can obtain a reference to a particular channel using the getChannel method.
getChannel takes a single parameter -- an integer specifying the index of the requested channel:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//obtain a reference to the current channel
$currChannel =& $rssdoc->getChannel($i);
//echo current channel to browser
echo $currChannel->toNormalizedString(true);
}
The result is:
<channel> <title>My Feed</title> <link>http://www.myfeed.com/rss.xml</link> <description>This is my silly RSS feed</description> <language>en-ca</language> <copyright>2005 John Heinstein</copyright> <managingEditor>johnkarl@nbnet.nb.ca</managingEditor> <webMaster>johnkarl@nbnet.nb.ca</webMaster> <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate> <lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate> <generator>RSSky Feed Generator</generator> <docs>http://www.myfeed.com/rss/docs.html</docs> <cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" /> <ttl>20</ttl> <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating> <image> <title>My Feed</title> <url>http://www.myfeed.com/rss/myfeed.jpg</url> <link>http://www.myfeed.com/</link> <width>100</width> <height>70</height> <description>Picture of John</description> </image> <textinput> <title>Search</title> <description>Search the My Feed site</description> <name>searchform</name> <link>http://www.google.com/search</link> </textinput> <skipDays> <day>Friday</day> <day>Saturday</day> <day>Sunday</day> </skipDays> <skipHours> <hour>16</hour> </skipHours> <category domain="http://www.superopendirectory.com/">philosophy/humor</category> <category domain="http://www.superopendirectory.com/">philosophy/hogwash</category> <item> <title>Thoughts for July 29, 2005</title> <link>http://www.myfeed.com/20050729.html</link> <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description> <author>johnkarl@nbnet.nb.ca</author> <comments>http://www.myfeed.com/comments/20050729.html</comments> <enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" /> <guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid> <pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate> <source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source> </item> <item> <title>Thoughts for July 30, 2005</title> <link>http://www.myfeed.com/20050730.html</link> <description>What if the earth were round not flat?</description> <author>johnkarl@nbnet.nb.ca</author> <comments>http://www.myfeed.com/comments/20050730.html</comments> <enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" /> <guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid> <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate> <source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source> </item> </channel>
A channel is required, at minimum, to contain title, link, and description elements. The getTitle
, getLink
, and getDescription
methods can be used to access the data in these elements.
The getTitle
method will return the title of a channel:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//obtain a reference to the current channel
$currChannel =& $rssdoc->getChannel($i);
//echo title of channel
echo $currChannel->getTitle();
}
The result is:
My Feed
The getLink
method will return the link of a channel:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//obtain a reference to the current channel
$currChannel =& $rssdoc->getChannel($i);
//echo link of channel
echo $currChannel->getLink();
}
The result is:
http://www.myfeed.com/rss.xml
The getDescription
method will return a description of a channel:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//obtain a reference to the current channel
$currChannel =& $rssdoc->getChannel($i);
//echo description of channel
echo $currChannel->getDescription();
}
The result is:
This is my silly RSS feed
The RSS specification documents a number of additional elements such as 'language' and 'copyright' that can belong to a channel. The following sections detail how the data in these elements can be accessed.
You often cannot be certain whether a non-required element is present in any particular RSS feed.
The hasElement
method allows you to test for the existence of a named element. hasElement
takes a single parameter -- the name of the element whose existence you are testing for.
If, for instance, you want to determine whether the copyright element belongs to a channel, you could do this:
$doesCopyrightExist = $currChannel->hasElement('copyright');
If the copyright element is found, true is returned.
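The check-then-read pattern used in the sections that follow can be wrapped in a small helper. The sketch below is only an illustration: the function name readOptional and the stand-in class DemoChannel are hypothetical and not part of DOMIT! RSS. The stand-in exists solely so that the snippet runs without the library; with a real feed you would pass $currChannel instead.

```php
<?php
// Hypothetical helper (not part of DOMIT! RSS): returns the value of an
// optional element, or a default when hasElement() reports it missing.
function readOptional($node, $elementName, $getter, $default = '')
{
    if ($node->hasElement($elementName)) {
        return $node->$getter();
    }
    return $default;
}

// Stand-in for a channel object, assumed here only for demonstration.
class DemoChannel
{
    private $data = array('language' => 'en-ca');

    public function hasElement($name)
    {
        return isset($this->data[$name]);
    }

    public function getLanguage()
    {
        return $this->data['language'];
    }

    public function getCopyright()
    {
        return $this->data['copyright'];
    }
}

$channel = new DemoChannel();
//language exists, so its value is echoed
echo readOptional($channel, 'language', 'getLanguage') . "\n";
//copyright is missing, so the default is echoed
echo readOptional($channel, 'copyright', 'getCopyright', '(none)') . "\n";
```

Passing the getter name as a string keeps the helper generic, at the cost of losing a compile-time check on the method name.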
The getLanguage
method returns the language of a channel.
//check if language element exists
if ($currChannel->hasElement('language')) {
//echo language to browser
echo $currChannel->getLanguage();
}
The result is:
en-ca
The getCopyright
method returns the copyright statement of a channel.
//check if copyright element exists
if ($currChannel->hasElement('copyright')) {
//echo copyright to browser
echo $currChannel->getCopyright();
}
The result is:
2005 John Heinstein
The getManagingEditor
method returns the email address of the managing editor of a channel.
//check if managing editor element exists
if ($currChannel->hasElement('managingEditor')) {
//echo managing editor to browser
echo $currChannel->getManagingEditor();
}
The result is:
johnkarl@nbnet.nb.ca
The getWebMaster
method returns the email address of the webmaster of a channel.
//check if webmaster element exists
if ($currChannel->hasElement('webMaster')) {
//echo webmaster to browser
echo $currChannel->getWebMaster();
}
The result is:
johnkarl@nbnet.nb.ca
The getPubDate
method returns the publication date of a channel.
//check if pubDate element exists
if ($currChannel->hasElement('pubDate')) {
//echo pubDate to browser
echo $currChannel->getPubDate();
}
The result is:
Sat, 30 Jul 05 13:20:35 GMT
The getLastBuildDate
method returns the last build date of a channel.
//check if lastBuildDate element exists
if ($currChannel->hasElement('lastBuildDate')) {
//echo lastBuildDate to browser
echo $currChannel->getLastBuildDate();
}
The result is:
Sat, 30 Jul 05 13:20:35 GMT
The getGenerator
method returns the name of the program which generated the RSS of a channel.
//check if generator element exists
if ($currChannel->hasElement('generator')) {
//echo generator to browser
echo $currChannel->getGenerator();
}
The result is:
RSSky Feed Generator
The getDocs
method returns the URL at which to find the docs for the channel.
//check if docs element exists
if ($currChannel->hasElement('docs')) {
//echo docs to browser
echo $currChannel->getDocs();
}
The result is:
http://www.myfeed.com/rss/docs.html
The getCloud
method returns a reference to the cloud of a channel: a web service that notifies subscribers when the channel has been updated.
//check if cloud element exists
if ($currChannel->hasElement('cloud')) {
//get a reference to the cloud
$myCloud =& $currChannel->getCloud();
}
Once a reference to the cloud object has been acquired, you can use the methods of the cloud -- getDomain
, getPort
, getPath
, getRegisterProcedure
, and getProtocol
-- to extract its data:
The getDomain
method of a cloud allows you to retrieve its domain:
//check if cloud element exists
if ($currChannel->hasElement('cloud')) {
//get a reference to the cloud
$myCloud =& $currChannel->getCloud();
//echo domain of the cloud
echo $myCloud->getDomain();
}
The result is:
www.myfeed.com
The getPort
method of a cloud allows you to retrieve its port:
//check if cloud element exists
if ($currChannel->hasElement('cloud')) {
//get a reference to the cloud
$myCloud =& $currChannel->getCloud();
//echo port of the cloud
echo $myCloud->getPort();
}
The result is:
80
The getPath
method of a cloud allows you to retrieve its path:
//check if cloud element exists
if ($currChannel->hasElement('cloud')) {
//get a reference to the cloud
$myCloud =& $currChannel->getCloud();
//echo path of the cloud
echo $myCloud->getPath();
}
The result is:
/rss
The getRegisterProcedure
method of a cloud allows you to retrieve its register procedure:
//check if cloud element exists
if ($currChannel->hasElement('cloud')) {
//get a reference to the cloud
$myCloud =& $currChannel->getCloud();
//echo register procedure of the cloud
echo $myCloud->getRegisterProcedure();
}
The result is:
rssSystem.rssPleaseNotify
The getProtocol
method of a cloud allows you to retrieve its protocol:
//check if cloud element exists
if ($currChannel->hasElement('cloud')) {
//get a reference to the cloud
$myCloud =& $currChannel->getCloud();
//echo protocol of the cloud
echo $myCloud->getProtocol();
}
The result is:
xml-rpc
The getTTL
method returns the time to live of a channel.
//check if ttl element exists
if ($currChannel->hasElement('ttl')) {
//echo ttl to browser
echo $currChannel->getTTL();
}
The result is:
20
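An aggregator typically uses the ttl value to decide when a cached copy of the feed must be refreshed. The following is a minimal sketch, assuming that ttl is expressed in minutes (as the RSS 2.0 specification describes it). The function isCacheStale is hypothetical and not part of DOMIT! RSS.

```php
<?php
// Hypothetical helper: decides whether a cached copy of a feed is too old.
// Assumes the ttl value is a number of minutes.
function isCacheStale($ttlMinutes, $lastFetchedTimestamp, $nowTimestamp)
{
    $maxAgeSeconds = (int) $ttlMinutes * 60;
    return ($nowTimestamp - $lastFetchedTimestamp) >= $maxAgeSeconds;
}

//ttl as returned by getTTL() for the sample feed
$ttl = '20';
//arbitrary timestamp standing in for the time of the last download
$lastFetched = 1000000;

//10 minutes after the last download
var_dump(isCacheStale($ttl, $lastFetched, $lastFetched + 10 * 60));
//25 minutes after the last download
var_dump(isCacheStale($ttl, $lastFetched, $lastFetched + 25 * 60));
```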
The getImage
method returns a reference to the image for the channel:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
}
Once a reference to the image object has been acquired, you can use the methods of the image -- getTitle
, getLink
, getUrl
, getWidth
, getHeight
, and getDescription
-- to extract its data:
The getTitle
method of an image allows you to retrieve its title:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
//echo title of the image
echo $myImage->getTitle();
}
The result is:
My Feed
The getLink
method of an image allows you to retrieve the link representing the channel:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
//echo link of the image
echo $myImage->getLink();
}
The result is:
http://www.myfeed.com/
The getUrl
method of an image allows you to retrieve the URL of the image:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
//echo URL of the image
echo $myImage->getUrl();
}
The result is:
http://www.myfeed.com/rss/myfeed.jpg
The getWidth
method of an image allows you to retrieve the width of the image:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
//echo width of the image
echo $myImage->getWidth();
}
The result is:
100
Note: The maximum width of an image is 144px; the default width is 88.
The getHeight
method of an image allows you to retrieve the height of the image:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
//echo height of the image
echo $myImage->getHeight();
}
The result is:
70
Note: The maximum height of an image is 400px; the default height is 31.
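The defaults and maximums quoted in the two notes above can be applied when a feed omits a dimension or supplies one that is too large. This is a sketch only: normalizeImageSize is a hypothetical helper, not a DOMIT! RSS method, and it assumes that a missing dimension arrives as null or an empty string.

```php
<?php
// Hypothetical helper: applies the limits quoted in the notes above
// (width: default 88, maximum 144; height: default 31, maximum 400).
function normalizeImageSize($width, $height)
{
    $w = ($width === null || $width === '') ? 88 : (int) $width;
    $h = ($height === null || $height === '') ? 31 : (int) $height;
    return array(min($w, 144), min($h, 400));
}

//values as returned by getWidth() and getHeight() for the sample feed
list($w, $h) = normalizeImageSize('100', '70');
echo $w . ' x ' . $h . "\n";
```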
The getDescription
method of an image allows you to retrieve a description of the image:
//check if image element exists
if ($currChannel->hasElement('image')) {
//get a reference to the image
$myImage =& $currChannel->getImage();
//echo description of the image
echo $myImage->getDescription();
}
The result is:
Picture of John
The getRating
method returns the PICS rating of a channel.
//check if rating element exists
if ($currChannel->hasElement('rating')) {
//echo rating to browser
echo $currChannel->getRating();
}
The result is:
(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))
The getTextInput
method returns a reference to the text input for the channel:
//check if text input element exists
if ($currChannel->hasElement('textInput')) {
//get a reference to the text input
$myTextInput =& $currChannel->getTextInput();
}
Once a reference to the text input object has been acquired, you can use its methods -- getTitle
, getDescription, getName
, and getLink
-- to extract its data:
The getTitle
method of a text input allows you to retrieve its title:
//check if text input element exists
if ($currChannel->hasElement('textInput')) {
//get a reference to the text input
$myTextInput =& $currChannel->getTextInput();
//get title of the text input
$myTitle = $myTextInput->getTitle();
//echo title to browser
echo $myTitle;
}
The result is:
Search
The getDescription
method of a text input allows you to retrieve its description:
//check if text input element exists
if ($currChannel->hasElement('textInput')) {
//get a reference to the text input
$myTextInput =& $currChannel->getTextInput();
//get title of the text input
$myTitle = $myTextInput->getTitle();
//get description of the text input
$myDescription = $myTextInput->getDescription();
//echo description to browser
echo $myDescription;
}
The result is:
Search the My Feed site
The getName
method of a text input allows you to retrieve the name of its text field:
//check if text input element exists
if ($currChannel->hasElement('textInput')) {
//get a reference to the text input
$myTextInput =& $currChannel->getTextInput();
//get title of the text input
$myTitle = $myTextInput->getTitle();
//get description of the text input
$myDescription = $myTextInput->getDescription();
//get name of the text input
$myName = $myTextInput->getName();
//echo name to browser
echo $myName;
}
The result is:
searchform
The getLink
method of a text input allows you to retrieve the URL of the script that is called when the Submit button is clicked:
//check if text input element exists
if ($currChannel->hasElement('textInput')) {
//get a reference to the text input
$myTextInput =& $currChannel->getTextInput();
//get title of the text input
$myTitle = $myTextInput->getTitle();
//get description of the text input
$myDescription = $myTextInput->getDescription();
//get name of the text input
$myName = $myTextInput->getName();
//get link of the text input
$myLink = $myTextInput->getLink();
//echo link to browser
echo $myLink;
}
The result is:
http://www.google.com/search
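Taken together, the four text input values describe a small search form. The sketch below shows one way they might be rendered as HTML. It assumes, following the RSS 2.0 specification, that the title labels the Submit button and the name identifies the text field. The function renderTextInputForm is hypothetical and not part of DOMIT! RSS.

```php
<?php
// Hypothetical helper: renders the four textInput values as an HTML form.
// Every value is escaped before being written into the markup.
function renderTextInputForm($title, $description, $name, $link)
{
    $e = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };
    return '<form method="get" action="' . $e($link) . '">'
        . '<label>' . $e($description) . ' '
        . '<input type="text" name="' . $e($name) . '" /></label> '
        . '<input type="submit" value="' . $e($title) . '" />'
        . '</form>';
}

//values as returned by the text input methods for the sample feed
echo renderTextInputForm(
    'Search',
    'Search the My Feed site',
    'searchform',
    'http://www.google.com/search'
);
```

Escaping matters here because feed content comes from a third party and should not be trusted as markup.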
The getSkipDays
method returns a reference to the skipDays object for the channel:
//check if skipDays element exists
if ($currChannel->hasElement('skipDays')) {
//get a reference to the skipDays object
$mySkipDays =& $currChannel->getSkipDays();
}
Once a reference to the skipDays object has been acquired, you can use its methods -- getSkipDayCount and getSkipDay -- to extract its data.
The getSkipDayCount
method of skipDays returns the number of child day elements:
//check if skipDays element exists
if ($currChannel->hasElement('skipDays')) {
//get a reference to the skipDays object
$mySkipDays =& $currChannel->getSkipDays();
//get number of child day elements
$numDays = $mySkipDays->getSkipDayCount();
//echo number of days to browser
echo $numDays;
//set up loop to iterate through days
for ($i = 0; $i < $numDays; $i++) {
//process each day element
}
}
The result is:
3
The getSkipDay
method of skipDays returns the value of the day element at the specified index. It takes a single parameter -- an integer specifying the index of the day element whose data you wish to access:
//check if skipDays element exists
if ($currChannel->hasElement('skipDays')) {
//get a reference to the skipDays object
$mySkipDays =& $currChannel->getSkipDays();
//get number of child day elements
$numDays = $mySkipDays->getSkipDayCount();
//set up loop to iterate through days
for ($i = 0; $i < $numDays; $i++) {
//echo day item to browser
echo $mySkipDays->getSkipDay($i) . "\n<br />";
}
}
The result is:
Friday Saturday Sunday
The getSkipHours
method returns a reference to the skipHours object for the channel:
//check if skipHours element exists
if ($currChannel->hasElement('skipHours')) {
//get a reference to the skipHours object
$mySkipHours =& $currChannel->getSkipHours();
}
Once a reference to the skipHours object has been acquired, you can use its methods -- getSkipHourCount and getSkipHour -- to extract its data.
The getSkipHourCount
method of skipHours returns the number of child hour elements:
//check if skipHours element exists
if ($currChannel->hasElement('skipHours')) {
//get a reference to the skipHours object
$mySkipHours =& $currChannel->getSkipHours();
//get number of child hour elements
$numHours = $mySkipHours->getSkipHourCount();
//echo num hours to browser
echo $numHours;
//set up loop to iterate through hours
for ($i = 0; $i < $numHours; $i++) {
//process each hour element
}
}
The result is:
1
The getSkipHour
method of skipHours returns the value of the hour element at the specified index. It takes a single parameter -- an integer specifying the index of the hour element whose data you wish to access:
//check if skipHours element exists
if ($currChannel->hasElement('skipHours')) {
//get a reference to the skipHours object
$mySkipHours =& $currChannel->getSkipHours();
//get number of child hour elements
$numHours = $mySkipHours->getSkipHourCount();
//set up loop to iterate through hours
for ($i = 0; $i < $numHours; $i++) {
//echo hour item to browser
echo $mySkipHours->getSkipHour($i) . "\n<br />";
}
}
The result is:
16
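The skipDays and skipHours values are hints telling an aggregator when it need not poll the feed. A minimal sketch of how they might be combined follows. It assumes that hours are given in GMT and days as English day names, as the RSS 2.0 specification describes them. The function shouldSkipPolling is hypothetical and not part of DOMIT! RSS.

```php
<?php
// Hypothetical helper: returns true when an aggregator may skip polling
// at the given Unix timestamp. Assumes GMT hours and English day names.
function shouldSkipPolling(array $skipDays, array $skipHours, $timestamp)
{
    $day = gmdate('l', $timestamp);        // full day name, e.g. "Saturday"
    $hour = (int) gmdate('G', $timestamp); // hour of day, 0 through 23
    if (in_array($day, $skipDays, true)) {
        return true;
    }
    return in_array($hour, array_map('intval', $skipHours), true);
}

//values as returned by getSkipDay() and getSkipHour() for the sample feed
$skipDays = array('Friday', 'Saturday', 'Sunday');
$skipHours = array('16');

//Thursday, 28 July 2005, 12:00 GMT
var_dump(shouldSkipPolling($skipDays, $skipHours, gmmktime(12, 0, 0, 7, 28, 2005)));
//Thursday, 28 July 2005, 16:30 GMT
var_dump(shouldSkipPolling($skipDays, $skipHours, gmmktime(16, 30, 0, 7, 28, 2005)));
//Saturday, 30 July 2005, 09:00 GMT
var_dump(shouldSkipPolling($skipDays, $skipHours, gmmktime(9, 0, 0, 7, 30, 2005)));
```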
A channel can have multiple category elements. The getCategoryCount
method indicates how many exist in the current channel:
//get number of categories
$numCategories = $currChannel->getCategoryCount();
//set up loop to iterate through categories
for ($j = 0; $j < $numCategories; $j++) {
//process categories
}
Once you have determined the number of categories and set up a loop to iterate through each one, you can use the getCategory
method to retrieve individual category elements:
//get number of categories
$numCategories = $currChannel->getCategoryCount();
//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {
//get current category
$currCategory =& $currChannel->getCategory($j);
//echo to browser
echo $currCategory->toNormalizedString(true);
}
The result is:
<category domain="http://www.superopendirectory.com/">philosophy/humor</category> <category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
A category has two methods at its disposal: getCategory
and getDomain
.
The getCategory
method of a category returns the text of the category:
//get number of categories
$numCategories = $currChannel->getCategoryCount();
//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {
//get current category
$currCategory =& $currChannel->getCategory($j);
//echo category text to browser
echo $currCategory->getCategory() . "\n<br />";
}
The result is:
philosophy/humor philosophy/hogwash
The getDomain
method of a category returns the domain attribute of the category, or an empty string if one does not exist:
//get number of categories
$numCategories = $currChannel->getCategoryCount();
//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {
//get current category
$currCategory =& $currChannel->getCategory($j);
//echo domain to browser
echo $currCategory->getDomain() . "\n<br />";
}
The result is:
http://www.superopendirectory.com/ http://www.superopendirectory.com/
With a reference to a channel in hand, you are able to loop through the items of that channel and extract the item data. The process is almost identical to looping through the channels of an RSS document.
The getItemCount
method determines how many items exist in a channel, allowing you to programmatically loop through each item and extract information:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//obtain a reference to the current channel
$currChannel =& $rssdoc->getChannel($i);
//get number of items
$numItems = $currChannel->getItemCount();
//set up a loop to iterate through each item
for ($j = 0; $j < $numItems; $j++) {
//process item data
}
}
Once you have determined the number of items that exist in a channel, you can obtain a reference to a particular item using the getItem
method:
getItem
takes a single parameter -- an integer specifying the index of the requested item.
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//get number of channels
$numChannels = $rssdoc->getChannelCount();
//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
//obtain a reference to the current channel
$currChannel =& $rssdoc->getChannel($i);
//get number of items
$numItems = $currChannel->getItemCount();
//set up a loop to iterate through each item
for ($j = 0; $j < $numItems; $j++) {
//get reference to current item
$currItem =& $currChannel->getItem($j);
//echo to browser
echo $currItem->toNormalizedString(true);
}
}
The result is:
<item> <title>Thoughts for July 29, 2005</title> <link>http://www.myfeed.com/20050729.html</link> <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description> <author>johnkarl@nbnet.nb.ca</author> <comments>http://www.myfeed.com/comments/20050729.html</comments> <enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" /> <guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid> <pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate> <source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source> </item> <item> <title>Thoughts for July 30, 2005</title> <link>http://www.myfeed.com/20050730.html</link> <description>What if the earth were round not flat?</description> <author>johnkarl@nbnet.nb.ca</author> <comments>http://www.myfeed.com/comments/20050730.html</comments> <enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" /> <guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid> <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate> <source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source> </item>
An item is required to contain at least one title, link, or description element. Commonly, all three are included.
The getTitle
, getLink
, and getDescription
methods can be used to access the data in these elements.
//RSS doc parsed and channels iterated through already...
$currChannel =& $rssdoc->getChannel($i);
//get number of items
$numItems = $currChannel->getItemCount();
//set up a loop to iterate through each item
for ($j = 0; $j < $numItems; $j++) {
//get reference to current item
$currItem =& $currChannel->getItem($j);
//echo title to browser
echo "title: " . $currItem->getTitle() . "\n<br />";
//echo link to browser
echo "link: " . $currItem->getLink() . "\n<br />";
//echo description to browser
echo "description: " . $currItem->getDescription() . "\n<br />\n<br />";
}
The result is:
title: Thoughts for July 29, 2005 link: http://www.myfeed.com/20050729.html description: Musings about the link between RSS, existentialism, and egg salad sandwiches. title: Thoughts for July 30, 2005 link: http://www.myfeed.com/20050730.html description: What if the earth were round not flat?
The RSS specification documents a number of additional elements such as 'author' and 'comments' that can belong to an item. The following sections detail how the data in these elements can be accessed.
The getAuthor
method of an item returns the email address of the author of the item:
//check if author element exists
if ($currItem->hasElement('author')) {
//echo author text to browser
echo $currItem->getAuthor() . "\n<br />";
}
The result is:
johnkarl@nbnet.nb.ca johnkarl@nbnet.nb.ca
The getComments
method of an item returns an URL for user comments:
//check if comments element exists
if ($currItem->hasElement('comments')) {
//echo comments URL to browser
echo $currItem->getComments() . "\n<br />";
}
The result is:
http://www.myfeed.com/comments/20050729.html http://www.myfeed.com/comments/20050730.html
The getEnclosure
method returns a reference to the enclosure object -- media such as an mp3 file -- for the item:
//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {
//get a reference to the enclosure object
$myEnclosure =& $currItem->getEnclosure();
}
Once a reference to the enclosure object has been acquired, you can use its methods -- getUrl
, getLength
, and getType
-- to extract its data.
The getUrl
method of an enclosure returns the URL of the enclosure:
//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {
//get a reference to the enclosure object
$myEnclosure =& $currItem->getEnclosure();
//echo URL of enclosure to browser
echo $myEnclosure->getUrl() . "\n<br />";
}
The result is:
http://www.myfeed.com/audio/20050729.mp3 http://www.myfeed.com/audio/20050730.mp3
The getLength
method of an enclosure returns the length in bytes of the enclosure:
//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {
//get a reference to the enclosure object
$myEnclosure =& $currItem->getEnclosure();
//echo length of enclosure to browser
echo $myEnclosure->getLength() . "\n<br />";
}
The result is:
12216320 12216320
The getType
method of an enclosure returns its mime type:
//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {
//get a reference to the enclosure object
$myEnclosure =& $currItem->getEnclosure();
//echo mime type of enclosure to browser
echo $myEnclosure->getType() . "\n<br />";
}
The result is:
audio/mpeg audio/mpeg
The getGUID
method returns a reference to a globally unique identifier for the item:
//check if guid element exists
if ($currItem->hasElement('guid')) {
//get a reference to the guid object
$myGUID =& $currItem->getGUID();
}
Once a reference to the guid object has been acquired, you can use its methods -- getGUID
and isPermaLink
-- to extract its data.
The getGUID
method of a guid returns a global unique identifier, usually in the form of an URL:
//check if guid element exists
if ($currItem->hasElement('guid')) {
//get a reference to the guid object
$myGUID =& $currItem->getGUID();
//echo guid of guid to browser
echo $myGUID->getGUID() . "\n<br />";
}
The result is:
http://www.myfeed.com/20050729.html http://www.myfeed.com/20050730.html
The isPermaLink
method of a guid returns true if the GUID is a permanent link to the item:
//check if guid element exists
if ($currItem->hasElement('guid')) {
//get a reference to the guid object
$myGUID =& $currItem->getGUID();
//output to browser if guid is permalink or not
echo ($myGUID->isPermaLink() ? "Is permalink" : "Is not permalink") . "\n<br />";
}
The result is:
Is permalink Is permalink
The getPubDate
method of an item returns the date of publication:
//check if pubDate element exists
if ($currItem->hasElement('pubDate')) {
//echo pubDate to browser
echo $currItem->getPubDate() . "\n<br />";
}
The result is:
Fri, 29 Jul 05 14:15:16 GMT Sat, 30 Jul 05 13:20:35 GMT
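The getPubDate method returns the date as plain text, so sorting or comparing items requires parsing it first. The sketch below uses PHP's built-in DateTime class, which is not part of DOMIT! RSS. The helper name parseRssDate is hypothetical, and the format string assumes dates shaped like those in the sample feed (two-digit year, timezone abbreviation).

```php
<?php
// Hypothetical helper: parses a date such as "Sat, 30 Jul 05 13:20:35 GMT"
// into a DateTime object, or returns null if the text does not match.
function parseRssDate($text)
{
    $date = DateTime::createFromFormat('D, d M y H:i:s T', trim($text));
    return ($date === false) ? null : $date;
}

//value as returned by getPubDate() for the second sample item
$date = parseRssDate('Sat, 30 Jul 05 13:20:35 GMT');
if ($date !== null) {
    //echo the date in a sortable form
    echo $date->format('Y-m-d H:i:s') . "\n";
}
```

Feeds in the wild use several date layouts, so a real aggregator would likely try more than one format before giving up.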
The getSource
method returns the source feed from which the item is derived:
//check if source element exists
if ($currItem->hasElement('source')) {
//get a reference to the source object
$mySource =& $currItem->getSource();
}
Once a reference to the source object has been acquired, you can use its methods -- getSource
and getUrl
-- to extract its data.
The getSource
method of a source object returns the title of the source feed:
//check if source element exists
if ($currItem->hasElement('source')) {
//get a reference to the source object
$mySource =& $currItem->getSource();
//echo title of source to browser
echo $mySource->getSource() . "\n<br />";
}
The result is:
The Minds Aye The Minds Aye
The getUrl
method of a source object returns the URL of the source feed:
//check if source element exists
if ($currItem->hasElement('source')) {
//get a reference to the source object
$mySource =& $currItem->getSource();
//echo URL of source to browser
echo $mySource->getUrl() . "\n<br />";
}
The result is:
http://mindsaye.ca/rss/20050729.html http://mindsaye.ca/rss/20050730.html
The accessor methods that we have just reviewed are simple and convenient ways of extracting data from an RSS document.
DOMIT! RSS also provides a number of additional methods that allow you to query and interact programmatically with your RSS data.
Note: We will continue to use the sample RSS document from the previous section.
For any RSS element that contains subelements -- such as channel, item, or image -- DOMIT! RSS generates a PHP array of subelement names, which is referred to as an element list.
The getElementList
method returns a reference to this element list.
If, for instance, you want to find out what elements belong to a channel, you could do this:
//get array of element names under a channel
$elementList = $currChannel->getElementList();
//echo array to browser
echo "<pre>";
print_r($elementList);
echo "</pre>";
The result is:
Array
(
    [0] => title
    [1] => link
    [2] => description
    [3] => language
    [4] => copyright
    [5] => managingeditor
    [6] => webmaster
    [7] => pubdate
    [8] => lastbuilddate
    [9] => generator
    [10] => docs
    [11] => cloud
    [12] => ttl
    [13] => rating
    [14] => image
    [15] => textinput
    [16] => skipdays
    [17] => skiphours
    [18] => item
    [19] => category
)
The output of the getElementList
method contains the names of each subelement of the channel.
You can use the PHP array function count
together with getElementList
to iterate through the subelements of an element:
$elementList =& $currChannel->getElementList();
$numElements = count($elementList);
for ($i = 0; $i < $numElements; $i++) {
//get current element name
$currElementName =& $elementList[$i];
//echo name to browser
echo $currElementName . "\n<br />";
}
The result is:
title link description language copyright managingeditor webmaster pubdate lastbuilddate generator docs cloud ttl rating image textinput skipdays skiphours item category
DOMIT! RSS distinguishes four basic types of RSS elements:
Simple RSS element:
A simple RSS element is:
defined by the RSS specification, and
composed of a single child text node with no attributes
For example:
<language>en-us</language>
The following elements are considered Simple RSS Elements: 'title', 'link', 'description', 'language', 'copyright', 'managingEditor', 'webMaster', 'pubDate', 'lastBuildDate', 'generator', 'docs', 'ttl', 'rating', 'author', 'comments'.
Complex RSS Element:
A complex RSS element is:
defined by the RSS specification, and
contains child elements and/or attributes
For example:
<image> <title>Developer</title> <link>http://www.internetnews.com</link> <url>http://www.engageinteractive.com/domit/domitBanner.gif</url> <width>150</width> <height>50</height> <description>The blah blah blah de blah</description> </image>
The following elements are considered Complex RSS Elements: 'cloud', 'image', 'textInput', 'enclosure', 'source', 'guid', 'skipDays', 'skipHours'.
Custom RSS Element:
A Custom RSS element is any element that is not defined by the RSS spec. For example:
<dc:creator>John Heinstein</dc:creator>
RSS Collection:
An RSS Collection describes multiple instances of RSS elements at the same level of hierarchy. For example:
<dc:creator>John Heinstein</dc:creator> <dc:creator>Brad Parks</dc:creator> <dc:creator>Liz Goulard</dc:creator>
Knowing the RSS type of an element allows you to programmatically apply DOMIT! RSS methods specific to that type.
There are three DOMIT! RSS methods that allow you to determine type: isSimpleRSSElement
, isCustomRSSElement
, and isCollection
.
The isSimpleRSSElement
method take a single parameter -- an element name -- and returns true if the element name is a simple RSS element:
echo $currChannel->isSimpleRSSElement('language');
The above example returns true.
If you know the element is of Simple type, you can use the getElementText method to retrieve its value. getElementText takes a single parameter -- the name of the element:
if ($currChannel->isSimpleRSSElement('language')) {
//echo value of language element to browser
echo $currChannel->getElementText('language');
}
The return value is:
en-ca
The isCustomRSSElement method takes a single parameter -- an element name -- and returns true if the element name is a custom RSS element:
echo $currChannel->isCustomRSSElement('dc:creator');
The above example returns true, because dc:creator is not defined by the RSS specification.
You cannot use the getElementText method to extract data from a Complex type of RSS element. You must instead obtain a reference to that element and use the methods specific to that Complex type to extract its data.
The getElement method can be used to return an object reference to an element. getElement takes a single parameter -- the name of the element to be retrieved:
if ($currChannel->isCustomRSSElement('dc:creator')) {
//obtain reference to the dc:creator element
$myElement =& $currChannel->getElement('dc:creator');
//echo to browser
echo $myElement->toNormalizedString(true);
}
The result, if a dc:creator node from the Dublin Core was present:
<dc:creator>John Heinstein</dc:creator>
With an object reference in hand, you can then use the DOM methods to extract the data for that object. For a dc:creator element, you might do this:
if ($currChannel->isCustomRSSElement('dc:creator')) {
    //obtain reference to the dc:creator element
    $myElement =& $currChannel->getElement('dc:creator');
    //echo dc:creator content to browser using the DOMIT getText method
    echo $myElement->getText();
}
The result is:
John Heinstein
The isCollection method takes a single parameter -- an element name -- and returns true if the element name is an RSS collection:
echo $currChannel->isCollection('category');
The above example returns true.
If you have determined that an element name represents an RSS collection, you can use the getElement method to return an object reference to that collection:
if ($currChannel->isCollection('category')) {
//get object reference to collection
$myCollection =& $currChannel->getElement('category');
//process collection...
}
Once you have obtained a reference to a collection, you can use the getElementCount method to determine the number of members of that collection.
A for loop can then be used to iterate through the members of the collection. The getElementAt method allows you to access the collection members by index:
if ($currChannel->isCollection('category')) {
    //get object reference to collection
    $myCollection =& $currChannel->getElement('category');
    //get number of collection members
    $numMembers = $myCollection->getElementCount();
    //iterate through members of collection
    for ($i = 0; $i < $numMembers; $i++) {
        //get reference to each member
        $currMember =& $myCollection->getElementAt($i);
        //echo to browser
        echo $currMember->toNormalizedString(true);
    }
}
The result is:
<category domain="http://www.superopendirectory.com/">philosophy/humor</category>
<category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
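For comparison, the same count-then-index pattern can be sketched with PHP's built-in DOM extension instead of DOMIT! RSS. This is only an illustration under stated assumptions: the surrounding feed is invented (only the two category lines come from the result above), and DOMNodeList stands in for the DOMIT! RSS collection object.

```php
<?php
// Feed invented for this sketch; only the category lines mirror the result above.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample</title>
    <category domain="http://www.superopendirectory.com/">philosophy/humor</category>
    <category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// The DOMNodeList plays the role of the collection
$categories = $doc->getElementsByTagName('category');

// length corresponds to getElementCount
$numMembers = $categories->length;

$values = [];
for ($i = 0; $i < $numMembers; $i++) {
    // item($i) corresponds to getElementAt($i)
    $member = $categories->item($i);
    $values[] = $member->textContent;
    echo $member->getAttribute('domain'), ' => ', $member->textContent, "\n";
}
```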
Navigating through an RSS Document by RSS type involves:
obtaining a list of available elements
iterating through the elements in the list
determining the RSS type of each element
querying each element for data, based on the RSS type of that element
First, an element list is obtained using the getElementList method:
//get array of element names under a channel
$elementList = $currChannel->getElementList();
Secondly, a loop is constructed over the element list, and the element name is obtained at each iteration:
$elementList =& $currChannel->getElementList();
$numElements = count($elementList);
for ($i = 0; $i < $numElements; $i++) {
    //get current element name
    $currElementName =& $elementList[$i];
}
Thirdly, the element is sorted into one of the four categories of RSS type, using the isSimpleRSSElement, isCustomRSSElement, and isCollection methods:
$elementList =& $currChannel->getElementList();
$numElements = count($elementList);
for ($i = 0; $i < $numElements; $i++) {
    //get current element name
    $currElementName =& $elementList[$i];
    if ($currChannel->isSimpleRSSElement($currElementName)) {
        //element is a simple RSS element
    }
    else if ($currChannel->isCustomRSSElement($currElementName)) {
        //element is a custom RSS element
    }
    else if ($currChannel->isCollection($currElementName)) {
        //element is a collection of RSS elements
    }
    else {
        //element is a complex RSS element
    }
}
The last step is to process elements according to the methods of their RSS type.
$elementList =& $currChannel->getElementList();
$numElements = count($elementList);
for ($i = 0; $i < $numElements; $i++) {
    //get current element name
    $currElementName =& $elementList[$i];
    if ($currChannel->isSimpleRSSElement($currElementName)) {
        //element is a simple RSS element
        //use getElementText to get value
        $myValue = $currChannel->getElementText($currElementName);
    }
    else if ($currChannel->isCustomRSSElement($currElementName)) {
        //element is a custom RSS element
        //treat as a DOM node
        $currElement =& $currChannel->getElement($currElementName);
        switch ($currElementName) {
            case 'dc:creator':
                $myValue = $currElement->getText();
                break;
            case 'cost':
                $myValue1 = $currElement->firstChild->nodeValue;
                $myValue2 = $currElement->getAttribute('currency');
                break;
        }
    }
    else if ($currChannel->isCollection($currElementName)) {
        //element is a collection of RSS elements
        $myCollection =& $currChannel->getElement($currElementName);
        //get number of collection members
        $numMembers = $myCollection->getElementCount();
        //iterate through members of collection
        //(use $j so that the outer loop counter $i is not overwritten)
        for ($j = 0; $j < $numMembers; $j++) {
            //get reference to each member
            $currMember =& $myCollection->getElementAt($j);
            //process member of collection
        }
    }
    else {
        //element is a complex RSS element
        //get reference to element
        $element =& $currChannel->getElement($currElementName);
        switch (strtolower($currElementName)) {
            case DOMIT_RSS_ELEMENT_IMAGE:
                //process image element
                break;
            case DOMIT_RSS_ELEMENT_CLOUD:
                //process cloud element
                break;
            case DOMIT_RSS_ELEMENT_TEXTINPUT:
                //process textinput element
                break;
            case DOMIT_RSS_ELEMENT_ENCLOSURE:
                //process enclosure element
                break;
            case DOMIT_RSS_ELEMENT_SOURCE:
                //process source element
                break;
            case DOMIT_RSS_ELEMENT_GUID:
                //process guid element
                break;
            case DOMIT_RSS_ELEMENT_SKIPHOURS:
                //process skipHours element
                break;
            case DOMIT_RSS_ELEMENT_SKIPDAYS:
                //process skipDays element
                break;
        }
    }
}
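The four steps can also be tried without DOMIT! RSS installed. The sketch below uses PHP's built-in DOM extension and a made-up feed, and it decides how to read each element from its structure (repeated, has attributes or children, or plain text) rather than from the name-based lists DOMIT! RSS uses. That structural rule is an assumption made for illustration, so the result is an approximation of the type-driven dispatch above, not a reimplementation of it.

```php
<?php
// Feed invented for illustration.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <language>en-us</language>
    <image><url>http://example.com/logo.gif</url><title>Logo</title></image>
    <category>philosophy/humor</category>
    <category>philosophy/hogwash</category>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$channel = $doc->getElementsByTagName('channel')->item(0);

// Step 1: build the element list, grouping child elements by name.
$groups = [];
foreach ($channel->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $groups[$child->nodeName][] = $child;
    }
}

// Steps 2-4: iterate, decide how each name should be read, then query it.
$result = [];
foreach ($groups as $name => $nodes) {
    $first = $nodes[0];
    if (count($nodes) > 1) {
        // repeated at the same level: read every member, like a collection
        $result[$name] = array_map(function ($n) {
            return $n->textContent;
        }, $nodes);
    } elseif ($first->hasAttributes()
        || $first->getElementsByTagName('*')->length > 0) {
        // attributes or child elements: keep the markup for later processing
        $result[$name] = $doc->saveXML($first);
    } else {
        // a single text node: read the text directly
        $result[$name] = $first->textContent;
    }
}

print_r($result);
```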
The isRSSDefined method is a quick way to test whether a child element name is defined by the RSS specification. It returns true if it is. For example:
echo ($currChannel->isRSSDefined('image') ? "Is defined." : "Is not defined.");
The result is:
Is defined.
The nodes of a DOMIT! RSS document are all available to the underlying DOM parser, DOMIT!.
To access an RSS element as a DOM node at any time, use the node property of that element.
For example, to access an entire DOMIT! RSS document as a DOM document node:
//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');
//access underlying XML document
$xmldoc =& $rssdoc->node;
Some of the plans for DOMIT! RSS include:
proper handling of namespaces
Conditional-Get support
modules for non-RSS specifications such as Atom, Dublin Core
support for SSL
gzip encoding support
DOMIT! RSS has only been made possible through the suggestions, bug reports, and code submissions of others.
If you would like to contribute to DOMIT! RSS or join the DOMIT! RSS team, please email <johnkarl@nbnet.nb.ca>