The DOMIT! RSS Parser Manual


Table of Contents

1. Overview of RSS
1. Intro to RSS
2. RSS Structure
2.1. XML Declaration
2.2. rss Element
2.3. channel Element
2.4. Required Channel Elements
2.4.1. title Element
2.4.2. link Element
2.4.3. description Element
2.5. Optional Channel Elements
2.5.1. language Element
2.5.2. copyright Element
2.5.3. managingEditor Element
2.5.4. webMaster Element
2.5.5. pubDate Element
2.5.6. lastBuildDate Element
2.5.7. category Element
2.5.8. generator Element
2.5.9. docs Element
2.5.10. cloud Element
2.5.11. ttl Element
2.5.12. image Element
2.5.13. rating Element
2.5.14. textInput Element
2.5.15. skipHours Element
2.5.16. skipDays Element
2.6. Item Element
2.6.1. title Element
2.6.2. link Element
2.6.3. description Element
2.6.4. author Element
2.6.5. category Element
2.6.6. comments Element
2.6.7. enclosure Element
2.6.8. guid Element
2.6.9. pubDate Element
2.6.10. source Element
2.7. Extending RSS
2. Installing DOMIT! RSS
1. What is DOMIT! RSS?
2. Installing DOMIT! RSS
3. Including the DOMIT! RSS Library in your Scripts
3. Loading a DOMIT! RSS Document
1. Instantiating and Populating a DOMIT! RSS Document
1.1. Instantiating and Parsing a DOMIT! RSS Document
1.2. loadRSS
1.3. parseRSS
1.4. Setting the Cache Location and Duration
2. Optional Settings for Loading RSS Data
2.1. useHTTPClient: Forcing DOMIT! RSS to use an HTTP Client
2.2. setRSSTimeout: Setting a timeout for obtaining feed data
2.3. setConnection: Manually specifying HTTP connection parameters
2.4. setAuthorization: Using basic HTTP authorization with your connection
2.5. setProxyConnection: Retrieving XML data through a proxy server
2.6. setProxyAuthorization: Using basic HTTP authorization with your proxy
3. Error Handling
3.1. xml_domit_rss_exception::setErrorMode
3.2. xml_domit_rss_exception::setErrorLog
3.3. xml_domit_rss_exception::setErrorHandler
4. Extracting Data from a DOMIT! RSS Document
1. Document Level Methods
1.1. parsedBy
1.2. getVersion
1.3. getRSSVersion
2. Displaying a String Representation of RSS Content
2.1. toString
2.2. toNormalizedString
3. Accessing Channels
3.1. getChannelCount
3.2. getChannel
4. Accessing the Required Elements of a Channel
4.1. getTitle
4.2. getLink
4.3. getDescription
5. Accessing the Optional Elements of a Channel
5.1. hasElement
5.2. getLanguage
5.3. getCopyright
5.4. getManagingEditor
5.5. getWebMaster
5.6. getPubDate
5.7. getLastBuildDate
5.8. getGenerator
5.9. getDocs
5.10. getCloud
5.10.1. getDomain
5.10.2. getPort
5.10.3. getPath
5.10.4. getRegisterProcedure
5.10.5. getProtocol
5.11. getTTL
5.12. getImage
5.12.1. getTitle
5.12.2. getLink
5.12.3. getUrl
5.12.4. getWidth
5.12.5. getHeight
5.12.6. getDescription
5.13. getRating
5.14. getTextInput
5.14.1. getTitle
5.14.2. getDescription
5.14.3. getName
5.14.4. getLink
5.15. getSkipDays
5.15.1. getSkipDayCount
5.15.2. getSkipDay
5.16. getSkipHours
5.16.1. getSkipHourCount
5.16.2. getSkipHour
5.17. getCategoryCount and getCategory
5.17.1. getCategory (method of Category Class)
5.17.2. getDomain
6. Accessing the Items of a Channel
6.1. getItemCount
6.2. getItem
7. Accessing the Required Elements of an Item
8. Accessing Optional Elements of an Item
8.1. getAuthor
8.2. getComments
8.3. getEnclosure
8.3.1. getUrl
8.3.2. getLength
8.3.3. getType
8.4. getGUID
8.4.1. getGUID
8.4.2. isPermaLink
8.5. getPubDate
8.6. getSource
8.6.1. getSource
8.6.2. getUrl
8.7. getCategoryCount and getCategory
5. Other Methods for Accessing RSS Data
1. getElementList
2. DOMIT! RSS and RSS Type
3. Determining RSS Type
3.1. isSimpleRSSElement and getElementText
3.2. isCustomRSSElement and getElement
3.3. isCollection
3.3.1. getElement
3.3.2. getElementCount and getElementAt
4. Navigating an RSS Document by RSS Type
4.1. Step 1: Get an Element List
4.2. Step 2: Construct a Loop Over the Element List
4.3. Step 3: Test for RSS Type
4.4. Step 4: Query Elements with Methods Specific to Their Type
5. isRSSDefined
6. Using DOM Methods with DOMIT! RSS
7. DOMIT! RSS Roadmap
8. Contributing to DOMIT! RSS

Chapter 1. Overview of RSS

1. Intro to RSS

RSS (variously known as Really Simple Syndication, RDF Site Summary, or Rich Site Summary) is an XML-based web syndication format originally developed by Netscape. It allows you to:

  • create lists of online content

  • describe the content

  • link to the content

RSS files, also known as feeds, are published at a static URL, to which users can subscribe using an application called an RSS reader or RSS aggregator. These applications periodically query the URL for updated content and present it to the user in a readable format.

RSS is widely used by news organizations, which use it to publish daily lists of articles. Blog posts and web site updates are also commonly summarized in RSS format.

There have been a number of versions of RSS over its lifetime. All versions, however, share a common set of core features.

2. RSS Structure

The following is a sample feed from the BBC news site, posted at the URL http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/technology/rss091.xml

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0">
    <channel>
        <title>BBC News | Technology | UK Edition</title>
        <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
        <description>Updated every minute of every day</description>
        <language>en-gb</language>
        <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
        <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
        <docs>http://www.bbc.co.uk/syndication/</docs>
        <ttl>15</ttl>
        <image>
            <title>BBC News</title>
            <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
            <link>http://news.bbc.co.uk</link>
        </image>
        <item>
            <title>GTA sex scandal hits Australia</title>
            <description>Grand Theft Auto: San Andreas has effectively been banned in Australia because of secret sex scenes.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4728261.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4728261.stm</guid>
            <pubDate>Fri, 29 Jul 05 13:20:35 GMT</pubDate>
        </item>
        <item>
            <title>FBI holds eight on piracy charge</title>
            <description>The US authorities have charged eight people with the illegal trading of copyrighted material over the net.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4727919.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4727919.stm</guid>
            <pubDate>Fri, 29 Jul 05 12:14:24 GMT</pubDate>
        </item>
        <item>
            <title>Spacewalk to test shuttle repair</title>
            <description>Astronauts on space shuttle Discovery are getting ready to carry out the mission's first spacewalk.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/sci/tech/4730129.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/sci/tech/4730129.stm</guid>
            <pubDate>Sat, 30 Jul 05 03:26:17 GMT</pubDate>
        </item>
        <item>
            <title>Cisco curbs security researcher</title>
            <description>A security researcher has agreed never to talk about flaws in Cisco software that controls internet routers.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4727021.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4727021.stm</guid>
            <pubDate>Fri, 29 Jul 05 09:14:00 GMT</pubDate>
        </item>
        <item>
            <title>Net addresses come to Earth</title>
            <description>Net addresses are starting to reveal how they are linked to the real world.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4665351.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4665351.stm</guid>
            <pubDate>Fri, 29 Jul 05 08:07:17 GMT</pubDate>
        </item>
        <item>
            <title>Tiny customers 'won't get money'</title>
            <description>Customers who paid for undelivered orders of Tiny and Time PCs are "unlikely" to get their money back.</description>            
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4727143.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4727143.stm</guid>
            <pubDate>Fri, 29 Jul 05 12:04:34 GMT</pubDate>
        </item>
        <item>
            <title>Teens spurn e-mail for messaging</title>
            <description>Instant messaging, rather than e-mail, is the preferred way for US teenagers to stay in touch, research shows.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4719083.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4719083.stm</guid>
            <pubDate>Thu, 28 Jul 05 10:29:13 GMT</pubDate>
        </item>
        <item>
            <title>Fake Tube safety e-mail spreads</title>
            <description>Mobile users are warned about an e-mail which claims to have safety information about calling from the Tube.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4724101.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4724101.stm</guid>
            <pubDate>Thu, 28 Jul 05 11:26:48 GMT</pubDate>
        </item>
        <item>
            <title>Digital rights group gets going</title>
            <description>Net veterans plan to create a UK group that campaigns to protect digital rights and freedoms.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4724089.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4724089.stm</guid>
            <pubDate>Thu, 28 Jul 05 12:05:20 GMT</pubDate>
        </item>
        <item>
            <title>Hollywood hails digital film deal</title>
            <description>Movie studios reach a "milestone" deal to allow digital projectors to replace reels of film.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/film/4724335.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/entertainment/film/4724335.stm</guid>
            <pubDate>Thu, 28 Jul 05 12:40:29 GMT</pubDate>
        </item>
        <item>
            <title>Price falls push Sony into loss</title>
            <description>Sony falls into the red for the three months to June, hit by fall in prices for televisions and DVD recorders.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4723567.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4723567.stm</guid>
            <pubDate>Thu, 28 Jul 05 12:52:36 GMT</pubDate>
        </item>
        <item>
            <title>Profits tumble at gamer Nintendo</title>
            <description>Nintendo's first quarter profits drop as its new DS console fails to plug the gap left by waning GameCube sales. </description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4724083.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4724083.stm</guid>
            <pubDate>Thu, 28 Jul 05 11:09:54 GMT</pubDate>
        </item>
        <item>
            <title>Awards to applaud women in tech</title>
            <description>Top women in technology are to be recognised in the first Blackberry Women and Technology awards.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4718703.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4718703.stm</guid>
            <pubDate>Wed, 27 Jul 05 08:11:31 GMT</pubDate>
        </item>
        <item>
            <title>HP decides to stop selling iPods</title>
            <description>Hewlett-Packard announces that it is to stop selling HP-branded iPods in line with a change in strategy.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4729907.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4729907.stm</guid>
            <pubDate>Fri, 29 Jul 05 21:07:03 GMT</pubDate>
        </item>
        <item>
            <title>New services boost profits at BT</title>
            <description>Telecoms giant BT Group dials  up a 21% rise in quarterly pre-tax profits thanks to a "new wave" of revenues.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/4723343.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/business/4723343.stm</guid>
            <pubDate>Thu, 28 Jul 05 08:13:05 GMT</pubDate>
        </item>
        <item>
            <title>Napster launches radio song sales</title>
            <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/entertainment/music/4724287.stm</guid>
            <pubDate>Thu, 28 Jul 05 11:23:13 GMT</pubDate>
        </item>
        <item>
            <title>Downloading 'myths' challenged</title>
            <description>People who illegally download music spend much more on legal downloads than average fans, a study shows.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4718249.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4718249.stm</guid>
            <pubDate>Wed, 27 Jul 05 08:10:56 GMT</pubDate>
        </item>
        <item>
            <title>Game over for Tapwave's Zodiac</title>
            <description>Catch up with the latest news from the world of video gaming.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/2207229.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/2207229.stm</guid>
            <pubDate>Fri, 29 Jul 05 16:50:25 GMT</pubDate>
        </item>
        <item>
            <title>Animated capers aim to please</title>
            <description>Reviews of two of the latest games aimed at children.</description>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4696167.stm</link>
            <guid isPermaLink="false">http://news.bbc.co.uk/1/hi/technology/4696167.stm</guid>
            <pubDate>Tue, 19 Jul 05 11:06:54 GMT</pubDate>
        </item>
    </channel>
</rss>

In the next sections we will discuss some of the main elements of an RSS feed.
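
As a rough illustration of how this structure maps onto code, the following Python sketch reads a trimmed copy of the sample feed with the standard library's xml.etree.ElementTree module. It is not DOMIT! RSS code, and the names FEED, channel_title, and item_titles are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample feed above (two items kept for brevity).
FEED = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0">
    <channel>
        <title>BBC News | Technology | UK Edition</title>
        <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
        <description>Updated every minute of every day</description>
        <item>
            <title>FBI holds eight on piracy charge</title>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4727919.stm</link>
        </item>
        <item>
            <title>Cisco curbs security researcher</title>
            <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/4727021.stm</link>
        </item>
    </channel>
</rss>"""

root = ET.fromstring(FEED)        # the <rss> root element
channel = root.find("channel")    # the <channel> container
channel_title = channel.findtext("title")
item_titles = [item.findtext("title") for item in channel.findall("item")]

print(channel_title)
for title in item_titles:
    print(" -", title)
```

The channel-level title and the item-level titles share an element name, so the sketch always looks them up relative to their parent element.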

2.1. XML Declaration

Since an RSS document is also an XML document, each RSS document should begin with an XML declaration:

<?xml version="1.0" encoding="ISO-8859-1" ?>

In common practice, this statement is often omitted.
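
The declaration matters mostly when a feed contains characters outside ASCII. The sketch below, which assumes Python's standard xml.etree.ElementTree parser rather than DOMIT! RSS, parses one feed without a declaration and one that relies on it to name its encoding. The helper first_title and the sample byte strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def first_title(data):
    """Return the channel title, or None if the bytes are not well-formed XML."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    return root.findtext("channel/title")

DECL = b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
BODY_ASCII = b"<rss version='2.0'><channel><title>Tech News</title></channel></rss>"
# \xe9 is the ISO-8859-1 byte for an accented e; on its own it is not valid UTF-8.
BODY_LATIN1 = b"<rss version='2.0'><channel><title>Caf\xe9 News</title></channel></rss>"

print(first_title(BODY_ASCII))          # declaration omitted, ASCII content only
print(first_title(DECL + BODY_LATIN1))  # declaration names the encoding
print(first_title(BODY_LATIN1))         # no declaration, so UTF-8 is assumed
```

With this parser, the last call fails to parse because, in the absence of a declaration, the bytes are treated as UTF-8.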

2.2. rss Element

The root element of an RSS document is named rss. The rss element has a single mandatory attribute named version, which specifies the version of RSS that the document conforms to:

<rss version='0.94'>
...rss content continues here
</rss>
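
A minimal Python sketch of reading the version attribute, assuming the standard xml.etree.ElementTree module; the function name rss_version is invented for this example and is not part of DOMIT! RSS.

```python
import xml.etree.ElementTree as ET

def rss_version(xml_text):
    """Return the value of the version attribute on the rss root element."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        raise ValueError("root element is <%s>, not <rss>" % root.tag)
    version = root.get("version")
    if version is None:
        raise ValueError("the mandatory version attribute is missing")
    return version

print(rss_version("<rss version='0.94'><channel /></rss>"))
```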

2.3. channel Element

An RSS document is required to contain a single channel element, which is a container for the publication data:

<rss version='2.0'>
  <channel>
    ...rss content continues here
  </channel>
</rss>

Note: Occasionally, you may see (non-standard) use of multiple channels.
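
Code that consumes feeds may want to tolerate the non-standard case. The following sketch, again assuming Python's xml.etree.ElementTree and a hypothetical helper named get_channels, collects every channel element under the root instead of assuming there is exactly one.

```python
import xml.etree.ElementTree as ET

def get_channels(xml_text):
    """Return a list of all channel elements directly under the rss root."""
    root = ET.fromstring(xml_text)
    return root.findall("channel")

standard = "<rss version='2.0'><channel><title>A</title></channel></rss>"
non_standard = (
    "<rss version='2.0'>"
    "<channel><title>A</title></channel>"
    "<channel><title>B</title></channel>"
    "</rss>"
)

for feed in (standard, non_standard):
    channels = get_channels(feed)
    print(len(channels), [c.findtext("title") for c in channels])
```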

2.4. Required Channel Elements

Each channel is required to include three elements: title, link, and description. They may appear in any order.

2.4.1. title Element

The title element contains a short title for the channel.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    ...rss content continues here
  </channel>
</rss>

2.4.2. link Element

The link element contains the URL of the website hosting the feed:

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    ...rss content continues here
  </channel>
</rss>

2.4.3. description Element

The description element contains a description of the channel.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    ...rss content continues here
  </channel>
</rss>
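
The three required elements can be checked mechanically. This is a small sketch using Python's xml.etree.ElementTree; the constant REQUIRED and the function missing_required are hypothetical names, not DOMIT! RSS methods.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("title", "link", "description")

def missing_required(channel):
    """Return the names of required child elements the channel lacks."""
    return [name for name in REQUIRED if channel.find(name) is None]

complete = ET.fromstring(
    "<channel>"
    "<description>Updated every minute of every day</description>"
    "<title>BBC News | Technology | UK Edition</title>"
    "<link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>"
    "</channel>"
)
incomplete = ET.fromstring("<channel><title>Untitled</title></channel>")

print(missing_required(complete))
print(missing_required(incomplete))
```

Because the lookup is by name, the check is unaffected by the order in which the elements appear.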

2.5. Optional Channel Elements

There are also a number of optional elements available for a channel. They may appear in any order.
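
Because any of these elements may be absent, reading code needs a fallback. One possible approach, sketched here with Python's xml.etree.ElementTree (the helper optional_text is a made-up name), is to supply a default whenever the element is missing:

```python
import xml.etree.ElementTree as ET

def optional_text(channel, name, default=None):
    """Return the text of an optional child element, or the default if absent."""
    return channel.findtext(name, default=default)

channel = ET.fromstring(
    "<channel>"
    "<title>BBC News | Technology | UK Edition</title>"
    "<language>en-gb</language>"
    "</channel>"
)

print(optional_text(channel, "language"))
print(optional_text(channel, "generator", default="(not specified)"))
```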

2.5.1. language Element

The language element describes the language of the feed.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    ...rss content continues here
  </channel>
</rss>

Permissible values for the language element are those defined by the W3C, or those in the following list:

Afrikaans: af
Albanian: sq
Basque: eu
Belarusian: be
Bulgarian: bg
Catalan: ca
Chinese (Simplified): zh-cn
Chinese (Traditional): zh-tw
Croatian: hr
Czech: cs
Danish: da
Dutch: nl
Dutch (Belgium): nl-be
Dutch (Netherlands): nl-nl
English: en
English (Australia): en-au
English (Belize): en-bz
English (Canada): en-ca
English (Ireland): en-ie
English (Jamaica): en-jm
English (New Zealand): en-nz
English (Philippines): en-ph
English (South Africa): en-za
English (Trinidad): en-tt
English (United Kingdom): en-gb
English (United States): en-us
English (Zimbabwe): en-zw
Estonian: et
Faeroese: fo
Finnish: fi
French: fr
French (Belgium): fr-be
French (Canada): fr-ca
French (France): fr-fr
French (Luxembourg): fr-lu
French (Monaco): fr-mc
French (Switzerland): fr-ch
Galician: gl 
Gaelic: gd 
German: de 
German (Austria): de-at 
German (Germany): de-de 
German (Liechtenstein): de-li 
German (Luxembourg): de-lu 
German (Switzerland): de-ch 
Greek: el 
Hawaiian: haw 
Hungarian: hu 
Icelandic: is 
Indonesian: in 
Irish: ga 
Italian: it 
Italian (Italy): it-it 
Italian (Switzerland): it-ch 
Japanese: ja 
Korean: ko 
Macedonian: mk 
Norwegian: no 
Polish: pl 
Portuguese: pt 
Portuguese (Brazil): pt-br 
Portuguese (Portugal): pt-pt 
Romanian: ro 
Romanian (Moldova): ro-mo 
Romanian (Romania): ro-ro 
Russian: ru 
Russian (Moldova): ru-mo 
Russian (Russia): ru-ru 
Serbian: sr 
Slovak: sk 
Slovenian: sl 
Spanish: es 
Spanish (Argentina): es-ar 
Spanish (Bolivia): es-bo 
Spanish (Chile): es-cl 
Spanish (Colombia): es-co 
Spanish (Costa Rica): es-cr 
Spanish (Dominican Republic): es-do 
Spanish (Ecuador): es-ec 
Spanish (El Salvador): es-sv 
Spanish (Guatemala): es-gt 
Spanish (Honduras): es-hn 
Spanish (Mexico): es-mx 
Spanish (Nicaragua): es-ni 
Spanish (Panama): es-pa 
Spanish (Paraguay): es-py 
Spanish (Peru): es-pe 
Spanish (Puerto Rico): es-pr 
Spanish (Spain): es-es 
Spanish (Uruguay): es-uy 
Spanish (Venezuela): es-ve
Swedish: sv 
Swedish (Finland): sv-fi 
Swedish (Sweden): sv-se 
Turkish: tr 
Ukrainian: uk
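
The codes in this list consist of a base language, optionally followed by a hyphen and a region. The following Python sketch shows one way a reader might split a code and fall back to the base language when the exact code is unknown. The dictionary holds only a handful of entries copied from the list, and the names split_language, LANGUAGE_NAMES, and describe are hypothetical.

```python
def split_language(code):
    """Split a code such as 'en-gb' into (language, region); region may be None."""
    language, _, region = code.strip().lower().partition("-")
    return language, (region or None)

# A few entries copied from the list above, for illustration only.
LANGUAGE_NAMES = {
    "en": "English",
    "en-gb": "English (United Kingdom)",
    "fr-ca": "French (Canada)",
    "pt-br": "Portuguese (Brazil)",
}

def describe(code):
    """Return a display name for a code, falling back to the base language."""
    normalized = code.strip().lower()
    base, _region = split_language(normalized)
    return LANGUAGE_NAMES.get(normalized) or LANGUAGE_NAMES.get(base) or "unknown"

for code in ("en-gb", "en-zw", "xx"):
    print(code, split_language(code), describe(code))
```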

2.5.2. copyright Element

The copyright element contains a copyright statement for the RSS content.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    ...rss content continues here
  </channel>
</rss>

2.5.3. managingEditor Element

The managingEditor element contains the email address of the person responsible for editorial content.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    ...rss content continues here
  </channel>
</rss>

2.5.4. webMaster Element

The webMaster element contains the email address of the person responsible for the technical maintenance of the channel.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    ...rss content continues here
  </channel>
</rss>

2.5.5. pubDate Element

The pubDate element contains the publication date/time for the channel. This must conform to the RFC 822 Date and Time Specification (although the year may be expressed in either two or four characters).

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    ...rss content continues here
  </channel>
</rss>

2.5.6. lastBuildDate Element

The lastBuildDate element contains the last time that the content in the channel changed. This must also conform to the RFC 822 Date and Time Specification.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    ...rss content continues here
  </channel>
</rss>
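
As an illustration of handling these date strings, the sketch below parses the pubDate and lastBuildDate values from the example with Python's standard email.utils.parsedate_to_datetime function, which accepts both two-digit and four-digit years. This is only one possible approach and is unrelated to how DOMIT! RSS handles dates; the variable names are invented for the example.

```python
from email.utils import parsedate_to_datetime

# Values taken from the example above (two-digit years).
pub_date = parsedate_to_datetime("Sat, 30 Jul 05 09:00:00 GMT")
last_build = parsedate_to_datetime("Sat, 30 Jul 05 09:28:38 GMT")

# The same publication date written with a four-digit year.
pub_date_long = parsedate_to_datetime("Sat, 30 Jul 2005 09:00:00 GMT")

print(pub_date.isoformat())
print(pub_date == pub_date_long)   # both forms describe the same instant
print(last_build - pub_date)       # time between publication and last build
```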

2.5.7. category Element

The category element contains a forward-slash separated string describing:

  • a category that the feed belongs to

  • the position of that category within a taxonomy

An optional attribute, domain, contains a URL pointing to a description of the taxonomy.

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    ...rss content continues here
  </channel>
</rss>

In the above example, the feed belongs to a category named technology, which is a subcategory of science, which is in turn a subcategory of news. A description of the taxonomy can be found at the URL http://www.superopendirectory.com/.

You can include as many category elements as you like.
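
The forward-slash convention makes it straightforward to recover the position of a category in its taxonomy. The following is a minimal sketch in Python using xml.etree.ElementTree; the function parse_category and the dictionary keys it returns are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def parse_category(category):
    """Return the taxonomy path, leaf name, and optional domain of a category."""
    path = [part for part in (category.text or "").split("/") if part]
    return {
        "path": path,                         # from the broadest to the narrowest level
        "name": path[-1] if path else None,   # the category the feed belongs to
        "domain": category.get("domain"),     # None when the attribute is absent
    }

element = ET.fromstring(
    '<category domain="http://www.superopendirectory.com/">'
    "news/science/technology</category>"
)
info = parse_category(element)
print(info["name"], info["path"], info["domain"])
```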

2.5.8. generator Element

The generator element contains the name of the program used to generate the RSS feed:

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    ...rss content continues here
  </channel>
</rss>

2.5.9. docs Element

The docs element contains the URL for the documentation of the RSS format of the feed:

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    ...rss content continues here
  </channel>
</rss>

2.5.10. cloud Element

The cloud element allows users to register with a web service that sends a notification when a feed has been updated.

The web service can be implemented in HTTP-POST, XML-RPC, or SOAP 1.1.

The cloud element has five attributes, which hold the parameters required for querying the web service:

  • domain - the domain where the web service resides

  • port - the TCP port the web service is listening on

  • path - the path, relative to the domain, where the web service resides

  • registerProcedure - the name of the web service method to be called

  • protocol - the protocol of the web service: xml-rpc, soap, or http-post

The web service returns true if the subscription is successful. By convention, registrations expire after 25 hours. Users should reregister every 24 hours for each subscription.

The following is an example of the cloud syntax:

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    ...rss content continues here
  </channel>
</rss>

Above, an xml-rpc query, calling the method xmlStorageSystem.rssPleaseNotify, would be sent to xml-rpc.bbc.co.uk/RPC2 on port 80. The user's application would be responsible for understanding the specific details of how the web service method is to be called.

A good description of how the cloud interface is implemented can be found at Dave Winer's Radio Userland site at: http://backend.userland.com/publishSubscribeWalkthrough
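
To make the role of the five attributes concrete, the sketch below collects them from the example cloud element and assembles the address of the web service. It uses Python's xml.etree.ElementTree, makes no network request, and assumes a plain http:// scheme, which the cloud element itself does not state. The names cloud_settings and endpoint are hypothetical.

```python
import xml.etree.ElementTree as ET

CLOUD_XML = (
    '<cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" '
    'registerProcedure="xmlStorageSystem.rssPleaseNotify" '
    'protocol="xml-rpc" />'
)

ATTRIBUTES = ("domain", "port", "path", "registerProcedure", "protocol")

def cloud_settings(cloud):
    """Collect the five cloud attributes and derive the service address."""
    settings = {name: cloud.get(name) for name in ATTRIBUTES}
    missing = [name for name, value in settings.items() if value is None]
    if missing:
        raise ValueError("cloud element lacks: " + ", ".join(missing))
    settings["port"] = int(settings["port"])
    # Assumption: the service is reached over plain HTTP.
    settings["endpoint"] = "http://%s:%d%s" % (
        settings["domain"], settings["port"], settings["path"])
    return settings

settings = cloud_settings(ET.fromstring(CLOUD_XML))
print(settings["protocol"], settings["registerProcedure"], settings["endpoint"])
```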

2.5.11. ttl Element

The ttl (time to live) element describes the maximum number of minutes that a feed reader should cache the feed contents before refreshing:

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>15</ttl>
    ...rss content continues here
  </channel>
</rss>
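
A feed reader might apply the ttl value roughly as follows. This Python sketch treats a cached copy as stale once the stated number of minutes has elapsed since it was fetched. The function is_stale and its parameters are illustrative assumptions, not part of any RSS library.

```python
from datetime import datetime, timedelta, timezone

def is_stale(fetched_at, ttl_minutes, now):
    """Return True when the cached feed is at least ttl_minutes old."""
    return now - fetched_at >= timedelta(minutes=ttl_minutes)

fetched_at = datetime(2005, 7, 30, 9, 0, tzinfo=timezone.utc)
ttl_minutes = 15  # the value of the ttl element in the example above

for minutes_later in (10, 15, 20):
    now = fetched_at + timedelta(minutes=minutes_later)
    print(minutes_later, is_stale(fetched_at, ttl_minutes, now))
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.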

2.5.12. image Element

The image element specifies the URL of a GIF, JPEG or PNG image that can be displayed with the channel.

It contains three required elements:

  • <url> - the URL of the image

  • <title> - the title of the image (used as the value of the alt attribute when the image is displayed in HTML format)

  • <link> - the URL of the channel (when displayed in HTML format, clicking on the image will navigate to this location)

The image element also contains three optional elements:

  • <width> - the width of the image

  • <height> - the height of the image

  • <description> - a description of the image or channel (when displayed in HTML format, used as the title attribute of the image link)

For example:

<rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>15</ttl>
    <image>
      <url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
      <title>BBC Technology News</title>
      <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
      <width>200</width>
      <height>200</height>
      <description>Today's BBC technology news</description>
    </image>    
    ...rss content continues here
  </channel>
</rss>
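
The mapping from the image sub-elements to HTML described above could be sketched as follows, using Python's xml.etree.ElementTree and html.escape. The function image_to_html is a hypothetical helper, and the exact markup it emits is only one reasonable rendering.

```python
import xml.etree.ElementTree as ET
from html import escape

IMAGE_XML = """<image>
  <url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
  <title>BBC Technology News</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
  <width>200</width>
  <height>200</height>
  <description>Today's BBC technology news</description>
</image>"""

def image_to_html(image):
    """Render an image element as a linked HTML img tag."""
    # Required elements: url becomes src, title becomes alt, link becomes href.
    img = ['src="%s"' % escape(image.findtext("url")),
           'alt="%s"' % escape(image.findtext("title"))]
    anchor = ['href="%s"' % escape(image.findtext("link"))]
    # Optional elements are emitted only when present.
    for name in ("width", "height"):
        value = image.findtext(name)
        if value is not None:
            img.append('%s="%s"' % (name, escape(value)))
    description = image.findtext("description")
    if description is not None:
        anchor.append('title="%s"' % escape(description))
    return "<a %s><img %s></a>" % (" ".join(anchor), " ".join(img))

markup = image_to_html(ET.fromstring(IMAGE_XML))
print(markup)
```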

2.5.13. rating Element

The rating element describes the PICS rating for the channel. PICS is a W3C specification for associating metadata with internet content. It is rarely used in RSS feeds, but here is an example:

 <rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>15</ttl>
    <image>
      <url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
      <title>BBC Technology News</title>
      <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
      <width>200</width>
      <height>200</height>
      <description>Today's BBC technology news</description>
    </image>
    <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
    ...rss content continues here
  </channel>
</rss>

2.5.14. textInput Element

The textInput element describes a text input box that can be associated with the channel. Its purpose is somewhat obscure, but you can use it to define items like a search or user feedback field.

The textInput element contains four required elements:

  • <title> - The label of the submit button for the text input field

  • <description> - A description of the text input field

  • <name> - The name of the text object in the text input area

  • <link> - The URL of the script that is called by clicking on the submit button

Below is an example of defining a search field for the RSS channel:

 <rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>15</ttl>
    <image>
      <url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
      <title>BBC Technology News</title>
      <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
      <width>200</width>
      <height>200</height>
      <description>Today's BBC technology news</description>
    </image>
    <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
    <textInput>
      <title>Search</title>
      <description>Search the BBC technology site</description>
      <name>searchform</name>
      <link>http://www.google.com/search</link>
    </textInput>
    ...rss content continues here
  </channel>
</rss>

Note: Most feed readers ignore the textInput data.
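
For a reader that does choose to honor textInput, the four subelements map naturally onto an HTML form. The sketch below is one possible rendering, not something DOMIT! RSS does for you: renderTextInput is a hypothetical helper, and the use of the GET method is an assumption, since the textInput element itself does not say how the data should be submitted.

```php
<?php
//hypothetical helper: builds an HTML form from textInput data
//$textInput is an associative array keyed by the textInput subelement names
function renderTextInput($textInput) {
  return '<form action="' . htmlspecialchars($textInput['link'], ENT_QUOTES) . '" method="get">'
    //description explains the purpose of the field
    . htmlspecialchars($textInput['description'], ENT_QUOTES) . ' '
    //name is the name of the text field
    . '<input type="text" name="' . htmlspecialchars($textInput['name'], ENT_QUOTES) . '" />'
    //title is the label of the submit button
    . '<input type="submit" value="' . htmlspecialchars($textInput['title'], ENT_QUOTES) . '" />'
    . '</form>';
}

echo renderTextInput(array(
  'title' => 'Search',
  'description' => 'Search the BBC technology site',
  'name' => 'searchform',
  'link' => 'http://www.google.com/search'
)) . "\n";
```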

2.5.15. skipHours Element

The skipHours element contains up to 24 <hour> subelements, each marking an hour during which queries of the feed can be skipped. Each value must be a number between 0 and 23.

The following example instructs readers not to query the feed between 1:00 and 3:00 AM GMT.

 <rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>15</ttl>
    <image>
      <url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
      <title>BBC Technology News</title>
      <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
      <width>200</width>
      <height>200</height>
      <description>Today's BBC technology news</description>
    </image>
    <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
    <textInput>
      <title>Search</title>
      <description>Search the BBC technology site</description>
      <name>searchform</name>
      <link>http://www.google.com/search</link>
    </textInput>
    <skipHours>
      <hour>1</hour>
      <hour>2</hour>
    </skipHours>
    ...rss content continues here
  </channel>
</rss>
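
A feed reader could honor skipHours with a check along these lines. This is a minimal sketch: isSkippedHour is a hypothetical helper (not a DOMIT! RSS method), and it assumes the <hour> values have already been collected into an array of strings.

```php
<?php
//hypothetical helper: returns true if the feed should not be queried
//during the given hour (0 to 23, GMT)
function isSkippedHour($skipHours, $hour) {
  //normalize the <hour> text values to integers before comparing
  return in_array((int) $hour, array_map('intval', $skipHours), true);
}

//the <hour> values from the example above
$skipHours = array('1', '2');

//current hour in GMT, without a leading zero
$currentHour = (int) gmdate('G');

if (isSkippedHour($skipHours, $currentHour)) {
  echo "Skipping feed query for this hour\n";
}
else {
  echo "Feed may be queried\n";
}
```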

2.5.16. skipDays Element

The skipDays element contains up to 7 <day> subelements, each marking a day of the week on which queries of the feed can be skipped. Allowable values are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday or Sunday.

The following example instructs readers not to query the feed on Sunday.

 <rss version='2.0'>
  <channel>
    <title>BBC News | Technology | UK Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
    <description>Updated every minute of every day</description>
    <language>en-gb</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <managingEditor>john.doe@bbc.co.uk</managingEditor>
    <webMaster>jane.doe@bbc.co.uk</webMaster>
    <pubDate>Sat, 30 Jul 05 09:00:00 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
    <category domain="http://www.superopendirectory.com/">news/science/technology</category>
    <generator>RSSKY Feed Generator</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <cloud domain="xml-rpc.bbc.co.uk" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>15</ttl>
    <image>
      <url>http://news.bbc.co.uk/go/rss/-/1/hi/technology/images/feedimage.jpg</url>
      <title>BBC Technology News</title>
      <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
      <width>200</width>
      <height>200</height>
      <description>Today's BBC technology news</description>
    </image>
    <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
    <textInput>
      <title>Search</title>
      <description>Search the BBC technology site</description>
      <name>searchform</name>
      <link>http://www.google.com/search</link>
    </textInput>
    <skipHours>
      <hour>1</hour>
      <hour>2</hour>
    </skipHours>
    <skipDays>
      <day>Sunday</day>
    </skipDays>
    ...rss content continues here
  </channel>
</rss>
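
The corresponding check for skipDays can be sketched as follows. As before, isSkippedDay is a hypothetical helper rather than part of DOMIT! RSS, and it assumes the <day> values are available as an array of strings and that days are evaluated in GMT.

```php
<?php
//hypothetical helper: returns true if the feed should not be queried
//on the day that the given Unix timestamp falls on (evaluated in GMT)
function isSkippedDay($skipDays, $timestamp) {
  //gmdate('l') returns the full English day name, e.g. "Sunday"
  return in_array(gmdate('l', $timestamp), $skipDays, true);
}

//the <day> values from the example above
$skipDays = array('Sunday');

if (isSkippedDay($skipDays, time())) {
  echo "Skipping feed query today\n";
}
else {
  echo "Feed may be queried\n";
}
```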

2.6. Item Element

A channel can contain any number of item elements. An item describes a single item of syndicated content, such as a news article or blog entry.

<rss version ='2.0'>
  <channel>
        <title>BBC News | Technology | UK Edition</title>
        <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
        <description>Updated every minute of every day</description>
        <language>en-gb</language>
        <lastBuildDate>Sat, 30 Jul 05 09:28:38 GMT</lastBuildDate>
        <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
        <docs>http://www.bbc.co.uk/syndication/</docs>
        <ttl>15</ttl>
        <image>
            <title>BBC News</title>
            <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
            <link>http://news.bbc.co.uk</link>
        </image>
        <item>
          ...item content here
        </item>
        <item>
          ...item content here
        </item>
        <item>
          ...item content here
        </item>
  </channel>
</rss>

All subelements of item are optional; however, at least one title or description element must be present. An item can also be self-contained -- that is, the entire article content is included in the description element (in entity-encoded HTML, if necessary).

The following sections describe the various elements permitted in an item element.

2.6.1. title Element

The title element contains a short title for the item:

<item>
  <title>Napster launches radio song sales</title>
</item>

Note: since RSS is an XML-based format, you must ensure that characters with special meaning in XML, such as ampersands (&), are either properly escaped or enclosed in a CDATA section.
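
Both approaches can be sketched in PHP using only built-in functions. The title text below is made up for illustration.

```php
<?php
//sample title containing an ampersand (illustrative value)
$rawTitle = 'Napster & XM launch radio song sales';

//option 1: escape the special characters
$escaped = '<title>' . htmlspecialchars($rawTitle, ENT_QUOTES) . '</title>';

//option 2: wrap the text in a CDATA section
//a literal "]]>" in the text would end the section early, so it is split in two
$cdata = '<title><![CDATA['
  . str_replace(']]>', ']]]]><![CDATA[>', $rawTitle)
  . ']]></title>';

echo $escaped . "\n";
echo $cdata . "\n";
```

The first line printed contains &amp; in place of the ampersand; the second leaves the text untouched inside the CDATA section.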

2.6.2. link Element

The link element contains the URL where the article can be found:

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
</item>

2.6.3. description Element

The description element contains a description of the contents of the article:

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
</item>

2.6.4. author Element

The author element contains the email address of the author of the article:

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
  <author>shaunfanning@napster.com</author>
</item>

2.6.5. category Element

The category element is identical in format to the category element for a channel. Please refer to section 2.5.7.

2.6.6. comments element

The comments element contains a URL for comments pertaining to the article:

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
  <author>shaunfanning@napster.com</author>
  <comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
</item>

2.6.7. enclosure Element

The enclosure element describes a media object -- such as an audio or video file -- associated with the article.

It has three required attributes:

  • url - the URL of the media object

  • length - the length in bytes of the media object

  • type - the MIME type of the media object

To describe an mp3 file associated with the article, for instance:

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
  <author>shaunfanning@napster.com</author>
  <comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
  <enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
</item>

2.6.8. guid Element

The guid element assigns a globally unique identifier to the article -- an ID that differentiates it from all other articles. It generally comes in the form of an HTTP URL.

It has one optional attribute, isPermaLink. If isPermaLink is set to true, it means that the guid is an actual HTTP URL that the user can visit to view the article. For instance:

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
  <author>shaunfanning@napster.com</author>
  <comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
  <enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
  <guid isPermaLink="true">http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</guid>
</item>
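
When reading a guid, note that the RSS 2.0 specification treats isPermaLink as true when the attribute is omitted. A reader might interpret the attribute along these lines; guidIsPermaLink is a hypothetical helper, and it assumes the attribute value is passed as a string, or null when the attribute is absent.

```php
<?php
//hypothetical helper: decides whether a guid should be treated as a permalink
//$attributeValue is the text of the isPermaLink attribute, or null if absent
function guidIsPermaLink($attributeValue) {
  //attribute omitted: RSS 2.0 defaults to true
  if ($attributeValue === null) {
    return true;
  }

  return strtolower(trim($attributeValue)) === 'true';
}

var_dump(guidIsPermaLink('true'));
var_dump(guidIsPermaLink('false'));
var_dump(guidIsPermaLink(null));
```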

2.6.9. pubDate Element

The pubDate element for an item is identical in format to the pubDate element for a channel. Please see section 2.5.5 for more information.

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
  <author>shaunfanning@napster.com</author>
  <comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
  <enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
  <guid isPermaLink="true">http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</guid>
  <pubDate>Wed, 27 Jul 05 08:10:56 GMT</pubDate>
</item>
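
If you need to work with a pubDate value as a date rather than as text, PHP's built-in DateTime class can parse it. The sketch below assumes PHP 5.3 or later and a date written exactly in the two-digit-year style used in these examples.

```php
<?php
//the pubDate value from the example above
$pubDate = 'Wed, 27 Jul 05 08:10:56 GMT';

//D = day name, d = day of month, M = month name,
//y = two digit year, T = timezone abbreviation
$date = DateTime::createFromFormat('D, d M y H:i:s T', $pubDate);

if ($date === false) {
  echo "Unable to parse pubDate\n";
}
else {
  //prints the date in year-month-day form
  echo $date->format('Y-m-d H:i:s') . "\n";
}
```

Feeds that write the year with four digits would need Y in place of y in the format string, so a robust reader should be prepared to try more than one format.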

2.6.10. source Element

The source element for an item indicates that the article was derived from another news feed, and identifies that feed.

Its value is the title of the source feed. It also has a single required attribute, url, which specifies the URL of the source feed.

<item>
  <title>Napster launches radio song sales</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</link>
  <description>Online music service Napster teams with satellite station XM to enable listeners to buy the music they hear.</description>
  <author>shaunfanning@napster.com</author>
  <comments>http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287_comments.stm</comments>
  <enclosure url="http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/fanningspeaks.mp3" length="12216320" type="audio/mpeg" />
  <guid isPermaLink="true">http://news.bbc.co.uk/go/rss/-/1/hi/entertainment/music/4724287.stm</guid>
  <pubDate>Wed, 27 Jul 05 08:10:56 GMT</pubDate>
  <source url='http://www.wired.com/rss.xml'>Wired Online</source>
</item>

2.7. Extending RSS

RSS 2.0 adds the capability to extend the RSS specification. An RSS feed may contain non-standard elements if they are defined in a namespace.

The following example adds metadata items from the Dublin Core specification:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>BBC News | Technology | UK Edition</title>
        <link>http://news.bbc.co.uk/go/rss/-/1/hi/technology/default.stm</link>
        <description>Updated every minute of every day</description>
        <dc:language>en-us</dc:language>
        <dc:creator/>
        <dc:rights>Copyright 2004</dc:rights>
        ...

Chapter 2. Installing DOMIT! RSS

1. What is DOMIT! RSS?

There are five main flavors of the RSS specification -- 0.9, 0.91, 0.92, 1.0, and 2.0 -- and understanding the subtleties of the variations between each version can be daunting. To confuse matters even more, improperly formed RSS documents are a common occurrence.

The aim of DOMIT! RSS is to consolidate these variances under a single API, and thus allow you to programmatically pull data from any RSS feed, no matter what the version, with complete consistency.

Another advantage of using DOMIT! RSS is that it piggybacks on top of the DOMIT! XML parser. You therefore have available, in addition to the DOMIT! RSS API, the standard methods and properties of the Document Object Model.

DOMIT! RSS also includes such features as a caching system that stores feeds locally, only refreshing them from the source URL at specified intervals.

DOMIT! RSS comes in two versions:

  • The main DOMIT! RSS library, which exposes the full power of the API but is weightier than you may require in some circumstances.

  • The DOMIT! RSS Lite library, which exposes only a subset of the API (that pertaining to title, link, and description elements) and is consequently lighter and faster.

DOMIT! RSS is written in pure PHP, so it should work identically from PHP 4.0 and up without the need to install any PHP extensions.

2. Installing DOMIT! RSS

Since DOMIT! RSS is not an extension, it requires no special setup on your web server. You will, however, need to have the following files present on your server filesystem:

  • xml_domit_rss_shared.php - shared code for DOMIT! RSS and DOMIT! RSS Lite

  • xml_domit_rss.php - the main DOMIT! RSS code

  • xml_domit_rss_lite.php - main DOMIT! RSS Lite code

  • php_text_cache.php - required if you want to render your XML as a normalized (whitespace formatted) string or if you want to use the parseXML method of DOMIT_Document.

  • php_file_utilities.php - generic file input / output utilities

  • php_http_client_generic.php - generic http client class

  • php_http_client_include.php - include file for http client class

  • php_http_connector.php - helper class for php_http_client

  • php_http_exceptions.php - http exceptions class

  • php_http_proxy.php - http proxy class

  • php_http_status_codes.php - HTTP status codes for the http proxy class

You will also need to download the latest version of the DOMIT! XML parser and install it in the same directory as DOMIT! RSS. You must use a version of DOMIT! no earlier than 1.0.

3. Including the DOMIT! RSS Library in your Scripts

To use DOMIT! RSS in your scripts, include the file xml_domit_rss.php.

require_once('somepath/xml_domit_rss.php');

To use DOMIT! RSS Lite in your scripts, include the file xml_domit_rss_lite.php.

require_once('somepath/xml_domit_rss_lite.php');

Chapter 3. Loading a DOMIT! RSS Document

1. Instantiating and Populating a DOMIT! RSS Document

In DOMIT! RSS, an RSS Document is represented by the xml_domit_rss_document (or xml_domit_rss_document_lite) class.

1.1. Instantiating and Parsing a DOMIT! RSS Document

You create an instance of the xml_domit_rss_document class in the same way as any other PHP class, using the new keyword.

The easiest way to instantiate an RSS document and parse it at the same time is to pass in the URL or filename of the feed as the first parameter:

//instantiate RSS document, and parse feed at http://www.somesite.com/rss.xml
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

A DOMIT! RSS Lite document is instantiated similarly:

//instantiate RSS Lite document, and parse feed at http://www.somesite.com/rss.xml
$rssdoc =& new xml_domit_rss_document_lite('http://www.somesite.com/rss.xml');

1.2. loadRSS

Processing an RSS feed from a URL can also be broken into two steps:

  • a DOMIT! RSS Document is first created

  • the loadRSS method is called

For example:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document();

//parse feed at http://www.somesite.com/rss.xml
$success = $rssdoc->loadRSS('http://www.somesite.com/rss.xml');

If the document is successfully parsed, loadRSS returns true.

1.3. parseRSS

The parseRSS method works in exactly the same way as loadRSS; however, it takes an RSS string as a parameter, rather than a URL.

For example:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document();

//string containing the text of an RSS feed
$myRSS = "<rss version='0.95'>\n\t
            <channel>\n\t\t
              <title>My Feed</title>\n\t\t
              <link>http://www.myfeed.com/rss.xml</link>\n\t\t
              <description>This is my silly RSS feed</description>\n\t\t
              <item>\n\t\t\t
                <title>Thoughts for July 29, 2005</title>\n\t\t\t
                <link>http://www.myfeed.com/20050729.html</link>\n\t\t\t
                <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>\n\t\t
              </item>\n\t
            </channel>\n
          </rss>";

//parse RSS string
$success = $rssdoc->parseRSS($myRSS);

1.4. Setting the Cache Location and Duration

DOMIT! RSS, by default, will cache a copy of your feed on the local filesystem and draw its data from this cached copy, rather than continually accessing a remote URL.

There are two parameters available for configuring the cache:

  • the cache location, which by default is set to './' (the same directory as DOMIT! RSS)

  • the cache duration, which by default is set to 3600 seconds (one hour)

If you would like to use cache values other than the defaults provided, you can pass in a new cache location and duration when instantiating a DOMIT! RSS document:

//instantiate RSS document
//also set cache directory to ../cachefiles/ and cache duration to 2 hours 
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml', '../cachefiles/', 7200);

The same can be done using the loadRSS method:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document();

//parse feed at http://www.somesite.com/rss.xml
//also set cache directory to ../cachefiles/ and cache duration to 2 hours
$success = $rssdoc->loadRSS('http://www.somesite.com/rss.xml', '../cachefiles/', 7200);
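
To make the idea of a cache duration concrete, the following sketch shows how a time-based freshness check can work in general. It is not DOMIT! RSS's actual implementation: isCacheFresh and the cache file path are hypothetical, and the check simply compares the file's modification time against the duration.

```php
<?php
//hypothetical helper, not part of the DOMIT! RSS API
//returns true if the cached file exists and is younger than $cacheDuration seconds
function isCacheFresh($cacheFile, $cacheDuration) {
  //PHP caches file status information, so clear it before checking
  clearstatcache();

  if (!file_exists($cacheFile)) {
    return false;
  }

  return (time() - filemtime($cacheFile)) < $cacheDuration;
}

//hypothetical cache file name inside the cache directory
$cacheFile = '../cachefiles/rss.xml';

if (isCacheFresh($cacheFile, 7200)) {
  echo "Use the cached copy\n";
}
else {
  echo "Refresh the feed from the source URL\n";
}
```

With a duration of 7200 seconds, a copy written 90 minutes ago would still be used, while one written three hours ago would be refreshed.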

2. Optional Settings for Loading RSS Data

Sometimes the default approach to populating a DOMIT! RSS Document is insufficient. At times more flexibility is required.

By default, DOMIT! RSS uses the PHP function file_get_contents or standard PHP file input streams to retrieve the contents of an RSS feed. However, under certain conditions, both of these approaches can fail when passed a remote URL.

A number of additional options exist to deal with these possibilities.

2.1. useHTTPClient: Forcing DOMIT! RSS to use an HTTP Client

As of version 0.5, DOMIT! RSS comes bundled with the php_http_client library, written by Engage Interactive. The useHTTPClient method allows you to force DOMIT! RSS into establishing a standard HTTP connection to the web server hosting the XML file:

//instantiate DOMIT! RSS document 
$rssdoc =& new xml_domit_rss_document();

//specify that an HTTP client should be used to retrieve XML
$rssdoc->useHTTPClient(true); 

//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml"); 

The HTTP connection will be attempted on port 80.

2.2. setRSSTimeout: Setting a timeout for obtaining feed data

Sometimes when you attempt to obtain RSS data from a remote site, the server is slow or unavailable. Either you are unable to establish a connection, or the connection is so slow that your own site appears to be hanging.

The setRSSTimeout method allows you to set a timeout value for obtaining RSS data, beyond which the value returned by loadRSS will be false.

The following example times out if data cannot be retrieved from the remote URL within 10 seconds:

//instantiate DOMIT! RSS document 
$rssdoc =& new xml_domit_rss_document();

//set a timeout value of 10 seconds
$rssdoc->setRSSTimeout(10); 

//call loadRSS method
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");

if ($success) {
  //process RSS
}
else {
  //no RSS to process; possibly a timeout
}

2.3. setConnection: Manually specifying HTTP connection parameters

If you need to establish an HTTP connection to retrieve your RSS data, but the useHTTPClient method does not provide enough flexibility, the setConnection method of a DOMIT! RSS document can be used to manually set the parameters of the connection.

$rssdoc =& new xml_domit_rss_document();

//establish HTTP connection on port 955
$rssdoc->setConnection('http://www.engageinteractive.com', '/', '955');

//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");

In the above example, an HTTP connection will be established on port 955 of host http://www.engageinteractive.com. You can also use a raw IP address for the host, such as http://198.162.0.10.

Note that you can also pass in a user name and password to the setConnection method, if you must use HTTP Authorization to establish your connection. For more about HTTP Authorization, please see the entry on the setAuthorization method.

2.4. setAuthorization: Using basic HTTP authorization with your connection

The HTTP specification allows for a basic (i.e., not particularly secure) type of authorization called HTTP Authorization. If the RSS file that you require is protected by this sort of authentication, you can use the setAuthorization method of DOMIT! RSS.

setAuthorization is used in conjunction with the setConnection method, and requires that you provide a plain text username and password:

$rssdoc =& new xml_domit_rss_document();

//establish HTTP connection on port 955
$rssdoc->setConnection('http://www.engageinteractive.com', '/', '955');

//set user name and password for authorization
$rssdoc->setAuthorization('johnheinstein', 'mypassword');

//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");
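
To see why this kind of authorization is described as not particularly secure, it helps to look at what is sent over the wire. The sketch below builds the request header used by HTTP Basic authentication with PHP's built-in base64_encode; basicAuthorizationHeader is a hypothetical helper, and this is not necessarily how DOMIT! RSS constructs the header internally.

```php
<?php
//hypothetical helper: builds the header used by HTTP Basic authentication
function basicAuthorizationHeader($userName, $password) {
  //the credentials are joined with a colon and base64 encoded, not encrypted
  return 'Authorization: Basic ' . base64_encode($userName . ':' . $password);
}

$header = basicAuthorizationHeader('johnheinstein', 'mypassword');
echo $header . "\n";

//anyone who can read the header can recover the credentials
$encoded = substr($header, strlen('Authorization: Basic '));
echo base64_decode($encoded) . "\n";
```

Because the encoding is trivially reversible, the user name and password are only as private as the connection that carries them.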

2.5. setProxyConnection: Retrieving XML data through a proxy server

An HTTP proxy is a server that acts as an intermediary between an HTTP client (such as a user's browser) and the Internet. It is used to enforce security, administrative control, and caching services.

If you are behind a firewall, for instance, and must connect to a proxy server to access web based resources, then the setProxyConnection method will allow you to access such data.

The setProxyConnection method works in exactly the same way as setConnection:

$rssdoc =& new xml_domit_rss_document();

//establish proxy connection at http://www.myproxyconnection.com on port 1060
$rssdoc->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');

//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");

2.6. setProxyAuthorization: Using basic HTTP authorization with your proxy

The setProxyAuthorization method is called in exactly the same way as setAuthorization. Just provide a valid user name and password:

$rssdoc =& new xml_domit_rss_document();

//establish proxy connection at http://www.myproxyconnection.com on port 1060
$rssdoc->setProxyConnection('http://www.myproxyconnection.com', '/', '1060');

//set user name and password for authorization
$rssdoc->setProxyAuthorization('johnheinstein', 'mypassword');

//call loadRSS method as usual
$success = $rssdoc->loadRSS("http://www.engageinteractive.com/rssfeed.xml");

3. Error Handling

When an exception occurs in DOMIT! RSS -- perhaps as a result of a remote server being down or malformed XML -- you have a number of options available for displaying these errors.

3.1. xml_domit_rss_exception::setErrorMode

The xml_domit_rss_exception::setErrorMode method allows you to define the behavior of DOMIT! RSS when an exception occurs. It takes a single parameter -- an integer or integer constant representing the error mode:

  • DOMIT_RSS_ONERROR_CONTINUE (1) - specifies that DOMIT! RSS should continue processing after an exception occurs. This is the default behavior.

  • DOMIT_RSS_ONERROR_DIE (2) - specifies that DOMIT! RSS should die and display the error message after an exception occurs.

For example:

$rssdoc =& new xml_domit_rss_document();

//sets DOMIT! RSS to die on an exception
xml_domit_rss_exception::setErrorMode(DOMIT_RSS_ONERROR_DIE);

3.2. xml_domit_rss_exception::setErrorLog

The xml_domit_rss_exception::setErrorLog method allows you to specify a file to which error messages are logged and timestamped. This is a useful feature for debugging RSS feed problems.

It takes two parameters:

  • a boolean specifying whether logging should be turned on (true) or off (false)

  • a string containing the absolute or relative path of the error log file.

The following example specifies that errors are to be logged to the file 'rssErrorLog.txt':

$rssdoc =& new xml_domit_rss_document();

//specifies that error logging is to be enabled and the error log filename
xml_domit_rss_exception::setErrorLog(true, 'rssErrorLog.txt');

3.3. xml_domit_rss_exception::setErrorHandler

If you would like to set a custom error handler for DOMIT! RSS, you can use the xml_domit_rss_exception::setErrorHandler method.

It takes a single parameter -- the method to handle the error.

The custom errorhandler method must have the following method signature...

function myCustomErrorHandler($errorNum, $errorString)

...where $errorNum is an integer signifying the number of the error, and $errorString is a string giving a description of the error.

For example, if you wrote a function to handle your DOMIT! RSS errors that looked like this:

function myErrorHandler($errorNum, $errorString) {
  echo "The error number is " . $errorNum . " and the error string is " . $errorString;
}

You could invoke it like this:

xml_domit_rss_exception::setErrorHandler("myErrorHandler");

If the myErrorHandler function was a method of a class named ErrorHandlers rather than a standalone function, you could invoke setErrorHandler like this:

xml_domit_rss_exception::setErrorHandler(array("ErrorHandlers", "myErrorHandler"));
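
The two handler forms shown above, a function name and a class/method array, are both standard PHP callbacks. The sketch below shows how such callbacks can be dispatched with PHP's built-in is_callable and call_user_func. The dispatchError function is a hypothetical stand-in for whatever the library does internally, and the error numbers and messages are made up for illustration.

```php
<?php
//standalone function handler
function myErrorHandler($errorNum, $errorString) {
  echo "The error number is " . $errorNum . " and the error string is " . $errorString . "\n";
}

class ErrorHandlers {
  //declared static so it can be referenced as array("ErrorHandlers", "myErrorHandler")
  public static function myErrorHandler($errorNum, $errorString) {
    echo "[ErrorHandlers] " . $errorNum . ": " . $errorString . "\n";
  }
}

//hypothetical dispatcher: calls the handler if it is a valid callback
function dispatchError($handler, $errorNum, $errorString) {
  if (!is_callable($handler)) {
    return false;
  }

  call_user_func($handler, $errorNum, $errorString);
  return true;
}

//illustrative error numbers and messages
dispatchError("myErrorHandler", 1, "sample error");
dispatchError(array("ErrorHandlers", "myErrorHandler"), 2, "another sample error");
```

On current PHP versions a method referenced by class name in this way must be static; a handler that lives on an object instance would be passed as array($object, "methodName") instead.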

Chapter 4. Extracting Data from a DOMIT! RSS Document

Once you have successfully loaded a DOMIT! RSS document, you are ready to begin extracting feed data.

We will use this RSS document for the following examples:

<?xml version="1.0"?>
<rss version='0.95'>
  <channel>
    <title>My Feed</title>
    <link>http://www.myfeed.com/rss.xml</link>
    <description>This is my silly RSS feed</description>
    <language>en-ca</language>
    <copyright>2005 John Heinstein</copyright>
    <managingEditor>johnkarl@nbnet.nb.ca</managingEditor>
    <webMaster>johnkarl@nbnet.nb.ca</webMaster>
    <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate>
    <generator>RSSky Feed Generator</generator>
    <docs>http://www.myfeed.com/rss/docs.html</docs>
    <cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>20</ttl>
    <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
    <image>
      <title>My Feed</title>
      <url>http://www.myfeed.com/rss/myfeed.jpg</url>
      <link>http://www.myfeed.com/</link>
      <width>100</width>
      <height>70</height>
      <description>Picture of John</description>
    </image>
    <textinput>
      <title>Search</title>
      <description>Search the My Feed site</description>
      <name>searchform</name>
      <link>http://www.google.com/search</link>
    </textinput>
    <skipDays>
      <day>Friday</day>
      <day>Saturday</day>
      <day>Sunday</day>
    </skipDays>
    <skipHours>
      <hour>16</hour>
    </skipHours>
    <category domain="http://www.superopendirectory.com/">philosophy/humor</category>
    <category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
    <item>
      <title>Thoughts for July 29, 2005</title>
      <link>http://www.myfeed.com/20050729.html</link>
      <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>
      <author>johnkarl@nbnet.nb.ca</author>
      <comments>http://www.myfeed.com/comments/20050729.html</comments>
      <enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" />
      <guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid>
      <pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate>  
      <source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source>     
    </item>
    <item>
      <title>Thoughts for July 30, 2005</title>
      <link>http://www.myfeed.com/20050730.html</link>
      <description>What if the earth were round not flat?</description>
      <author>johnkarl@nbnet.nb.ca</author>
      <comments>http://www.myfeed.com/comments/20050730.html</comments>
      <enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" />
      <guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid> 
      <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate> 
      <source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source>      
    </item>
  </channel>
</rss>

1. Document Level Methods

There are several methods available to you for obtaining document level information: parsedBy, getVersion, and getRSSVersion.

1.1. parsedBy

To determine whether DOMIT! RSS or DOMIT! RSS Lite was used to parse your document, you can use the parsedBy method:

$rssParser = $rssdoc->parsedBy();

The parsedBy method returns a string with a value of either DOMIT_RSS or DOMIT_RSS_LITE.

1.2. getVersion

The getVersion method returns the version number of the current install of DOMIT! RSS.

$myVersion = $rssdoc->getVersion();

1.3. getRSSVersion

The getRSSVersion method returns the version of the RSS specification that the current document conforms to.

$myRSSVersion = $rssdoc->getRSSVersion();
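One possible use of the RSS version is to decide which elements you can expect in a feed. The sketch below assumes that getRSSVersion returns a dotted version string such as '0.95' (the version attribute of the sample feed); the isAtLeastRSSVersion helper is hypothetical and relies only on PHP's built-in version_compare function.

```php
<?php
// Hypothetical helper: true if $rssVersion is at least $minimum.
function isAtLeastRSSVersion($rssVersion, $minimum) {
    return version_compare($rssVersion, $minimum, '>=');
}

// '0.95' is the version attribute of the sample feed.
var_dump(isAtLeastRSSVersion('0.95', '0.91')); // bool(true)
var_dump(isAtLeastRSSVersion('0.95', '2.0'));  // bool(false)
```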

2. Displaying a String Representation of RSS Content

A text representation of an RSS document or any of its elements can be displayed using the toString and toNormalizedString methods.

2.1. toString

We can display an unformatted string representation of an RSS document using the toString method:

//instantiate RSS document, and parse feed at http://www.somesite.com/rss.xml
require_once('xml_domit_rss.php');
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//echo document to browser
echo $rssdoc->toString(true);

The following string will be echoed to the browser window:

<rss version="0.95"><channel><title>My Feed</title><link>http://www.myfeed.com/rss.xml</link><description>This is my silly RSS feed</description><language>en-ca</language><copyright>2005 John Heinstein</copyright><managingEditor>johnkarl@nbnet.nb.ca</managingEditor><webMaster>johnkarl@nbnet.nb.ca</webMaster><pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate><lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate><generator>RSSky Feed Generator</generator><docs>http://www.myfeed.com/rss/docs.html</docs><cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" /><ttl>20</ttl><rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating><image><title>My Feed</title><url>http://www.myfeed.com/rss/myfeed.jpg</url><link>http://www.myfeed.com/</link><width>100</width><height>70</height><description>Picture of John</description></image><textinput><title>Search</title><description>Search the My Feed site</description><name>searchform</name><link>http://www.google.com/search</link></textinput><skipDays><day>Friday</day><day>Saturday</day><day>Sunday</day></skipDays><skipHours><hour>16</hour></skipHours><category domain="http://www.superopendirectory.com/">philosophy/humor</category><category domain="http://www.superopendirectory.com/">philosophy/hogwash</category><item><title>Thoughts for July 29, 2005</title><link>http://www.myfeed.com/20050729.html</link><description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description><author>johnkarl@nbnet.nb.ca</author><comments>http://www.myfeed.com/comments/20050729.html</comments><enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" /><guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid><pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate><source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source></item><item><title>Thoughts for July 30, 2005</title><link>http://www.myfeed.com/20050730.html</link><description>What if the earth were round not flat?</description><author>johnkarl@nbnet.nb.ca</author><comments>http://www.myfeed.com/comments/20050730.html</comments><enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" /><guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid><pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate><source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source></item></channel></rss>

The first parameter of toString, if set to true, converts special HTML characters into their encoded versions (for example, & into &amp;) so that they will display properly in a browser.

If you would like unconverted raw text to be output (for instance, when echoing to a command line interface) substitute a value of false:

echo $rssdoc->toString(false);
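To illustrate what the encoding step does, the sketch below uses PHP's built-in htmlspecialchars function as a stand-in. Whether toString performs exactly this conversion internally is an assumption, and the sample string and the $forBrowser variable are made up for the example.

```php
<?php
// Illustration only: htmlspecialchars stands in for whatever
// conversion toString(true) performs internally (an assumption).
$raw = '<title>Salt & Pepper</title>';
$encoded = htmlspecialchars($raw);

echo $encoded . "\n"; // &lt;title&gt;Salt &amp; Pepper&lt;/title&gt;

// Hypothetical way to choose the flag: encode unless running from the command line.
$forBrowser = (php_sapi_name() !== 'cli');
// echo $rssdoc->toString($forBrowser);
```

A browser renders the encoded text as visible markup, whereas the raw text would be interpreted as tags and mostly disappear from view.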

2.2. toNormalizedString

One drawback of the toString output is that it is not particularly readable, since all text of the node is compressed into one line. The toNormalizedString method will output text that is much more nicely formatted:

//instantiate RSS document, and parse feed at http://www.somesite.com/rss.xml
require_once('xml_domit_rss.php');
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//echo document to browser
echo $rssdoc->toNormalizedString(true);

The following string will be echoed to the browser window:

<rss version="0.95">
    <channel>
        <title>My Feed</title>
        <link>http://www.myfeed.com/rss.xml</link>
        <description>This is my silly RSS feed</description>
        <language>en-ca</language>
        <copyright>2005 John Heinstein</copyright>
        <managingEditor>johnkarl@nbnet.nb.ca</managingEditor>
        <webMaster>johnkarl@nbnet.nb.ca</webMaster>
        <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
        <lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate>
        <generator>RSSky Feed Generator</generator>
        <docs>http://www.myfeed.com/rss/docs.html</docs>
        <cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" />
        <ttl>20</ttl>
        <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
        <image>
            <title>My Feed</title>
            <url>http://www.myfeed.com/rss/myfeed.jpg</url>
            <link>http://www.myfeed.com/</link>
            <width>100</width>
            <height>70</height>
            <description>Picture of John</description>
        </image>
        <textinput>
            <title>Search</title>
            <description>Search the My Feed site</description>
            <name>searchform</name>
            <link>http://www.google.com/search</link>
        </textinput>
        <skipDays>
            <day>Friday</day>
            <day>Saturday</day>
            <day>Sunday</day>
        </skipDays>
        <skipHours>
            <hour>16</hour>
        </skipHours>
        <category domain="http://www.superopendirectory.com/">philosophy/humor</category>
        <category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
        <item>
            <title>Thoughts for July 29, 2005</title>
            <link>http://www.myfeed.com/20050729.html</link>
            <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>
            <author>johnkarl@nbnet.nb.ca</author>
            <comments>http://www.myfeed.com/comments/20050729.html</comments>
            <enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" />
            <guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid>
            <pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate>
            <source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source>
        </item>
        <item>
            <title>Thoughts for July 30, 2005</title>
            <link>http://www.myfeed.com/20050730.html</link>
            <description>What if the earth were round not flat?</description>
            <author>johnkarl@nbnet.nb.ca</author>
            <comments>http://www.myfeed.com/comments/20050730.html</comments>
            <enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" />
            <guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid>
            <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
            <source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source>
        </item>
    </channel>
</rss>

As with the toString method, passing a value of false into toNormalizedString outputs text that is not formatted for HTML display.

3. Accessing Channels

Once you have instantiated and populated a DOMIT! RSS Document from an RSS feed, you are able to traverse the hierarchy of the document and access the element data. The first element that you must access is the channel element.

3.1. getChannelCount

Although the RSS specification officially allows only a single channel in a document, in practice you will occasionally encounter feeds that contain more than one.

The getChannelCount method determines how many channels exist in an RSS document, allowing you to programmatically loop through each channel and extract information:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//echo channel count to browser
echo "Number of channels is: " . $numChannels;

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //process current channel...
}

The result:

Number of channels is: 1

3.2. getChannel

Once you have determined the number of channels that exist in an RSS document, you can obtain a reference to a particular channel using the getChannel method.

getChannel takes a single parameter -- an integer specifying the index of the requested channel. As the loop below shows, the index of the first channel is 0:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //obtain a reference to the current channel
  $currChannel =& $rssdoc->getChannel($i);
  
  //echo current channel to browser
  echo $currChannel->toNormalizedString(true);
}

The result is:

<channel>
    <title>My Feed</title>
    <link>http://www.myfeed.com/rss.xml</link>
    <description>This is my silly RSS feed</description>
    <language>en-ca</language>
    <copyright>2005 John Heinstein</copyright>
    <managingEditor>johnkarl@nbnet.nb.ca</managingEditor>
    <webMaster>johnkarl@nbnet.nb.ca</webMaster>
    <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
    <lastBuildDate>Sat, 30 Jul 05 13:20:35 GMT</lastBuildDate>
    <generator>RSSky Feed Generator</generator>
    <docs>http://www.myfeed.com/rss/docs.html</docs>
    <cloud domain="www.myfeed.com" port="80" path="/rss" registerProcedure="rssSystem.rssPleaseNotify" protocol="xml-rpc" />
    <ttl>20</ttl>
    <rating>(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))</rating>
    <image>
        <title>My Feed</title>
        <url>http://www.myfeed.com/rss/myfeed.jpg</url>
        <link>http://www.myfeed.com/</link>
        <width>100</width>
        <height>70</height>
        <description>Picture of John</description>
    </image>
    <textinput>
        <title>Search</title>
        <description>Search the My Feed site</description>
        <name>searchform</name>
        <link>http://www.google.com/search</link>
    </textinput>
    <skipDays>
        <day>Friday</day>
        <day>Saturday</day>
        <day>Sunday</day>
    </skipDays>
    <skipHours>
        <hour>16</hour>
    </skipHours>
    <category domain="http://www.superopendirectory.com/">philosophy/humor</category>
    <category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>
    <item>
        <title>Thoughts for July 29, 2005</title>
        <link>http://www.myfeed.com/20050729.html</link>
        <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>
        <author>johnkarl@nbnet.nb.ca</author>
        <comments>http://www.myfeed.com/comments/20050729.html</comments>
        <enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" />
        <guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid>
        <pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate>
        <source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source>
    </item>
    <item>
        <title>Thoughts for July 30, 2005</title>
        <link>http://www.myfeed.com/20050730.html</link>
        <description>What if the earth were round not flat?</description>
        <author>johnkarl@nbnet.nb.ca</author>
        <comments>http://www.myfeed.com/comments/20050730.html</comments>
        <enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" />
        <guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid>
        <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
        <source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source>
    </item>
</channel>

4. Accessing the Required Elements of a Channel

A channel is required, at minimum, to contain title, link, and description elements. The getTitle, getLink, and getDescription methods can be used to access the data in these elements.

4.1. getTitle

The getTitle method will return the title of a channel:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //obtain a reference to the current channel
  $currChannel =& $rssdoc->getChannel($i);
  
  //echo title of channel
  echo $currChannel->getTitle();
}

The result is:

My Feed

4.2. getLink

The getLink method will return the link of a channel:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //obtain a reference to the current channel
  $currChannel =& $rssdoc->getChannel($i);
  
  //echo link of channel
  echo $currChannel->getLink();
}

The result is:

http://www.myfeed.com/rss.xml

4.3. getDescription

The getDescription method will return a description of a channel:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //obtain a reference to the current channel
  $currChannel =& $rssdoc->getChannel($i);
  
  //echo description of channel
  echo $currChannel->getDescription();
}

The result is:

This is my silly RSS feed

5. Accessing the Optional Elements of a Channel

The RSS specification documents a number of additional elements such as 'language' and 'copyright' that can belong to a channel. The following sections detail how the data in these elements can be accessed.

5.1. hasElement

You cannot always be certain whether a nonrequired element is present in any particular RSS feed.

The hasElement method allows you to test for the existence of a named element. hasElement takes a single parameter -- the name of the element whose existence you are testing for.

If, for instance, you want to determine whether a copyright element belongs to a channel, you could do this:

$doesCopyrightExist = $currChannel->hasElement('copyright');

If the copyright element is found, true is returned.

5.2. getLanguage

The getLanguage method returns the language of a channel.

//check if language element exists
if ($currChannel->hasElement('language')) {

  //echo language to browser
  echo $currChannel->getLanguage();
}

The result is:

en-ca

5.3. getCopyright

The getCopyright method returns the copyright statement of a channel.

//check if copyright element exists
if ($currChannel->hasElement('copyright')) {

  //echo copyright to browser
  echo $currChannel->getCopyright();
}

The result is:

2005 John Heinstein

5.4. getManagingEditor

The getManagingEditor method returns the email address of the managing editor of a channel.

//check if managing editor element exists
if ($currChannel->hasElement('managingEditor')) {

  //echo managing editor to browser
  echo $currChannel->getManagingEditor();
}

The result is:

johnkarl@nbnet.nb.ca

5.5. getWebMaster

The getWebMaster method returns the email address of the webmaster of a channel.

//check if webmaster element exists
if ($currChannel->hasElement('webMaster')) {

  //echo webmaster to browser
  echo $currChannel->getWebMaster();
}

The result is:

johnkarl@nbnet.nb.ca

5.6. getPubDate

The getPubDate method returns the publication date of a channel.

//check if pubDate element exists
if ($currChannel->hasElement('pubDate')) {

  //echo pubDate to browser
  echo $currChannel->getPubDate();
}

The result is:

Sat, 30 Jul 05 13:20:35 GMT
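The date comes back as plain text, so you may want to convert it before sorting or comparing. The following is a minimal sketch, assuming PHP 5.3 or later (for DateTime::createFromFormat) and a date in one of the two layouts shown in its comments; parseRSSDate is a hypothetical helper, not part of DOMIT! RSS.

```php
<?php
// Hypothetical helper: converts an RSS date string into a DateTime,
// or returns false if neither format matches.
function parseRSSDate($dateString) {
    // Try the two-digit year first ("30 Jul 05"), then the four-digit form ("30 Jul 2005").
    $formats = array('D, d M y H:i:s T', 'D, d M Y H:i:s T');
    foreach ($formats as $format) {
        $date = DateTime::createFromFormat($format, $dateString);
        if ($date !== false) {
            return $date;
        }
    }
    return false;
}

$date = parseRSSDate('Sat, 30 Jul 05 13:20:35 GMT');
echo $date->format('Y-m-d H:i:s'); // 2005-07-30 13:20:35
```

The same helper can be applied to the value returned by getLastBuildDate, since both elements use the same date layout in the sample feed.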

5.7. getLastBuildDate

The getLastBuildDate method returns the last build date of a channel.

//check if lastBuildDate element exists
if ($currChannel->hasElement('lastBuildDate')) {

  //echo lastBuildDate to browser
  echo $currChannel->getLastBuildDate();
}

The result is:

Sat, 30 Jul 05 13:20:35 GMT

5.8. getGenerator

The getGenerator method returns the name of the program which generated the RSS of a channel.

//check if generator element exists
if ($currChannel->hasElement('generator')) {

  //echo generator to browser
  echo $currChannel->getGenerator();
}

The result is:

RSSky Feed Generator

5.9. getDocs

The getDocs method returns the URL at which to find the docs for the channel.

//check if docs element exists
if ($currChannel->hasElement('docs')) {

  //echo docs to browser
  echo $currChannel->getDocs();
}

The result is:

http://www.myfeed.com/rss/docs.html

5.10. getCloud

The getCloud method returns a reference to the cloud of a channel. The cloud describes a web service that notifies the user when changes to the channel have been made.

//check if cloud element exists
if ($currChannel->hasElement('cloud')) {

  //get a reference to the cloud
  $myCloud =& $currChannel->getCloud();
}

Once a reference to the cloud object has been acquired, you can use the methods of the cloud -- getDomain, getPort, getPath, getRegisterProcedure, and getProtocol -- to extract its data:

5.10.1. getDomain

The getDomain method of a cloud allows you to retrieve its domain:

//check if cloud element exists
if ($currChannel->hasElement('cloud')) {

  //get a reference to the cloud
  $myCloud =& $currChannel->getCloud();

  //echo domain of the cloud
  echo $myCloud->getDomain();
}

The result is:

www.myfeed.com

5.10.2. getPort

The getPort method of a cloud allows you to retrieve its port:

//check if cloud element exists
if ($currChannel->hasElement('cloud')) {

  //get a reference to the cloud
  $myCloud =& $currChannel->getCloud();

  //echo port of the cloud
  echo $myCloud->getPort();
}

The result is:

80

5.10.3. getPath

The getPath method of a cloud allows you to retrieve its path:

//check if cloud element exists
if ($currChannel->hasElement('cloud')) {

  //get a reference to the cloud
  $myCloud =& $currChannel->getCloud();

  //echo path of the cloud
  echo $myCloud->getPath();
}

The result is:

/rss

5.10.4. getRegisterProcedure

The getRegisterProcedure method of a cloud allows you to retrieve its register procedure:

//check if cloud element exists
if ($currChannel->hasElement('cloud')) {

  //get a reference to the cloud
  $myCloud =& $currChannel->getCloud();

  //echo register procedure of the cloud
  echo $myCloud->getRegisterProcedure();
}

The result is:

rssSystem.rssPleaseNotify

5.10.5. getProtocol

The getProtocol method of a cloud allows you to retrieve its protocol:

//check if cloud element exists
if ($currChannel->hasElement('cloud')) {

  //get a reference to the cloud
  $myCloud =& $currChannel->getCloud();

  //echo protocol of the cloud
  echo $myCloud->getProtocol();
}

The result is:

xml-rpc
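The domain, port, and path values are typically combined to address the notification service. The sketch below shows one way to assemble them; buildCloudEndpoint is a hypothetical helper, and the "http" scheme is an assumption, since the cloud element itself does not state one.

```php
<?php
// Hypothetical helper: assembles an endpoint URL from cloud values.
// The "http" scheme is an assumption; the cloud element does not state one.
function buildCloudEndpoint($domain, $port, $path) {
    // ltrim avoids a doubled slash when the path already starts with one.
    return 'http://' . $domain . ':' . (int) $port . '/' . ltrim($path, '/');
}

// Values from the cloud element of the sample feed.
echo buildCloudEndpoint('www.myfeed.com', '80', '/rss'); // http://www.myfeed.com:80/rss
```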

5.11. getTTL

The getTTL method returns the time to live of a channel.

//check if ttl element exists
if ($currChannel->hasElement('ttl')) {

  //echo ttl to browser
  echo $currChannel->getTTL();
}

The result is:

20
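The RSS specification expresses ttl in minutes, so a value of 20 means a cached copy can be reused for 20 minutes. As a small sketch of how that might be checked, the hypothetical isCacheExpired helper below compares Unix timestamps; the timestamp values are arbitrary.

```php
<?php
// Hypothetical helper: true once a cached copy is at least as old as
// the channel's ttl, which is expressed in minutes.
function isCacheExpired($fetchedAt, $ttlMinutes, $now) {
    return ($now - $fetchedAt) >= ((int) $ttlMinutes * 60);
}

$fetchedAt = 1000000; // arbitrary Unix timestamp of the last fetch
var_dump(isCacheExpired($fetchedAt, '20', $fetchedAt + 600));  // bool(false): 10 minutes old
var_dump(isCacheExpired($fetchedAt, '20', $fetchedAt + 1200)); // bool(true): 20 minutes old
```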

5.12. getImage

The getImage method returns a reference to the image for the channel:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();
}

Once a reference to the image object has been acquired, you can use the methods of the image -- getTitle, getLink, getUrl, getWidth, getHeight, and getDescription -- to extract its data:

5.12.1. getTitle

The getTitle method of an image allows you to retrieve its title:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();

  //echo title of the image
  echo $myImage->getTitle();
}

The result is:

My Feed

5.12.2. getLink

The getLink method of an image allows you to retrieve the link representing the channel:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();

  //echo link of the image
  echo $myImage->getLink();
}

The result is:

http://www.myfeed.com/

5.12.3. getUrl

The getUrl method of an image allows you to retrieve the URL of the image:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();

  //echo URL of the image
  echo $myImage->getUrl();
}

The result is:

http://www.myfeed.com/rss/myfeed.jpg

5.12.4. getWidth

The getWidth method of an image allows you to retrieve the width of the image:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();

  //echo width of the image
  echo $myImage->getWidth();
}

The result is:

100

Note: The maximum width of an image is 144px; the default width is 88.

5.12.5. getHeight

The getHeight method of an image allows you to retrieve the height of the image:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();

  //echo height of the image
  echo $myImage->getHeight();
}

The result is:

70

Note: The maximum height of an image is 400px; the default height is 31.
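The limits and defaults in the two notes above can be applied before an image is displayed. This is only a sketch: normalizeImageSize is a hypothetical helper, and it assumes that a missing width or height arrives as an empty or non-numeric value.

```php
<?php
// Hypothetical helper: applies the default and maximum image dimensions
// (width: default 88, maximum 144; height: default 31, maximum 400).
function normalizeImageSize($width, $height) {
    // Treat empty or non-numeric values as "not specified" and use the defaults.
    $w = is_numeric($width) ? (int) $width : 88;
    $h = is_numeric($height) ? (int) $height : 31;

    // Clamp oversized values to the maximums.
    return array(min($w, 144), min($h, 400));
}

print_r(normalizeImageSize('100', '70')); // sample feed values are within the limits
print_r(normalizeImageSize('', ''));      // falls back to 88 x 31
```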

5.12.6. getDescription

The getDescription method of an image allows you to retrieve a description of the image:

//check if image element exists
if ($currChannel->hasElement('image')) {

  //get a reference to the image
  $myImage =& $currChannel->getImage();

  //echo description of the image
  echo $myImage->getDescription();
}

The result is:

Picture of John

5.13. getRating

The getRating method returns the PICS rating of a channel.

//check if rating element exists
if ($currChannel->hasElement('rating')) {

  //echo rating to browser
  echo $currChannel->getRating();
}

The result is:

(PICS-1.1 "http://www.classify.org/safesurf/" l r (SS~~000 1))

5.14. getTextInput

The getTextInput method returns a reference to the text input for the channel:

//check if text input element exists
if ($currChannel->hasElement('textInput')) {

  //get a reference to the text input
  $myTextInput =& $currChannel->getTextInput();
}

Once a reference to the text input object has been acquired, you can use its methods -- getTitle, getDescription, getName, and getLink -- to extract its data:

5.14.1. getTitle

The getTitle method of a text input allows you to retrieve its title:

//check if text input element exists
if ($currChannel->hasElement('textInput')) {

  //get a reference to the text input
  $myTextInput =& $currChannel->getTextInput();

  //echo title of the text input
  echo $myTextInput->getTitle();
}

The result is:

Search

5.14.2. getDescription

The getDescription method of a text input allows you to retrieve its description:

//check if text input element exists
if ($currChannel->hasElement('textInput')) {

  //get a reference to the text input
  $myTextInput =& $currChannel->getTextInput();

  //get title of the text input
  $myTitle = $myTextInput->getTitle();

  //echo description of the text input
  echo $myTextInput->getDescription();
}

The result is:

Search the My Feed site

5.14.3. getName

The getName method of a text input allows you to retrieve the name of the text field in the text input area:

//check if text input element exists
if ($currChannel->hasElement('textInput')) {

  //get a reference to the text input
  $myTextInput =& $currChannel->getTextInput();

  //get title of the text input
  $myTitle = $myTextInput->getTitle();

  //get description of the text input
  $myDescription = $myTextInput->getDescription();

  //echo name of the text input
  echo $myTextInput->getName();
}

The result is:

searchform

5.14.4. getLink

The getLink method of a text input allows you to retrieve the URL of the script that is called when the Submit button is clicked:

//check if text input element exists
if ($currChannel->hasElement('textInput')) {

  //get a reference to the text input
  $myTextInput =& $currChannel->getTextInput();

  //get title of the text input
  $myTitle = $myTextInput->getTitle();

  //get description of the text input
  $myDescription = $myTextInput->getDescription();

  //get name of the text input
  $myName = $myTextInput->getName();

  //echo link of the text input
  echo $myTextInput->getLink();
}

The result is:

http://www.google.com/search
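The four text input values are enough to build a simple HTML form. The sketch below follows the RSS 2.0 reading of the fields (title as the label of the Submit button, name as the name of the text field, link as the script that receives the request). The renderTextInputForm helper and the use of the GET method are assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds an HTML form from the four text input values.
// The GET method is an assumption; RSS does not specify one.
function renderTextInputForm($title, $description, $name, $link) {
    $html  = '<form method="get" action="' . htmlspecialchars($link) . '">' . "\n";
    $html .= '  <label>' . htmlspecialchars($description) . '</label>' . "\n";
    $html .= '  <input type="text" name="' . htmlspecialchars($name) . '" />' . "\n";
    $html .= '  <input type="submit" value="' . htmlspecialchars($title) . '" />' . "\n";
    $html .= '</form>';
    return $html;
}

// Values from the textinput element of the sample feed.
$formHtml = renderTextInputForm('Search', 'Search the My Feed site', 'searchform', 'http://www.google.com/search');
echo $formHtml;
```

Every value is passed through htmlspecialchars because feed content comes from a third party and should not be trusted as markup.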

5.15. getSkipDays

The getSkipDays method returns a reference to the skipDays object for the channel:

//check if skipDays element exists
if ($currChannel->hasElement('skipDays')) {

  //get a reference to the skipDays object
  $mySkipDays =& $currChannel->getSkipDays();
}

Once a reference to the skipDays object has been acquired, you can use its methods -- getSkipDayCount and getSkipDay -- to extract its data.

5.15.1. getSkipDayCount

The getSkipDayCount method of skipDays returns the number of child day elements:

//check if skipDays element exists
if ($currChannel->hasElement('skipDays')) {

  //get a reference to the skipDays object
  $mySkipDays =& $currChannel->getSkipDays();

  //get number of child day elements
  $numDays = $mySkipDays->getSkipDayCount();

  //echo number of days to browser
  echo $numDays;

  //set up loop to iterate through days
  for ($i = 0; $i < $numDays; $i++) {
    //process each day element
  }
}

The result is:

3

5.15.2. getSkipDay

The getSkipDay method of skipDays returns the value of the day element at the specified index. It takes a single parameter -- an integer specifying the index of the day element whose data you wish to access:

//check if skipDays element exists
if ($currChannel->hasElement('skipDays')) {

  //get a reference to the skipDays object
  $mySkipDays =& $currChannel->getSkipDays();

  //get number of child day elements
  $numDays = $mySkipDays->getSkipDayCount();

  //set up loop to iterate through days
  for ($i = 0; $i < $numDays; $i++) {

    //echo day item to browser
    echo $mySkipDays->getSkipDay($i) . "\n<br />";
  }
}

The result is:

Friday
Saturday
Sunday 

5.16. getSkipHours

The getSkipHours method returns a reference to the skipHours object for the channel:

//check if skipHours element exists
if ($currChannel->hasElement('skipHours')) {

  //get a reference to the skipHours object
  $mySkipHours =& $currChannel->getSkipHours();
}

Once a reference to the skipHours object has been acquired, you can use its methods -- getSkipHourCount and getSkipHour -- to extract its data.

5.16.1. getSkipHourCount

The getSkipHourCount method of skipHours returns the number of child hour elements:

//check if skipHours element exists
if ($currChannel->hasElement('skipHours')) {

  //get a reference to the skipHours object
  $mySkipHours =& $currChannel->getSkipHours();

  //get number of child hour elements
  $numHours = $mySkipHours->getSkipHourCount();

  //echo num hours to browser
  echo $numHours;

  //set up loop to iterate through hours
  for ($i = 0; $i < $numHours; $i++) {
    //process each hour element
  }
}

The result is:

1

5.16.2. getSkipHour

The getSkipHour method of skipHours returns the value of the hour element at the specified index. It takes a single parameter -- an integer specifying the index of the hour element whose data you wish to access:

//check if skipHours element exists
if ($currChannel->hasElement('skipHours')) {

  //get a reference to the skipHours object
  $mySkipHours =& $currChannel->getSkipHours();

  //get number of child hour elements
  $numHours = $mySkipHours->getSkipHourCount();

  //set up loop to iterate through hours
  for ($i = 0; $i < $numHours; $i++) {

    //echo hour item to browser
    echo $mySkipHours->getSkipHour($i) . "\n<br />";
  }
}

The result is:

16 
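Skip days and skip hours tell an aggregator when not to fetch the feed. The sketch below shows how the collected values might be applied. shouldSkipFetch is a hypothetical helper; it assumes the days and hours have already been gathered into arrays, and it evaluates both in GMT (RSS defines skip hours in GMT; treating skip days the same way is an assumption).

```php
<?php
// Hypothetical helper: true if the feed asks aggregators not to
// fetch at the given Unix timestamp. Both lists are evaluated in GMT.
function shouldSkipFetch($skipDays, $skipHours, $timestamp) {
    $day  = gmdate('l', $timestamp);       // full English day name, e.g. "Saturday"
    $hour = (int) gmdate('G', $timestamp); // hour 0-23 without leading zero

    if (in_array($day, $skipDays)) {
        return true;
    }

    // Compare as integers so "16" and 16 both match.
    return in_array($hour, array_map('intval', $skipHours), true);
}

// Values from the sample feed.
$skipDays  = array('Friday', 'Saturday', 'Sunday');
$skipHours = array('16');

// Saturday, 30 Jul 2005 13:20:35 GMT falls on a skip day.
var_dump(shouldSkipFetch($skipDays, $skipHours, gmmktime(13, 20, 35, 7, 30, 2005))); // bool(true)
```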

5.17. getCategoryCount and getCategory

A channel can have multiple category elements. The getCategoryCount method indicates how many exist in the current channel:

//get number of categories
$numCategories = $currChannel->getCategoryCount();

//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {
  //process categories
}

Once you have determined the number of categories and set up a loop to iterate through each one, you can use the getCategory method to retrieve individual category elements:

//get number of categories
$numCategories = $currChannel->getCategoryCount();

//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {

  //get current category
  $currCategory =& $currChannel->getCategory($j);

  //echo to browser
  echo $currCategory->toNormalizedString(true);
}

The result is:

<category domain="http://www.superopendirectory.com/">philosophy/humor</category>

<category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>

A category has two methods at its disposal: getCategory and getDomain.

5.17.1. getCategory (method of Category Class)

The getCategory method of a category returns the text of the category:

//get number of categories
$numCategories = $currChannel->getCategoryCount();

//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {

  //get current category
  $currCategory =& $currChannel->getCategory($j);

  //echo category text to browser
  echo $currCategory->getCategory() . "\n<br />";
}

The result is:

philosophy/humor
philosophy/hogwash

5.17.2. getDomain

The getDomain method of a category returns the domain attribute of the category, or an empty string if one does not exist:

//get number of categories
$numCategories = $currChannel->getCategoryCount();

//set up loop to iterate through categories
for ($j=0; $j < $numCategories; $j++) {

  //get current category
  $currCategory =& $currChannel->getCategory($j);

  //echo domain to browser
  echo $currCategory->getDomain() . "\n<br />";
}

The result is:

http://www.superopendirectory.com/
http://www.superopendirectory.com/ 

6. Accessing the Items of a Channel

With a reference to a channel in hand, you are able to loop through the items of that channel and extract the item data. The process is almost identical to looping through the channels of an RSS document.

6.1. getItemCount

The getItemCount method determines how many items exist in a channel, allowing you to programmatically loop through each item and extract information:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //obtain a reference to the current channel
  $currChannel =& $rssdoc->getChannel($i);
  
  //get number of items
  $numItems = $currChannel->getItemCount();

  //set up a loop to iterate through each item
  for ($j = 0; $j < $numItems; $j++) {
    //process item data
  }
}

6.2. getItem

Once you have determined the number of items that exist in a channel, you can obtain a reference to a particular item using the getItem method. getItem takes a single parameter -- an integer specifying the index of the requested item:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//get number of channels
$numChannels = $rssdoc->getChannelCount();

//set up a loop to iterate through each channel
for ($i = 0; $i < $numChannels; $i++) {
  //obtain a reference to the current channel
  $currChannel =& $rssdoc->getChannel($i);
  
  //get number of items
  $numItems = $currChannel->getItemCount();

  //set up a loop to iterate through each item
  for ($j = 0; $j < $numItems; $j++) {
    //get reference to current item
    $currItem =& $currChannel->getItem($j);

    //echo to browser
    echo $currItem->toNormalizedString(true);
  }
}

The result is:

<item>
    <title>Thoughts for July 29, 2005</title>
    <link>http://www.myfeed.com/20050729.html</link>
    <description>Musings about the link between RSS, existentialism, and egg salad sandwiches.</description>
    <author>johnkarl@nbnet.nb.ca</author>
    <comments>http://www.myfeed.com/comments/20050729.html</comments>
    <enclosure url="http://www.myfeed.com/audio/20050729.mp3" length="12216320" type="audio/mpeg" />
    <guid isPermaLink="true">http://www.myfeed.com/20050729.html</guid>
    <pubDate>Fri, 29 Jul 05 14:15:16 GMT</pubDate>
    <source url="http://mindsaye.ca/rss/20050729.html">The Minds Aye</source>
</item>

<item>
    <title>Thoughts for July 30, 2005</title>
    <link>http://www.myfeed.com/20050730.html</link>
    <description>What if the earth were round not flat?</description>
    <author>johnkarl@nbnet.nb.ca</author>
    <comments>http://www.myfeed.com/comments/20050730.html</comments>
    <enclosure url="http://www.myfeed.com/audio/20050730.mp3" length="12216320" type="audio/mpeg" />
    <guid isPermaLink="true">http://www.myfeed.com/20050730.html</guid>
    <pubDate>Sat, 30 Jul 05 13:20:35 GMT</pubDate>
    <source url="http://mindsaye.ca/rss/20050730.html">The Minds Aye</source>
</item>

7. Accessing the Required Elements of an Item

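An item is required to contain at least one of the title or description elements; the RSS 2.0 specification makes every other item element, including link, optional. Commonly, title, link, and description are all included.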

The getTitle, getLink, and getDescription methods can be used to access the data in these elements.

//RSS doc parsed and channels iterated through already...
$currChannel =& $rssdoc->getChannel($i);
  
//get number of items
$numItems = $currChannel->getItemCount();

//set up a loop to iterate through each item
for ($j = 0; $j < $numItems; $j++) {
  //get reference to current item
  $currItem =& $currChannel->getItem($j);

  //echo title to browser
  echo "title: " . $currItem->getTitle() . "\n<br />";

  //echo link to browser
  echo "link: " . $currItem->getLink() . "\n<br />";

  //echo description to browser
  echo "description: " . $currItem->getDescription() . "\n<br />\n<br />";
}

The result is:

title: Thoughts for July 29, 2005
link: http://www.myfeed.com/20050729.html
description: Musings about the link between RSS, existentialism, and egg salad sandwiches.

title: Thoughts for July 30, 2005
link: http://www.myfeed.com/20050730.html
description: What if the earth were round not flat? 

8. Accessing Optional Elements of an Item

The RSS specification documents a number of additional elements, such as 'author' and 'comments', that can belong to an item. The following sections detail how the data in these elements can be accessed.

8.1. getAuthor

The getAuthor method of an item returns the email address of the author of the item:

//check if author element exists
if ($currItem->hasElement('author')) {

  //echo author text to browser
  echo $currItem->getAuthor() . "\n<br />";
}

The result is:

johnkarl@nbnet.nb.ca
johnkarl@nbnet.nb.ca 

8.2. getComments

The getComments method of an item returns a URL for user comments:

//check if comments element exists
if ($currItem->hasElement('comments')) {

  //echo comments URL to browser
  echo $currItem->getComments() . "\n<br />";
}

The result is:

http://www.myfeed.com/comments/20050729.html
http://www.myfeed.com/comments/20050730.html 

8.3. getEnclosure

The getEnclosure method returns a reference to the enclosure object -- media such as an mp3 file -- for the item:

//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {

  //get a reference to the enclosure object
  $myEnclosure =& $currItem->getEnclosure();
}

Once a reference to the enclosure object has been acquired, you can use its methods -- getUrl, getLength, and getType -- to extract its data.

8.3.1. getUrl

The getUrl method of an enclosure returns the URL of the enclosure:

//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {

  //get a reference to the enclosure object
  $myEnclosure =& $currItem->getEnclosure();

  //echo URL of enclosure to browser
  echo $myEnclosure->getUrl() . "\n<br />";
}

The result is:

http://www.myfeed.com/audio/20050729.mp3
http://www.myfeed.com/audio/20050730.mp3 

8.3.2. getLength

The getLength method of an enclosure returns the length in bytes of the enclosure:

//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {

  //get a reference to the enclosure object
  $myEnclosure =& $currItem->getEnclosure();

  //echo length of enclosure to browser
  echo $myEnclosure->getLength() . "\n<br />";
}

The result is:

12216320
12216320 
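
The value returned by getLength is a raw byte count, which is awkward to display as is. The following sketch shows one way to turn it into a readable size. Note that formatEnclosureSize is a hypothetical helper written for this example, not a DOMIT! RSS method, and that it assumes 1 KB = 1024 bytes:

```php
<?php
//hypothetical helper (not part of DOMIT! RSS): convert a byte count,
//such as the value returned by getLength, into a readable string
function formatEnclosureSize($bytes) {
  $units = array('bytes', 'KB', 'MB', 'GB');
  $size = (float) $bytes;
  $unit = 0;

  //divide by 1024 until the value fits the current unit
  while ($size >= 1024 && $unit < count($units) - 1) {
    $size /= 1024;
    $unit++;
  }

  //no decimals for plain bytes, two decimals otherwise
  return number_format($size, ($unit == 0) ? 0 : 2) . ' ' . $units[$unit];
}

//the enclosure length from the sample feed
echo formatEnclosureSize('12216320') . "\n<br />";
```

For the sample enclosure length of 12216320, this prints 11.65 MB. In a script you would pass $myEnclosure->getLength() instead of the literal value.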

8.3.3. getType

The getType method of an enclosure returns its mime type:

//check if enclosure element exists
if ($currItem->hasElement('enclosure')) {

  //get a reference to the enclosure object
  $myEnclosure =& $currItem->getEnclosure();

  //echo mime type of enclosure to browser
  echo $myEnclosure->getType() . "\n<br />";
}

The result is:

audio/mpeg
audio/mpeg 

8.4. getGUID

The getGUID method returns a reference to the guid object, which holds a globally unique identifier for the item:

//check if guid element exists
if ($currItem->hasElement('guid')) {

  //get a reference to the guid object
  $myGUID =& $currItem->getGUID();
}

Once a reference to the guid object has been acquired, you can use its methods -- getGUID and isPermaLink -- to extract its data.

8.4.1. getGUID

The getGUID method of a guid returns a globally unique identifier, usually in the form of a URL:

//check if guid element exists
if ($currItem->hasElement('guid')) {

  //get a reference to the guid object
  $myGUID =& $currItem->getGUID();

  //echo guid text to browser
  echo $myGUID->getGUID() . "\n<br />";
}

The result is:

http://www.myfeed.com/20050729.html
http://www.myfeed.com/20050730.html 

8.4.2. isPermaLink

The isPermaLink method of a guid returns true if the GUID is a permanent link to the item:

//check if guid element exists
if ($currItem->hasElement('guid')) {

  //get a reference to the guid object
  $myGUID =& $currItem->getGUID();

  //output to browser if guid is permalink or not
  echo ($myGUID->isPermaLink() ? "Is permalink" : "Is not permalink") . "\n<br />";
}

The result is:

Is permalink
Is permalink 
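
A common use of isPermaLink is deciding which URL to store or display for an item. The sketch below uses only the item and guid methods described above. getItemPermanentUrl is a hypothetical helper, not part of DOMIT! RSS, and it assumes that hasElement('link') behaves for the link element as it does for the other item elements:

```php
<?php
//hypothetical helper (not part of DOMIT! RSS): choose the most
//stable URL available for an item
function getItemPermanentUrl($currItem) {

  //prefer the guid when it is flagged as a permanent link
  if ($currItem->hasElement('guid')) {
    $myGUID = $currItem->getGUID();

    if ($myGUID->isPermaLink()) {
      return $myGUID->getGUID();
    }
  }

  //otherwise fall back to the link element, if present
  if ($currItem->hasElement('link')) {
    return $currItem->getLink();
  }

  //nothing usable was found
  return '';
}
```

Inside the item loop you might call echo getItemPermanentUrl($currItem) . "\n<br />"; an empty string indicates that the item supplied neither a permalink guid nor a link.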

8.5. getPubDate

The getPubDate method of an item returns the date of publication:

//check if pubDate element exists
if ($currItem->hasElement('pubDate')) {

  //echo pubDate to browser
  echo $currItem->getPubDate() . "\n<br />";
}

The result is:

Fri, 29 Jul 05 14:15:16 GMT
Sat, 30 Jul 05 13:20:35 GMT

8.6. getSource

The getSource method returns a reference to the source object, which describes the feed from which the item is derived:

//check if source element exists
if ($currItem->hasElement('source')) {

  //get a reference to the source object
  $mySource =& $currItem->getSource();
}

Once a reference to the source object has been acquired, you can use its methods -- getSource and getUrl -- to extract its data.

8.6.1. getSource

The getSource method of a source object returns the title of the source feed:

//check if source element exists
if ($currItem->hasElement('source')) {

  //get a reference to the source object
  $mySource =& $currItem->getSource();

  //echo title of source to browser
  echo $mySource->getSource() . "\n<br />";
}

The result is:

The Minds Aye
The Minds Aye 

8.6.2. getUrl

The getUrl method of a source object returns the URL of the source feed:

//check if source element exists
if ($currItem->hasElement('source')) {

  //get a reference to the source object
  $mySource =& $currItem->getSource();

  //echo URL of source to browser
  echo $mySource->getUrl() . "\n<br />";
}

The result is:

http://mindsaye.ca/rss/20050729.html
http://mindsaye.ca/rss/20050730.html 

8.7. getCategoryCount and getCategory

The getCategoryCount and getCategory methods of an item are identical to those of the channel element. Please see section 5.17 for more information.
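
As a minimal sketch of those same calls applied to an item, the function below gathers an item's categories into a PHP array. getItemCategories is a hypothetical helper written for this example, not part of DOMIT! RSS; it relies only on getCategoryCount, getCategory, and getDomain as described in section 5.17:

```php
<?php
//hypothetical helper (not part of DOMIT! RSS): collect the categories
//of an item as an array of category/domain pairs
function getItemCategories($currItem) {
  $categories = array();

  //get number of categories for this item
  $numCategories = $currItem->getCategoryCount();

  //iterate through categories
  for ($k = 0; $k < $numCategories; $k++) {

    //get current category
    $currCategory = $currItem->getCategory($k);

    //store text and domain (empty string if no domain exists)
    $categories[] = array(
      'category' => $currCategory->getCategory(),
      'domain' => $currCategory->getDomain()
    );
  }

  return $categories;
}
```

The loop index is named $k so that the function can be pasted into scripts that already use $i for channels and $j for items.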

Chapter 5. Other Methods for Accessing RSS Data

The accessor methods that we have just reviewed are simple and convenient ways of extracting data from an RSS document.

DOMIT! RSS also provides a number of additional methods that allow you to query and interact programmatically with your RSS data.

Note: We will continue to use the sample RSS document from the previous section.

1. getElementList

For any RSS element that contains subelements -- such as channel, item, or image -- DOMIT! RSS generates a PHP array of subelement names, which is referred to as an element list.

The getElementList method returns a reference to this element list.

If, for instance, you want to find out what elements belonged to a channel, you could do this:

//get array of element names under a channel
$elementList = $currChannel->getElementList();

//echo array to browser
echo "<pre>";
print_r ($elementList);
echo "</pre>";

The result is:

Array
(
    [0] => title
    [1] => link
    [2] => description
    [3] => language
    [4] => copyright
    [5] => managingeditor
    [6] => webmaster
    [7] => pubdate
    [8] => lastbuilddate
    [9] => generator
    [10] => docs
    [11] => cloud
    [12] => ttl
    [13] => rating
    [14] => image
    [15] => textinput
    [16] => skipdays
    [17] => skiphours
    [18] => item
    [19] => category
)

The output of the getElementList method contains the names of each subelement of the channel.

You can use the PHP count function together with getElementList to iterate through the subelements of an element:

$elementList =& $currChannel->getElementList();
$numElements = count($elementList);

for ($i = 0; $i < $numElements; $i++) {

  //get current element name
  $currElementName =& $elementList[$i];
  
  //echo name to browser
  echo $currElementName . "\n<br />";
}

The result is:

title
link
description
language
copyright
managingeditor
webmaster
pubdate
lastbuilddate
generator
docs
cloud
ttl
rating
image
textinput
skipdays
skiphours
item
category

2. DOMIT! RSS and RSS Type

DOMIT! RSS distinguishes four basic types of RSS elements:

Simple RSS element:

A simple RSS element is:

  • defined by the RSS specification, and

  • composed of a single child text node with no attributes

For example:

<language>en-us</language>

The following elements are considered Simple RSS Elements: 'title', 'link', 'description', 'language', 'copyright', 'managingEditor', 'webMaster', 'pubDate', 'lastBuildDate', 'generator', 'docs', 'ttl', 'rating', 'author', 'comments'.

Complex RSS Element:

A complex RSS element:

  • is defined by the RSS specification, and

  • contains child elements and/or attributes

For example:

<image>
  <title>Developer</title>
  <link>http://www.internetnews.com</link>
  <url>http://www.engageinteractive.com/domit/domitBanner.gif</url>
  <width>150</width>
  <height>50</height>
  <description>The blah blah blah de blah</description>
</image>

The following elements are considered Complex RSS Elements: 'generator', 'cloud', 'image', 'textInput', 'enclosure', 'source', 'guid', 'skipDays', 'skipHours'.

Custom RSS Element:

A Custom RSS element is any element that is not defined by the RSS spec. For example:

<dc:creator>John Heinstein</dc:creator>

RSS Collection:

An RSS Collection describes multiple instances of RSS elements at the same level of hierarchy. For example:

<dc:creator>John Heinstein</dc:creator>
<dc:creator>Brad Parks</dc:creator>
<dc:creator>Liz Goulard</dc:creator>

3. Determining RSS Type

Knowing the RSS type of an element allows you to programmatically apply DOMIT! RSS methods specific to that type.

There are three DOMIT! RSS methods that allow you to determine type: isSimpleRSSElement, isCustomRSSElement, and isCollection.

3.1. isSimpleRSSElement and getElementText

The isSimpleRSSElement method takes a single parameter -- an element name -- and returns true if the element name is a simple RSS element:

echo $currChannel->isSimpleRSSElement('language');

The above example returns true.

If you know the element is of Simple type, you can use the getElementText method to retrieve its value. getElementText takes a single parameter -- the name of the element:

if ($currChannel->isSimpleRSSElement('language')) {

  //echo value of language element to browser
  echo $currChannel->getElementText('language');
}

The return value is:

en-ca

3.2. isCustomRSSElement and getElement

The isCustomRSSElement method takes a single parameter -- an element name -- and returns true if the element name is a custom RSS element:

echo $currChannel->isCustomRSSElement('image');

The above example returns false, because image is defined by the RSS specification and is therefore a Complex RSS element, not a custom one.

You cannot use the getElementText method to extract data from a Complex type of RSS element. You must instead obtain a reference to that element and use the methods specific to that Complex type to extract its data.

The getElement method can be used to return an object reference to an element. getElement takes a single parameter -- the name of the element to be retrieved:

if ($currChannel->isCustomRSSElement('dc:creator')) {

  //obtain reference to the dc:creator element
  $myElement =& $currChannel->getElement('dc:creator');

  //echo to browser
  echo $myElement->toNormalizedString(true);
}

The result, if a dc:creator node from the Dublin Core was present:

<dc:creator>John Heinstein</dc:creator>

With an object reference in hand, you can then use the DOM methods to extract the data for that object. For a dc:creator element, you might do this:

if ($currChannel->isCustomRSSElement('dc:creator')) {

  //obtain reference to the dc:creator element
  $myElement =& $currChannel->getElement('dc:creator');

  //echo dc:creator content to browser using the DOMIT getText method
  echo $myElement->getText();
}

The result is:

John Heinstein

3.3. isCollection

The isCollection method takes a single parameter -- an element name -- and returns true if the element name is an RSS collection:

echo $currChannel->isCollection('category');

The above example returns true.

If you have determined that an element name represents an RSS collection, then the following methods are available for extracting data from the elements of the collection.

3.3.1. getElement

You can also use the getElement method to return an object reference to a collection:

if ($currChannel->isCollection('category')) {

  //get object reference to collection
  $myCollection =& $currChannel->getElement('category');

  //process collection...
}

3.3.2. getElementCount and getElementAt

Once you have obtained a reference to a collection, you can use the getElementCount method to determine the number of members of that collection.

A for loop can then be used to iterate through the members of the collection. The getElementAt method will allow you to access the collection members by index:

if ($currChannel->isCollection('category')) {

  //get object reference to collection
  $myCollection =& $currChannel->getElement('category');

  //get number of collection members
  $numMembers = $myCollection->getElementCount();

  //iterate through members of collection
  for ($i = 0; $i < $numMembers; $i++) {

    //get reference to each member
    $currMember =& $myCollection->getElementAt($i); 

    //echo to browser
    echo $currMember->toNormalizedString(true);
  }
}

The result is:

<category domain="http://www.superopendirectory.com/">philosophy/humor</category>
<category domain="http://www.superopendirectory.com/">philosophy/hogwash</category>

4. Navigating an RSS Document by RSS Type

Navigating through an RSS Document by RSS type involves:

  • obtaining a list of available elements

  • iterating through the elements in the list

  • determining the RSS type of each element

  • querying each element for data, based on the RSS type of that element

4.1. Step 1: Get an Element List

First, an element list is obtained using the getElementList method:

//get array of element names under a channel
$elementList = $currChannel->getElementList();

4.2. Step 2: Construct a Loop Over the Element List

Secondly, a loop is constructed over the element list, and the element name is obtained at each iteration:

$elementList =& $currChannel->getElementList();
$numElements = count($elementList);

for ($i = 0; $i < $numElements; $i++) {

  //get current element name
  $currElementName =& $elementList[$i];
}

4.3. Step 3: Test for RSS Type

Thirdly, the element is sorted into one of the four categories of RSS type, using the isSimpleRSSElement, isCustomRSSElement, and isCollection methods:

$elementList =& $currChannel->getElementList();
$numElements = count($elementList);

for ($i = 0; $i < $numElements; $i++) {

  //get current element name
  $currElementName =& $elementList[$i];
  
  if ($currChannel->isSimpleRSSElement($currElementName)) { 
    //element is a simple RSS element
  }
  else if ($currChannel->isCustomRSSElement($currElementName)) { 
    //element is a custom RSS element
  }
  else if ($currChannel->isCollection($currElementName)) { 
    //element is a collection of RSS elements
  }
  else { 
    //element is a complex RSS element
  }
}

4.4. Step 4: Query Elements with Methods Specific to Their Type

The last step is to process elements according to the methods of their RSS type.

$elementList =& $currChannel->getElementList();
$numElements = count($elementList);

for ($i = 0; $i < $numElements; $i++) {

  //get current element name
  $currElementName =& $elementList[$i];
  
  if ($currChannel->isSimpleRSSElement($currElementName)) { 
    //element is a simple RSS element
    //use getElementText to get value
    $myValue = $currChannel->getElementText($currElementName);
  }
  else if ($currChannel->isCustomRSSElement($currElementName)) { 
    //element is a custom RSS element
    //treat as a DOM node
    $currElement =& $currChannel->getElement($currElementName);
    
    switch($currElementName) {
      case 'dc:creator':
        $myValue = $currElement->getText();
        break;
 
      case 'cost':
        $myValue1 = $currElement->firstChild->nodeValue;
        $myValue2 = $currElement->getAttribute('currency'); 
        break;
    }
  }
  else if ($currChannel->isCollection($currElementName)) { 
    //element is a collection of RSS elements
    $myCollection =& $currChannel->getElement($currElementName);

    //get number of collection members
    $numMembers = $myCollection->getElementCount();
    
    //iterate through members of collection
    //(uses $j so that the outer loop counter $i is not overwritten)
    for ($j = 0; $j < $numMembers; $j++) {

      //get reference to each member
      $currMember =& $myCollection->getElementAt($j);

      //process member of collection
    }
  }
  else { 
    //element is a complex RSS element
    //get reference to element
    $element =& $currChannel->getElement($currElementName);
    
    switch (strtolower($currElementName)) {
      case DOMIT_RSS_ELEMENT_IMAGE:
        //process image element
        break;
      case DOMIT_RSS_ELEMENT_CLOUD:
        //process cloud element
        break;
      case DOMIT_RSS_ELEMENT_TEXTINPUT:
        //process textinput element
        break;
      case DOMIT_RSS_ELEMENT_ENCLOSURE:
        //process enclosure element
        break;
      case DOMIT_RSS_ELEMENT_SOURCE:
        //process source element
        break;
      case DOMIT_RSS_ELEMENT_GUID:
        //process guid element
        break;
      case DOMIT_RSS_ELEMENT_SKIPHOURS:
        //process skipHours element
        break;
      case DOMIT_RSS_ELEMENT_SKIPDAYS:
        //process skipDays element
        break;
    }
  }
}

5. isRSSDefined

The isRSSDefined method is a quick way to test if a child element name is defined by the RSS specification.

It returns true if the element is defined in the RSS specification. For example:

echo ($currChannel->isRSSDefined('image') ? "Is defined." : "Is not defined.");

The result is:

Is defined.
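
Combined with getElementList, isRSSDefined gives a quick way to see which subelements of a channel or item come from the RSS specification and which are extensions. The following is a sketch only: splitElementNames is a hypothetical helper, not part of DOMIT! RSS, and it assumes that getElementList returns a plain array of element names, as shown earlier in this chapter:

```php
<?php
//hypothetical helper (not part of DOMIT! RSS): sort the subelement
//names of a channel or item into RSS-defined and non-RSS groups
function splitElementNames($rssElement) {
  $result = array('defined' => array(), 'custom' => array());

  //get array of element names
  $elementList = $rssElement->getElementList();

  foreach ($elementList as $currElementName) {
    if ($rssElement->isRSSDefined($currElementName)) {
      $result['defined'][] = $currElementName;
    }
    else {
      $result['custom'][] = $currElementName;
    }
  }

  return $result;
}
```

Calling splitElementNames($currChannel) returns an array with two keys, 'defined' and 'custom', each holding a list of element names.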

Chapter 6. Using DOM Methods with DOMIT! RSS

The nodes of a DOMIT! RSS document are all available to the underlying DOM parser, DOMIT!.

To access an RSS element as a DOM node at any time, use the node property of that element.

For example, to access an entire DOMIT! RSS document as a DOM document node:

//instantiate RSS document
$rssdoc =& new xml_domit_rss_document('http://www.somesite.com/rss.xml');

//access underlying XML document
$xmldoc =& $rssdoc->node;
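
The same node property can be used on a channel or item. As a rough sketch, the function below walks the child nodes of an RSS element's underlying DOM node and collects their names. getChildNodeNames is a hypothetical helper, not part of DOMIT! RSS; it assumes that the DOM nodes expose the standard DOM properties firstChild, nextSibling, and nodeName, and that nextSibling is null after the last child:

```php
<?php
//hypothetical helper (not part of DOMIT! RSS): list the node names of
//the children of an RSS element's underlying DOM node
function getChildNodeNames($rssElement) {
  $names = array();

  //access the underlying DOM node and start at its first child
  $currNode = $rssElement->node->firstChild;

  //follow the sibling chain until it runs out
  while ($currNode !== null) {
    $names[] = $currNode->nodeName;
    $currNode = $currNode->nextSibling;
  }

  return $names;
}
```

For example, getChildNodeNames($currChannel) would return the names of the nodes found directly under the channel node, in document order.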

Chapter 7. DOMIT! RSS Roadmap

Some of the plans for DOMIT! RSS include:

  • proper handling of namespaces

  • Conditional-Get support

  • modules for non-RSS specifications such as Atom, Dublin Core

  • support for SSL

  • gzip encoding support

Chapter 8. Contributing to DOMIT! RSS

DOMIT! RSS has only been made possible through the suggestions, bug reports, and code submissions of others.

If you would like to contribute to DOMIT! RSS or join the DOMIT! RSS team, please email