Snippy code snippets

Posted on Thursday, May 5, 2011 in Coding

The internet has influenced how software development is done in more ways than we sometimes realize. When picking up a new technology, most people read only an introduction or a tutorial and then Google their way through the rest. Given the pace of technological change, that often seems like the only way to keep up. As a result, a lot of code snippets are shared online and reused many times by different people around the globe. This kind of code reuse is good, but we must be careful how we apply the snippets we find online. Some of them contain errors that will come back to bite us hard. Some of them could be snippy.

I recently came across this XML handler in an online IBM developerWorks article about parsing RSS feeds on Android. Here is the code:

public class RSSHandler extends DefaultHandler 
{
    
    RSSFeed _feed;
    RSSItem _item;
    String _lastElementName = "";
    boolean bFoundChannel = false;
    final int RSS_TITLE = 1;
    final int RSS_LINK = 2;
    final int RSS_DESCRIPTION = 3;
    final int RSS_CATEGORY = 4;
    final int RSS_PUBDATE = 5;
    
    int depth = 0;
    int currentstate = 0;
    /*
     * Constructor 
     */
    RSSHandler()
    {
    }
    
    /*
     * getFeed - this returns our feed when all of the parsing is complete
     */
    RSSFeed getFeed()
    {
        return _feed;
    }
    
    
    public void startDocument() throws SAXException
    {
        // initialize our RSSFeed object - this will hold our parsed contents
        _feed = new RSSFeed();
        // initialize the RSSItem object - you will use this as a crutch to grab 
        // the info from the channel
        // because the channel and items have very similar entries..
        _item = new RSSItem();

    }
    public void endDocument() throws SAXException
    {
    }
    public void startElement(String namespaceURI, String localName,String qName, 
                                             Attributes atts) throws SAXException
    {
        depth++;
        if (localName.equals("channel"))
        {
            currentstate = 0;
            return;
        }
        if (localName.equals("image"))
        {
            // record our feed data - you temporarily stored it in the item :)
            _feed.setTitle(_item.getTitle());
            _feed.setPubDate(_item.getPubDate());
        }
        if (localName.equals("item"))
        {
            // create a new item
            _item = new RSSItem();
            return;
        }
        if (localName.equals("title"))
        {
            currentstate = RSS_TITLE;
            return;
        }
        if (localName.equals("description"))
        {
            currentstate = RSS_DESCRIPTION;
            return;
        }
        if (localName.equals("link"))
        {
            currentstate = RSS_LINK;
            return;
        }
        if (localName.equals("category"))
        {
            currentstate = RSS_CATEGORY;
            return;
        }
        if (localName.equals("pubDate"))
        {
            currentstate = RSS_PUBDATE;
            return;
        }
        // if you don't explicitly handle the element, make sure you don't wind 
        // up erroneously storing a newline or other bogus data into one of our 
        // existing elements
        currentstate = 0;
    }
    
    public void endElement(String namespaceURI, String localName, String qName) 
                                                               throws SAXException
    {
        depth--;
        if (localName.equals("item"))
        {
            // add our item to the list!
            _feed.addItem(_item);
            return;
        }
    }
     
    public void characters(char ch[], int start, int length)
    {
        String theString = new String(ch,start,length);
        Log.i("RSSReader","characters[" + theString + "]");
        
        switch (currentstate)
        {
            case RSS_TITLE:
                _item.setTitle(theString);
                currentstate = 0;
                break;
            case RSS_LINK:
                _item.setLink(theString);
                currentstate = 0;
                break;
            case RSS_DESCRIPTION:
                _item.setDescription(theString);
                currentstate = 0;
                break;
            case RSS_CATEGORY:
                _item.setCategory(theString);
                currentstate = 0;
                break;
            case RSS_PUBDATE:
                _item.setPubDate(theString);
                currentstate = 0;
                break;
            default:
                return;
        }
        
    }
}

It looked good, so I decided to test it with a few XML documents. For simple XML, the code works fine most of the time. But the moment you try something like a Google News RSS feed, problems start. After parsing, the links to the articles all seem to be broken: for a link like “http://news.google.com/news?ned=us&hl=en&topic=w”, only “http://news.google.com/news?ned=us” was saved. This could mean either that the parser couldn’t handle the XML escaping or that the XML was not well-formed. Neither was the case. It was strange, but there is a simple explanation.

It turns out that the SAX handler method characters(char ch[], int start, int length) is not guaranteed to receive all the characters of a text node in a single call. Nor is it guaranteed to be called once per character; the parser decides how to divide the text. The Javadoc for ContentHandler.characters puts it like this:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks;
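To see how a particular parser divides a text node, a small handler that records every characters(…) call makes the chunks visible. The sketch below assumes the standard JAXP parser that ships with the JDK (SAXParserFactory); ChunkLogger and chunksOf are hypothetical names for this example. How many chunks you get depends on the parser and its version, so treat the printed chunks as something to observe rather than a guarantee. Only the joined text is fixed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkLogger {

    // Records every chunk reported through characters(), in order.
    static class ChunkCollector extends DefaultHandler {
        final List<String> chunks = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            chunks.add(new String(ch, start, length));
        }
    }

    // Parses an XML string with the JDK's default SAX parser and returns the raw chunks.
    static List<String> chunksOf(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ChunkCollector collector = new ChunkCollector();
        parser.parse(new InputSource(new StringReader(xml)), collector);
        return collector.chunks;
    }

    public static void main(String[] args) throws Exception {
        // The ampersands must be escaped as &amp; inside the XML document.
        String xml = "<link>http://news.google.com/news?ned=us&amp;hl=en&amp;topic=w</link>";
        List<String> chunks = chunksOf(xml);
        for (String chunk : chunks) {
            System.out.println("characters[" + chunk + "]");
        }
        System.out.println("chunks: " + chunks.size());
        // Whatever the split, concatenating the chunks restores the full text.
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

If the output shows more than one characters[...] line, any handler that keeps only one of them will lose text.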

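In the RSSHandler implementation above, every case in characters(…) resets currentstate to 0, so only the first call after a start tag is taken into consideration. If the parser splits the text into several chunks, everything after the first chunk is silently lost. Parsers often break text at entity references such as &amp;, which is consistent with where the Google News links were cut off: right at the first escaped ampersand.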
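The failure does not depend on any particular parser. The sketch below plays the parser's role by calling the SAX callbacks directly and feeding the same chunks to two handlers. FirstChunkHandler is a hypothetical, cut-down imitation of RSSHandler's logic for a single link element; BufferingHandler is a hypothetical handler that appends every chunk and reads the text only in endElement(…). The split at each &amp; is an assumption chosen for illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkLossDemo {

    // Hypothetical imitation of RSSHandler: keeps only the first chunk after the link start tag.
    static class FirstChunkHandler extends DefaultHandler {
        boolean inLink = false;
        String link;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            inLink = localName.equals("link");
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inLink) {
                link = new String(ch, start, length);
                inLink = false; // the equivalent of "currentstate = 0;"
            }
        }
    }

    // Appends every chunk and reads the text only when the end tag arrives.
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder builder = new StringBuilder();
        String link;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            builder.setLength(0); // start each element with an empty buffer
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            builder.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (localName.equals("link")) {
                link = builder.toString();
            }
        }
    }

    // Plays the parser: one link element whose text arrives in several chunks.
    static void feed(DefaultHandler handler, String... chunks) throws SAXException {
        handler.startElement("", "link", "link", new AttributesImpl());
        for (String chunk : chunks) {
            char[] ch = chunk.toCharArray();
            handler.characters(ch, 0, ch.length);
        }
        handler.endElement("", "link", "link");
    }

    public static void main(String[] args) throws SAXException {
        // Assumed split: the text is cut at each &amp; entity reference.
        String[] chunks = {"http://news.google.com/news?ned=us", "&", "hl=en", "&", "topic=w"};

        FirstChunkHandler naive = new FirstChunkHandler();
        feed(naive, chunks);
        BufferingHandler buffered = new BufferingHandler();
        feed(buffered, chunks);

        // prints "first chunk only: http://news.google.com/news?ned=us"
        System.out.println("first chunk only: " + naive.link);
        // prints "buffered: http://news.google.com/news?ned=us&hl=en&topic=w"
        System.out.println("buffered: " + buffered.link);
    }
}
```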

A better approach is to buffer the characters and assign them to the object only when the endElement(…) method is called. Once you hit an end tag, you can be sure that all the characters between the start and end tags have been buffered. The example below, based on yet another IBM developerWorks article, shows such an implementation.

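import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Message is the simple model class (title, link, description, date) from the
// same article; it is not shown here.
public class RssHandler extends DefaultHandler{
    // Element names, declared here so the handler compiles on its own.
    static final String ITEM = "item";
    static final String TITLE = "title";
    static final String LINK = "link";
    static final String DESCRIPTION = "description";
    static final String PUB_DATE = "pubDate";

    private List<Message> messages;
    private Message currentMessage;
    private StringBuilder builder;
    
    public List<Message> getMessages(){
        return this.messages;
    }
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        // The text of one element may arrive in several chunks: keep appending.
        builder.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        // At an end tag, all the text of a simple element has been buffered.
        if (this.currentMessage != null){
            if (localName.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(builder.toString());
            } else if (localName.equalsIgnoreCase(LINK)){
                currentMessage.setLink(builder.toString());
            } else if (localName.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(builder.toString());
            } else if (localName.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(builder.toString());
            } else if (localName.equalsIgnoreCase(ITEM)){
                messages.add(currentMessage);
            }
            builder.setLength(0);    
        }
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        messages = new ArrayList<Message>();
        builder = new StringBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        // Discard text buffered before this tag (channel fields, whitespace
        // between elements) so it cannot leak into the next item field.
        builder.setLength(0);
        if (localName.equalsIgnoreCase(ITEM)){
            this.currentMessage = new Message();
        }
    }
}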
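To check the buffering approach end to end against the kind of link that broke the first handler, here is a self-contained sketch using the JDK's SAX parser. Item, ItemHandler and BufferedRssDemo are hypothetical names standing in for a model class like Message and a handler like RssHandler, and the feed string is a made-up, minimal RSS document. The handler clears its buffer at every start tag so that channel text and whitespace between elements never leak into an item field. It also enables namespace awareness, because the SAX ContentHandler contract allows localName to be an empty string when namespace processing is off, which is the JAXP default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedRssDemo {

    // Hypothetical minimal model standing in for Message / RSSItem.
    static class Item {
        String title;
        String link;
    }

    static class ItemHandler extends DefaultHandler {
        final List<Item> items = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Item current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // drop anything buffered before this tag
            if (localName.equals("item")) {
                current = new Item();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called many times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // ignore channel-level elements such as the feed title
            }
            switch (localName) {
                case "title": current.title = text.toString().trim(); break;
                case "link":  current.link = text.toString().trim(); break;
                case "item":  items.add(current); current = null; break;
                default:      break;
            }
        }
    }

    static List<Item> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is filled in
        ItemHandler handler = new ItemHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss><channel><title>News</title>"
                + "<item><title>World</title>"
                + "<link>http://news.google.com/news?ned=us&amp;hl=en&amp;topic=w</link>"
                + "</item></channel></rss>";
        // prints "World -> http://news.google.com/news?ned=us&hl=en&topic=w"
        for (Item item : parse(feed)) {
            System.out.println(item.title + " -> " + item.link);
        }
    }
}
```

The trim() calls are a design choice for feeds that pad their text with whitespace; drop them if leading or trailing spaces matter to you.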

The next time you find a code snippet on the internet, test it and make sure it fits your use case before using it in your application.
