Technology, Finance, and Life

Parse Google Reader Files

Posted by DK on April 8, 2009

I wanted to send a friend a subset of the blogs I subscribe to, but didn’t want to cut and paste a zillion URLs into an email. Instead, I exported all my subscriptions to an OPML file (just go to “Settings” in Google Reader) and used opml, a lightweight Python module, to parse the file.

OPML is an XML-based format built around “outlines,” and feed readers use it to store subscription lists. The opml module lets you index into the file structure as you would a list or a list of lists. Let’s go to the interpreter:

>>> import opml
>>> outline = opml.parse('google-reader-subscriptions.xml')
>>> for i in range(len(outline)):
...     print(outline[i].title)

General Interest

As you can see, it doesn’t take much to pull out the top-level elements of the file. The code above counts the top-level elements, loops over their indices, and prints each element’s title. FYI, it helps to look at the raw OPML file first so you are familiar with the element attributes (e.g. “title”).
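If you would rather not install a third-party module, the same top-level loop can be sketched with Python’s standard xml.etree.ElementTree. This is not the opml module’s API; it is a swapped-in stdlib approach, and the sample data below (folder and feed names, placeholder URLs) is hypothetical, modeled loosely on a Google Reader export.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the URLs are placeholders.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline title="General Interest" text="General Interest">
      <outline title="Example Blog" type="rss" xmlUrl="http://example.com/feed"/>
    </outline>
    <outline title="NotesToSelf" type="rss" xmlUrl="http://example.org/notes/feed"/>
  </body>
</opml>"""

def top_level_titles(root):
    """Return the title of each outline directly under <body>."""
    body = root.find("body")
    # findall("outline") matches direct <outline> children only, in document
    # order; fall back to the "text" attribute when "title" is absent.
    return [o.get("title", o.get("text")) for o in body.findall("outline")]

root = ET.fromstring(SAMPLE_OPML)
# For the exported file, you would use instead:
#   root = ET.parse("google-reader-subscriptions.xml").getroot()
for title in top_level_titles(root):
    print(title)
```

Iterating directly over the matched elements avoids the range(len(...)) index bookkeeping used in the interpreter session above.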

You can’t tell from the interpreter output alone, however, which elements contain nested elements (though you probably will know if you are parsing your own subscription file). I know NotesToSelf is actually a feed, and I can inspect its attributes using ordinary Python indexing and attribute access:

>>> outline[1].xmlUrl

>>> outline[1].title
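One way to tell feeds from folders programmatically is to check the attributes each outline carries. The sketch below again uses the standard library’s xml.etree.ElementTree rather than the opml module, and it assumes (as in typical feed-reader exports) that feeds have an xmlUrl attribute while folders contain nested outlines. The sample data and the helper name kind are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the URLs are placeholders.
SAMPLE_OPML = """<opml version="1.0"><body>
  <outline title="General Interest" text="General Interest">
    <outline title="Example Blog" type="rss" xmlUrl="http://example.com/feed"/>
  </outline>
  <outline title="NotesToSelf" text="NotesToSelf" type="rss"
           xmlUrl="http://example.org/notes/feed"
           htmlUrl="http://example.org/notes/"/>
</body></opml>"""

def kind(outline):
    """Classify an <outline> element as "feed", "folder", or "other".

    Assumption: feeds carry an xmlUrl attribute and folders contain nested
    <outline> elements.
    """
    if outline.get("xmlUrl"):
        return "feed"
    # Compare with None: a found Element that has no children is falsy.
    if outline.find("outline") is not None:
        return "folder"
    return "other"

body = ET.fromstring(SAMPLE_OPML).find("body")
for outline in body:
    print(outline.get("title"), "->", kind(outline))

# .attrib is a plain dict, handy for seeing every attribute a feed carries.
feed = body[1]
for name, value in sorted(feed.attrib.items()):
    print(f"  {name} = {value}")
```

Dumping .attrib is a quick substitute for opening the raw file when you want to know which attribute names (title, xmlUrl, htmlUrl, and so on) are available.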

You can examine nested elements the same way. Indices start at zero, so the following command returns the title of the first feed (“[0]”) in the fourth top-level element (“[3]”, the “Finance” folder):

>>> outline[3][0].title
'A Credit Trader'
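Chaining indices works when you know the layout in advance; to collect every feed regardless of nesting depth, a recursive walk is a natural fit. The sketch below, using the standard library rather than the opml module, yields one tuple per feed and then filters a single folder into plain-text lines you could paste into an email. The helper name iter_feeds, the second Finance feed, and all URLs are hypothetical, and outlines without an xmlUrl are assumed to be folders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the URLs are placeholders.
SAMPLE_OPML = """<opml version="1.0"><body>
  <outline title="General Interest" text="General Interest">
    <outline title="Example Blog" type="rss" xmlUrl="http://example.com/feed"/>
  </outline>
  <outline title="Finance" text="Finance">
    <outline title="A Credit Trader" type="rss"
             xmlUrl="http://example.net/credit/feed"/>
    <outline title="Another Finance Blog" type="rss"
             xmlUrl="http://example.net/other/feed"/>
  </outline>
</body></opml>"""

def iter_feeds(outline, folder=None):
    """Recursively yield (folder, title, xmlUrl) for every feed under outline.

    folder is the title of the innermost enclosing folder (None at top level).
    """
    for child in outline.findall("outline"):
        if child.get("xmlUrl"):
            yield folder, child.get("title"), child.get("xmlUrl")
        else:
            # Treat outlines without xmlUrl as folders and descend into them.
            yield from iter_feeds(child, child.get("title"))

body = ET.fromstring(SAMPLE_OPML).find("body")

# Plain-text list of one folder's feeds, ready to paste into an email.
lines = [f"{title}: {url}" for folder, title, url in iter_feeds(body)
         if folder == "Finance"]
print("\n".join(lines))
```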

There are probably better ways to share subscriptions, but I look at it as a gentle introduction to XML parsing.
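One alternative is to send the friend a small OPML file containing only the folder you want to share, since feed readers generally accept OPML imports. The sketch below builds such a document with the standard library’s xml.etree.ElementTree; the helper name subset_opml, the head title “Shared subscriptions,” and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the URLs are placeholders.
SAMPLE_OPML = """<opml version="1.0"><body>
  <outline title="General Interest" text="General Interest">
    <outline title="Example Blog" type="rss" xmlUrl="http://example.com/feed"/>
  </outline>
  <outline title="Finance" text="Finance">
    <outline title="A Credit Trader" type="rss"
             xmlUrl="http://example.net/credit/feed"/>
  </outline>
</body></opml>"""

def subset_opml(opml_text, folder_title):
    """Return a new OPML document containing only the named top-level folder."""
    body = ET.fromstring(opml_text).find("body")
    new_root = ET.Element("opml", version="1.0")
    head = ET.SubElement(new_root, "head")
    ET.SubElement(head, "title").text = "Shared subscriptions"
    new_body = ET.SubElement(new_root, "body")
    for outline in body.findall("outline"):
        if outline.get("title") == folder_title:
            # Attach the folder element, including its nested feeds.
            new_body.append(outline)
    return ET.tostring(new_root, encoding="unicode")

shared = subset_opml(SAMPLE_OPML, "Finance")
print(shared)
```

To write a file instead of a string, you could build the tree the same way and call ET.ElementTree(new_root).write(path, encoding="utf-8", xml_declaration=True).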

Posted in Python, Tech