Analyze iTunes RSS in R
I'm trying to parse the iTunes top 100 in R and extract the artist, song, etc., but I'm having problems with the XML. I was able to get the payload easily from the Billboard RSS feed ( http://www1.billboard.com/rss/charts/hot-100 ):
library(XML)  # xmlTreeParse, xmlRoot, xpathApply, xmlSApply, xmlValue

GetBillboard <- function() {
  hot.100 <- xmlTreeParse("http://www1.billboard.com/rss/charts/hot-100")
  hot.100 <- xpathApply(xmlRoot(hot.100), "//item")
  top.songs <- character(length(hot.100))
  # Keep the third child value of each <item>
  for (i in seq_along(hot.100)) {
    top.songs[i] <- xmlSApply(hot.100[[i]], xmlValue)[3]
  }
  return(top.songs)
}
A similar strategy does not work with the iTunes feed, though ( https://itunes.apple.com/us/rss/topmusicvideos/limit=100/explicit=true/xml ):
library(RCurl)  # getURL

GetITunes <- function() {
  itunes.raw <- getURL("https://itunes.apple.com/us/rss/topmusicvideos/limit=100/explicit=true/xml")
  itunes.xml <- xmlTreeParse(itunes.raw)
  top.vids <- xpathApply(xmlRoot(itunes.xml), "//entry")
  return(top.vids)
}
All I get back is an empty node set:
> m <- GetITunes()
> m
list()
attr(,"class")
[1] "XMLNodeSet"
>
I assume this has to do with how the XML file is formatted. How can I get the iTunes data into the same structure as the Billboard data at this point in the first function?
hot.100 <- xpathApply(xmlRoot(hot.100), "//item")
Thanks!
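The problem is that your XML document declares a default namespace and your XPath expression doesn't take it into account. In XPath, an unprefixed name such as entry only matches elements that are in no namespace, so when the document has a default namespace you have to bind it to a prefix and use that prefix in the expression. This should work: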
xpathApply(xmlRoot(itunes.xml), "//d:entry",
namespaces=c(d="http://www.w3.org/2005/Atom"))
Here, we arbitrarily choose d as the prefix for the default namespace used in the XML document and then use that prefix in our XPath expression.
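To see the effect without hitting the network, here is a minimal sketch that runs both queries against a small inline Atom document. The feed contents are made up for illustration and are not the real iTunes payload; the only thing taken from the real feed is the Atom namespace URI. It also assumes the titles live in a title child of each entry, which holds for this toy document.

```r
library(XML)

# Toy Atom feed standing in for the iTunes payload (invented data).
feed <- '<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Song A - Artist A</title></entry>
  <entry><title>Song B - Artist B</title></entry>
</feed>'

doc <- xmlParse(feed, asText = TRUE)

# Bind the arbitrary prefix "d" to the document's default namespace.
ns <- c(d = "http://www.w3.org/2005/Atom")

# Unprefixed: looks for <entry> elements in no namespace.
no.ns <- getNodeSet(doc, "//entry")

# Prefixed: looks for <entry>/<title> elements in the Atom namespace.
titles <- xpathSApply(doc, "//d:entry/d:title", xmlValue, namespaces = ns)

print(length(no.ns))
print(titles)
```

The unprefixed query comes back empty, just like in the question, while the prefixed one returns one title per entry. Every element step in the path needs the prefix, not only the first one.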