Problem creating a data.frame from an XML file

This is my first attempt at converting XML to a data.frame. I found questions like "How to convert XML data to data.frame?" very useful, but I still can't convert the relevant part of the XML into a data.frame.

My goal is to plot the euro/US dollar exchange rate over time. The data is available in XML format here:

http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml

I can read the data and show which piece of data (node?) I am interested in:

library(XML)
doc <- xmlTreeParse("http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml")
root <- xmlRoot(doc)
root[[2]][[2]]

      

I've tried variations of getNodeSet() to show all lines that start with, but so far to no avail:

getNodeSet(root, "/DataSet/Series/*")
getNodeSet(root, "//obs")
getNodeSet(root, "//obs[@OBS_VALUE = 1.1789]")

      

How can I extract all the variables, or just TIME_PERIOD and OBS_VALUE, from this XML file and put them in an R data.frame? Thanks for any comments or clarifications.





1 answer


This data is in SDMX format. You can use the R package rsdmx for data analysis:

library(rsdmx)
appData <- readSDMX("http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml")
myData <- as.data.frame(appData)

> head(myData)
  FREQ CURRENCY CURRENCY_DENOM EXR_TYPE EXR_SUFFIX TIME_FORMAT COLLECTION TIME_PERIOD OBS_VALUE OBS_STATUS OBS_CONF
1    D      USD            EUR     SP00          A         P1D          A  1999-01-04    1.1789          A        F
2    D      USD            EUR     SP00          A         P1D          A  1999-01-05    1.1790          A        F
3    D      USD            EUR     SP00          A         P1D          A  1999-01-06    1.1743          A        F
4    D      USD            EUR     SP00          A         P1D          A  1999-01-07    1.1632          A        F
5    D      USD            EUR     SP00          A         P1D          A  1999-01-08    1.1659          A        F
6    D      USD            EUR     SP00          A         P1D          A  1999-01-11    1.1569          A        F
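Since the end goal is a graph of the rate over time, here is a minimal sketch of that last step. I have not checked which column types readSDMX returns for this file, so the sketch assumes TIME_PERIOD and OBS_VALUE may arrive as character or factor and converts them explicitly. `rates` is a hypothetical stand-in for `myData`, built from the six rows printed above so the example runs without downloading anything.

```r
# Hypothetical stand-in for myData: the six rows shown by head(myData),
# stored as character to mimic an unparsed result.
rates <- data.frame(
  TIME_PERIOD = c("1999-01-04", "1999-01-05", "1999-01-06",
                  "1999-01-07", "1999-01-08", "1999-01-11"),
  OBS_VALUE   = c("1.1789", "1.1790", "1.1743",
                  "1.1632", "1.1659", "1.1569"),
  stringsAsFactors = FALSE
)

# as.character() first, so the conversion also works if the columns are factors.
rates$TIME_PERIOD <- as.Date(as.character(rates$TIME_PERIOD))
rates$OBS_VALUE   <- as.numeric(as.character(rates$OBS_VALUE))

# Make sure the series is in chronological order before drawing a line.
rates <- rates[order(rates$TIME_PERIOD), ]

# Base-graphics line plot: OBS_VALUE is the USD price of one EUR.
plot(rates$TIME_PERIOD, rates$OBS_VALUE, type = "l",
     xlab = "Date", ylab = "USD per EUR",
     main = "EUR/USD reference rate")
```

For the full series, apply the same two conversions to `myData` itself before calling `plot()`.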

      



Alternatively, if you only have the XML package:

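library(XML)
# xmlParse() returns an internal document that getNodeSet() can query with XPath
doc <- xmlParse("http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml")
# Obs belongs to the exr namespace, so bind that URI to the prefix "ns";
# fun = xmlAttrs returns each matching node's attributes as a named vector
docData <- getNodeSet(doc, "//ns:Obs"
                      , namespaces = c(ns = "http://www.ecb.europa.eu/vocabulary/stats/exr/1")
                      , fun = xmlAttrs)
# Stack the attribute vectors into a character matrix, one row per Obs
docData <- do.call(rbind, docData)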
> head(docData)
     TIME_PERIOD  OBS_VALUE OBS_STATUS OBS_CONF
[1,] "1999-01-04" "1.1789"  "A"        "F"     
[2,] "1999-01-05" "1.1790"  "A"        "F"     
[3,] "1999-01-06" "1.1743"  "A"        "F"     
[4,] "1999-01-07" "1.1632"  "A"        "F"     
[5,] "1999-01-08" "1.1659"  "A"        "F"     
[6,] "1999-01-11" "1.1569"  "A"        "F" 
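This also explains why the attempts in the question return nothing: XPath is case-sensitive (the element is `Obs`, not `obs`), and in XPath 1.0 an unprefixed name such as `//Obs` only matches elements that are in no namespace. The self-contained sketch below demonstrates this on a small hypothetical snippet. Only the namespace URI is taken from the answer; the root element, nesting and attribute values are assumptions, and the real usd.xml may be structured differently. It then turns the character matrix from `do.call(rbind, ...)` into a data.frame with proper column types, which is what the question asked for.

```r
library(XML)

# Hypothetical, trimmed-down snippet shaped like usd.xml (structure assumed):
# Obs elements live in the default namespace declared on DataSet.
xml_text <- '<DataSet xmlns="http://www.ecb.europa.eu/vocabulary/stats/exr/1">
  <Series FREQ="D" CURRENCY="USD">
    <Obs TIME_PERIOD="1999-01-04" OBS_VALUE="1.1789" OBS_STATUS="A" OBS_CONF="F"/>
    <Obs TIME_PERIOD="1999-01-05" OBS_VALUE="1.1790" OBS_STATUS="A" OBS_CONF="F"/>
  </Series>
</DataSet>'

doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed names only match no-namespace elements, so both queries find
# nothing; "//obs" is additionally the wrong case.
length(getNodeSet(doc, "//Obs"))
length(getNodeSet(doc, "//obs"))

# Binding the namespace URI to a prefix makes the query match.
ns  <- c(ns = "http://www.ecb.europa.eu/vocabulary/stats/exr/1")
obs <- getNodeSet(doc, "//ns:Obs", namespaces = ns, fun = xmlAttrs)

# rbind gives a character matrix; convert it to a typed data.frame.
df <- as.data.frame(do.call(rbind, obs), stringsAsFactors = FALSE)
df$TIME_PERIOD <- as.Date(df$TIME_PERIOD)
df$OBS_VALUE   <- as.numeric(df$OBS_VALUE)
str(df)
```

The same `as.data.frame()` and type conversions apply unchanged to the `docData` matrix built from the real file.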

      









