Unable to read XML file from an https:// site
Running R 3.2.0, RStudio 0.99.441, Windows 7 32-bit, XML package 3.98-1.2.
I am trying to read an XML file from the site below using the XML package and xmlTreeParse, but I keep getting an error.
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml
> fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
> doc <- xmlTreeParse(fileURL, useInternal = TRUE)
Error: XML content does not seem to be XML: 'https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml'
I also tried download.file() followed by xmlTreeParse:
download.file(fileURL, destfile = "data.xml")
doc <- xmlTreeParse("data.xml", useInternalNodes = TRUE)
When I do this, there is no immediate error, but the variable "doc" has no structure, and I am not sure how to read it from this point.
+3
Matt boudas
1 answer
Remove the s from https:
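library(XML)
fileURL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
# Or convert an existing https URL; this is a no-op if the URL already uses http
fileURL <- sub('https', 'http', fileURL)
# Parse directly from the http URL
doc <- htmlParse(fileURL)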
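Since the question also asks how to read the parsed document, here is a minimal sketch of one way to do it. It is not part of the original answer. The helper to_http and the inline sample document (including its element names response, row, name and zipcode) are made up for illustration, and the real file's layout may differ. The network call is left commented out so the sketch runs offline.

```r
library(XML)

# Hypothetical helper (not from the XML package): rewrite https:// to http://
to_http <- function(url) sub("^https://", "http://", url)

fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
httpURL <- to_http(fileURL)

# Network step, commented out so the sketch runs offline:
# doc <- xmlParse(httpURL)

# Made-up stand-in document; the element names are assumptions
sampleXML <- paste0(
  "<response>",
  "<row><name>A</name><zipcode>11111</zipcode></row>",
  "<row><name>B</name><zipcode>22222</zipcode></row>",
  "</response>"
)
doc <- xmlParse(sampleXML, asText = TRUE)

# Navigate the parsed tree
root <- xmlRoot(doc)        # top-level node
rootName <- xmlName(root)   # name of the top-level element
nRows <- xmlSize(root)      # number of child nodes under the root

# Pull the text of every <zipcode> element with an XPath query
zipcodes <- xpathSApply(doc, "//zipcode", xmlValue)
```

With the real file, the same xmlRoot and xpathSApply calls should apply once the XPath expression is adjusted to whatever element names the document actually uses.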
+1
agstudy