Unable to read XML file from an https:// site

Running R 3.2.0, RStudio 0.99.441, Windows 7 32-bit, XML package 3.98-1.2.

I am trying to read an XML file from the site below using the XML package and xmlTreeParse, but I keep getting an error.

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

> fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
> doc <- xmlTreeParse(fileURL, useInternal = TRUE)
Error: XML content does not seem to be XML: 'https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml' 

      

I also tried download.file() followed by xmlTreeParse:

download.file(fileURL, destfile = "data.xml")
doc <- xmlTreeParse("data.xml", useInternalNodes = TRUE)

      

When I do this, there is no immediate error, but the variable "doc" has no structure, and I am not sure how to read it from this point.



1 answer


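Remove the "s" from "https". The parser in the XML package does not fetch https URLs itself, so switch the URL to plain http: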



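library(XML)
# Start from the original https URL and swap the scheme to http
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
fileURL <- sub("^https", "http", fileURL)
# Parse as XML (htmlParse would treat the file as HTML)
doc <- xmlTreeParse(fileURL, useInternalNodes = TRUE)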
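
Once a parsed document is available, it can be inspected with xmlRoot and queried with XPath. The following is only a minimal sketch, assuming the XML package is installed. The inline XML string and its element names (response, row, name, zipcode) are made up for illustration so the example runs offline; they are not taken from the real file, whose structure may differ.

```r
library(XML)

# Hypothetical stand-in for the downloaded file, so the example runs offline.
xml_text <- '<response>
  <row>
    <name>Cafe A</name>
    <zipcode>21231</zipcode>
  </row>
  <row>
    <name>Cafe B</name>
    <zipcode>21201</zipcode>
  </row>
  <row>
    <name>Cafe C</name>
    <zipcode>21231</zipcode>
  </row>
</response>'

# asText = TRUE tells the parser the argument is XML content,
# not a file name or URL.
doc <- xmlParse(xml_text, asText = TRUE)

# Name of the top-level element.
root <- xmlRoot(doc)
root_name <- xmlName(root)

# Count the <row> elements directly under the root.
n_rows <- length(getNodeSet(doc, "/response/row"))

# XPath query: collect the text of every <zipcode> element.
zipcodes <- xpathSApply(doc, "//zipcode", xmlValue)
n_21231 <- sum(zipcodes == "21231")
```

With a real file, the same calls should work on the object returned by xmlTreeParse(..., useInternalNodes = TRUE); only the XPath expressions need to match the actual element names.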

      
