How do I convert XML to data.frame when nodes only have attributes?
I am trying to use the XML package and its xmlToList or xmlToDataFrame functions. My input is online (the first two lines below), and I only need to work with one part of the XML (see the third command, the getNodeSet call).
url <- 'http://ClinicalTrials.gov/show/NCT00191100?resultsxml=true'
xml <- xmlTreeParse(url, useInternalNodes = TRUE)
ns <- getNodeSet(xml, '/clinical_study/clinical_results/reported_events/serious_events/category_list')
This is a list of categories, and within each category are "events". Each event has counts (the counts relate to the arms of the clinical trial, e.g. drug versus placebo).
I only need the events. The clearest view of the structure comes from xmlToList, shown here for the event "Cardio-respiratory arrest":
xl<-xmlToList(url)
set2<-xl$clinical_results$reported_events$serious_events$category_list
set2[[3]]
> set2[[3]]
$title
[1] "Cardiac disorders"
$event_list
$event_list$event
$event_list$event$sub_title
[1] "Cardio-respiratory arrest"
$event_list$event$counts
group_id events subjects_affected subjects_at_risk
"E1" "1" "1" "260"
$event_list$event$counts
group_id events subjects_affected subjects_at_risk
"E2" "0" "0" "255"
I cannot use xmlToDataFrame because of the error below. (The node set has all of its data in XML attributes, and I think xmlToDataFrame might not like that.)
hopefulyDF <- getNodeSet(xml, '/clinical_study/clinical_results/reported_events/serious_events/category_list/category/event_list/event/counts')
xmlToDataFrame(node = hopefulyDF)
Error in matrix(vals, length(nfields), byrow = TRUE) :
'data' must be of a vector type, was 'NULL'
What is the best way to extract the counts data? I tried working with the list, but I guess I am not advanced enough in R. I would like to avoid a manual loop over xmlGetAttr, but in the worst case any solution will do. I find the XML package very dense, with two representations of XML data: as a list and as node sets. :-(
The ideal output would look like this (for all events, not just element 3):
event                      group_ID  numerator  denominator
Cardio-respiratory arrest  E1        1          260
Cardio-respiratory arrest  E2        0          255
(A category column, e.g. "Cardiac disorders", would be even better.)
P.S. I tried the answers to "How to convert XML data to a data.frame?" and to a question about converting an R list to a data frame, but no luck. :-(
You can do this fairly easily: select the event nodes, iterate over each event, and extract the attributes of its counts children via a relative XPath. Using rbindlist from the data.table package, you can handle missing attributes without adding any conditional code:
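library(XML)
library(data.table)
url <- 'http://ClinicalTrials.gov/show/NCT00191100?resultsxml=true'
xml <- xmlTreeParse(url, useInternalNodes = TRUE)
# Select every <event> node in the document
ns <- getNodeSet(xml, '//event')
rbindlist(lapply(ns, function(x) {
  # Text content of the event node (its sub_title)
  event <- xmlValue(x)
  # One row per <counts> child, one column per attribute
  data.frame(event, t(xpathSApply(x, ".//counts", xmlAttrs)))
}), fill = TRUE)  # fill = TRUE pads attributes that are missing with NA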
##                              event group_id subjects_affected events subjects_at_risk
##   1: Total, serious adverse events       E1                44     NA               NA
##   2: Total, serious adverse events       E2                17     NA               NA
##   3:                       Anaemia       E1                 6      6              260
##   4:                       Anaemia       E2                 0      0              255
##   5:           Febrile neutropenia       E1                 6      6              260
##  ---
## 174:                         Cough       E2                15     16              255
## 175:                      Pruritus       E1                14     16              260
## 176:                      Pruritus       E2                 9      9              255
## 177:                  Hypertension       E1                19     19              260
## 178:                  Hypertension       E2                21     21              255
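The question also asks for a category column. One possible way to get it, sketched here rather than tested against the live feed, is to look up the enclosing category title with the XPath ancestor axis from inside each event. The inline XML below is a hypothetical miniature of the structure shown in the question, used so the example runs without network access, and the sketch assumes every event sits inside a category that has a title and has at least one counts child:

```r
library(XML)
library(data.table)

# Hypothetical miniature document mimicking the structure in the question.
txt <- '<clinical_study><clinical_results><reported_events><serious_events>
<category_list>
  <category>
    <title>Cardiac disorders</title>
    <event_list>
      <event>
        <sub_title>Cardio-respiratory arrest</sub_title>
        <counts group_id="E1" events="1" subjects_affected="1" subjects_at_risk="260"/>
        <counts group_id="E2" events="0" subjects_affected="0" subjects_at_risk="255"/>
      </event>
    </event_list>
  </category>
</category_list>
</serious_events></reported_events></clinical_results></clinical_study>'

doc <- xmlParse(txt, asText = TRUE)

res <- rbindlist(lapply(getNodeSet(doc, "//event"), function(x) {
  data.frame(
    # Title of the category that encloses this event (assumed to exist)
    category = xpathSApply(x, "ancestor::category/title", xmlValue),
    # Event name taken from the sub_title child
    event    = xpathSApply(x, "./sub_title", xmlValue),
    # One row per <counts> child, one column per attribute
    t(xpathSApply(x, "./counts", xmlAttrs)),
    stringsAsFactors = FALSE
  )
}), fill = TRUE)

print(res)
```

The single category and event values are recycled across the rows produced by the counts attributes, so each counts node becomes one row.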
You can always convert the result back to a data.frame and/or rename the columns if needed.
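As a rough sketch of that last step: the table below is a hypothetical stand-in for the rbindlist result (named res here), and the mapping of subjects_affected to "numerator" and subjects_at_risk to "denominator" is only an assumption about what the question intended. Attribute values come back as text, so the count columns are converted to integers as well:

```r
library(data.table)

# Hypothetical stand-in for the rbindlist() result; attribute values are text.
res <- data.table(
  event             = c("Anaemia", "Anaemia"),
  group_id          = c("E1", "E2"),
  subjects_affected = c("6", "0"),
  events            = c("6", "0"),
  subjects_at_risk  = c("260", "255")
)

# Rename by reference; the old -> new mapping is an assumed reading of the question.
setnames(res,
         old = c("group_id", "subjects_affected", "subjects_at_risk"),
         new = c("group_ID", "numerator", "denominator"))

# Convert the count columns to integers. Going through as.character()
# also gives the right values if the columns happen to be factors.
count_cols <- c("numerator", "events", "denominator")
res[, (count_cols) := lapply(.SD, function(v) as.integer(as.character(v))),
    .SDcols = count_cols]

# Plain data.frame copy; setDF(res) would convert in place instead.
df <- as.data.frame(res)
str(df)
```

setnames and := both modify the table by reference, so no intermediate copies are made until the final as.data.frame call.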