How do I convert XML to data.frame when nodes only have attributes?

I am trying to use the XML package and its xmlToList or xmlToDataFrame functions. My input is online (first two lines below) and I only need to work with a certain part of the XML (see the getNodeSet call on the third line).

url <- 'http://ClinicalTrials.gov/show/NCT00191100?resultsxml=true'
xml <- xmlTreeParse(url, useInternalNodes = TRUE)
ns <- getNodeSet(xml, '/clinical_study/clinical_results/reported_events/serious_events/category_list')

      

This is a list of categories, and within each category there are "events". Each event has counts, and the counts relate to the arms of the clinical trial (e.g. drug versus placebo).

I only need the events, so the clearest listing is this one for cardio-respiratory arrest, obtained with xmlToList:

xl<-xmlToList(url)
set2<-xl$clinical_results$reported_events$serious_events$category_list
set2[[3]]

> set2[[3]]
$title
[1] "Cardiac disorders"

$event_list
$event_list$event
$event_list$event$sub_title
[1] "Cardio-respiratory arrest"

$event_list$event$counts
         group_id            events subjects_affected  subjects_at_risk 
             "E1"               "1"               "1"             "260" 

$event_list$event$counts
         group_id            events subjects_affected  subjects_at_risk 
             "E2"               "0"               "0"             "255" 

      

I cannot use xmlToDataFrame because of the error below. (The node set has all its data in XML attributes, and I think xmlToDataFrame might not like that.)

hopefulyDF <- getNodeSet(xml, '/clinical_study/clinical_results/reported_events/serious_events/category_list/category/event_list/event/counts')
 xmlToDataFrame(node = hopefulyDF)
Error in matrix(vals, length(nfields), byrow = TRUE) : 
  'data' must be of a vector type, was 'NULL'

      

What's the best way to extract the counts data? I've tried working with the list, but I guess I'm not advanced enough in R. I would like to avoid an explicit loop over xmlGetAttr, but in the worst case any solution will do. I find the XML package very dense, with two representations of XML data: as a list and as node sets :-(

The ideal output would look like this (for all events, not just the third category):

event                      group_ID  numerator  denominator
Cardio-respiratory arrest  E1        1          260
Cardio-respiratory arrest  E2        0          255

      

(Or it could even have a category column, e.g. "Cardiac disorders"; that would be super ideal.)

P.S. I tried the approaches from the question "How to convert XML data to a data.frame?" and from a question on converting an R list to a data frame, but no luck :-(

1 answer


You can do this fairly easily with XPath: select every event node, iterate over them, and extract the attributes of each event's counts children via a relative XPath. Using rbindlist from the data.table package with fill=TRUE, you can handle missing attributes without adding any conditional code:

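library(XML)
library(data.table)

url <- 'http://ClinicalTrials.gov/show/NCT00191100?resultsxml=true'
xml <- xmlTreeParse(url, useInternalNodes = TRUE)

# Select every <event> node, wherever it sits in the document
ns <- getNodeSet(xml, '//event')

# Build one small data.frame per event: one row per <counts> child,
# one column per attribute; fill=TRUE pads missing attributes with NA
rbindlist(lapply(ns, function(x) {
  event <- xmlValue(x)  # text content of the event (its sub_title)
  data.frame(event, t(xpathSApply(x, ".//counts", xmlAttrs)))
}), fill=TRUE)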

##                              event group_id subjects_affected events subjects_at_risk
##   1: Total, serious adverse events       E1                44     NA               NA
##   2: Total, serious adverse events       E2                17     NA               NA
##   3:                       Anaemia       E1                 6      6              260
##   4:                       Anaemia       E2                 0      0              255
##   5:           Febrile neutropenia       E1                 6      6              260
##  ---                                                                                 
## 174:                         Cough       E2                15     16              255
## 175:                      Pruritus       E1                14     16              260
## 176:                      Pruritus       E2                 9      9              255
## 177:                  Hypertension       E1                19     19              260
## 178:                  Hypertension       E2                21     21              255

      



You can always convert the result back to a data.frame and/or rename the columns if needed.
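As a rough sketch of that last step, and of the category column the question mentions, something like the following might work. It runs on a small inline XML sample (hypothetical, modelled on the listing in the question) rather than the live URL. The mapping of subjects_affected to "numerator" and subjects_at_risk to "denominator" is an assumption about what the asker intended.

```r
library(XML)
library(data.table)

# Hypothetical inline sample mirroring the structure shown in the question
txt <- '<category_list>
  <category>
    <title>Cardiac disorders</title>
    <event_list>
      <event>
        <sub_title>Cardio-respiratory arrest</sub_title>
        <counts group_id="E1" events="1" subjects_affected="1" subjects_at_risk="260"/>
        <counts group_id="E2" events="0" subjects_affected="0" subjects_at_risk="255"/>
      </event>
    </event_list>
  </category>
</category_list>'

doc <- xmlParse(txt, asText = TRUE)
events <- getNodeSet(doc, "//event")

res <- rbindlist(lapply(events, function(x) {
  data.frame(
    # Title of the enclosing <category>, found by walking up from the event
    category = xpathSApply(x, "ancestor::category/title", xmlValue),
    event    = xpathSApply(x, "./sub_title", xmlValue),
    # One row per <counts> child, one column per attribute
    t(xpathSApply(x, "./counts", xmlAttrs)),
    stringsAsFactors = FALSE
  )
}), fill = TRUE)

# Assumed mapping to the column names used in the question
setnames(res,
         c("group_id", "subjects_affected", "subjects_at_risk"),
         c("group_ID", "numerator", "denominator"))

# Convert from data.table back to a plain data.frame
df <- as.data.frame(res)
print(df)
```

Attribute values come back as character strings, so columns such as numerator and denominator would still need as.integer() before doing arithmetic on them.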
