Can't convert JSON to dataframe

I want to convert json file to dataframe in R. With the following code:

link <- 'https://www.dropbox.com/s/ckfn1fpkcix1ccu/bevingenbag.json'
document <- fromJSON(file = link, method = 'C')
bev <- do.call("cbind", document)

      

I get this:

    type                features
1   FeatureCollection   list(type = "Feature", geometry = list(type = "Point", coordinates = c(6.54800000288927, 52.9920000044505)), properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751"))

      

which is the first row of the matrix. All other lines have the same structure. I am interested in the part properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751")

that needs to be converted to a dataframe with columns gid, yymmdd, lat, lon, mag, depth, knmilocatie, baglocatie, tijd

.

I've searched and tried several solutions but none of them worked. I used the rjson package for this. I also tried the RJSONIO and jsonlite package, but couldn't extract the information I needed.

Does anyone know how to fix this problem?

+1


source to share


2 answers


You can get the dataframe here:

library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')

dat <- do.call(rbind, lapply(document$features, 
                             function(x) data.frame(x$properties)))

      

Edit: how to replace empty values ​​with NA

:



dat$baglocatie[dat$baglocatie == ""] <- NA

      

Result:

head(dat)

      gid   yymmdd    lat   lon mag depth knmilocatie baglocatie   tijd
1 1496600 19861226 52.992 6.548 2.8   1.0       Assen      Assen  74751
2 1496601 19871214 52.928 6.552 2.5   1.5   Hooghalen  Hooghalen 204951
3 1496602 19891201 52.529 4.971 2.7   1.2   Purmerend    Kwadijk 200914
4 1496603 19910215 52.771 6.914 2.2   3.0       Emmen      Emmen  21116
5 1496604 19910425 52.952 6.575 2.6   3.0   Geelbroek    Ekehaar 102631
6 1496605 19910808 52.965 6.573 2.7   3.0     Eleveld      Assen  40114

      

+4


source


This is another very similar approach.

An approach

@SvenHohenstein creates a dataframe at every step, an expensive process. It's much faster for creating vectors and re-entering the entire result at the end. Also, Sven's approach makes each column a factor, which may or may not be what you want. The approach below works about 200 times faster. This can be important if you intend to do this multiple times. Finally, you will need to convert the columns lon, lat, mag, and depth

to numeric.



library(microbenchmark)
library(rjson)

document <- fromJSON(file = "bevingenbag.json", method = 'C')

json2df.1 <- function(json){   # @SvenHohenstein approach
  df <- do.call(rbind, lapply(json$features, 
                       function(x) data.frame(x$properties, stringsAsFactors=F)))
  return(df)
}

json2df.2 <- function(json){
  df <- do.call(rbind,lapply(json[["features"]],function(x){c(x$properties)}))
  df <- data.frame(apply(result,2,as.character), stringsAsFactors=F)
  return(df)
}

microbenchmark(x<-json2df.1(document), y<-json2df.2(document), times=10)
# Unit: milliseconds
#                     expr        min         lq     median         uq        max neval
#  x <- json2df.1(document) 2304.34378 2654.95927 2822.73224 2977.75666 3227.30996    10
#  y <- json2df.2(document)   13.44385   15.27091   16.78201   18.53474   19.70797    10
identical(x,y)
# [1] TRUE

      

+4


source







All Articles