Filling a matrix from a list where each vector has 1 to 7 elements [R]

Say I have some ';'-separated information in a vector that I want to split using strsplit. data contains information that looks like this:

[1] "k__Fungi; p__Ascomycota; c__Eurotiomycetes; o__unidentified; f__unidentified; g__unidentified; s__Eurotiomycetes sp"
[2] "k__Fungi; p__Basidiomycota; c__Agaricomycetes; o__Agaricales; f__Mycenaceae; g__unidentified; s__Mycenaceae sp"     
[3] "k__Fungi; p__Ascomycota"                                                                                            
[4] "None"                                                                                                               
[5] "k__Fungi; p__Glomeromycota; c__Glomeromycetes; o__Glomerales; f__Glomeraceae; g__Glomus; s__Glomus macrocarpum"     
[6] "k__Fungi; p__Basidiomycota; c__Agaricomycetes; o__Agaricales; f__Inocybaceae; g__Inocybe"                           

      

I use strsplit to split this information like this:

list<- strsplit(data,split=";")

      

whose output is

[[1]]
[1] "k__Fungi"              " p__Ascomycota"        " c__Eurotiomycetes"    " o__unidentified"      " f__unidentified"      " g__unidentified"      " s__Eurotiomycetes sp"

[[2]]
[1] "k__Fungi"           " p__Basidiomycota"  " c__Agaricomycetes" " o__Agaricales"     " f__Mycenaceae"     " g__unidentified"   " s__Mycenaceae sp" 

[[3]]
[1] "k__Fungi"       " p__Ascomycota"

[[4]]
[1] "None"

[[5]]
[1] "k__Fungi"               " p__Glomeromycota"      " c__Glomeromycetes"     " o__Glomerales"         " f__Glomeraceae"        " g__Glomus"             " s__Glomus macrocarpum"

[[6]]
[1] "k__Fungi"           " p__Basidiomycota"  " c__Agaricomycetes" " o__Agaricales"     " f__Inocybaceae"    " g__Inocybe"      

      

Then I want to pass this information into a matrix with as many rows as the length of the original data object and 7 named columns. I generate an empty matrix like this:

out<- matrix(nrow=(length(data)),ncol=7)
colnames(out)<-c("kingdom","phylum","class","order","family","genus","species")

      

The empty matrix looks like this:

     kingdom phylum class order family genus species
[1,]      NA     NA    NA    NA     NA    NA      NA
[2,]      NA     NA    NA    NA     NA    NA      NA
[3,]      NA     NA    NA    NA     NA    NA      NA
[4,]      NA     NA    NA    NA     NA    NA      NA
[5,]      NA     NA    NA    NA     NA    NA      NA
[6,]      NA     NA    NA    NA     NA    NA      NA

      

Then I want to insert the information from list into the matrix, so that if the first vector in the list contains 7 elements, all 7 columns in row 1 will have entries. However, if a vector in the list consists of only two elements, then only the first two columns in that row of the matrix will have entries, and the rest will remain NA.

** NOTE. I am deliberately avoiding loops. I had a loop solution, but it fails when I scale up to a dataset with 100,000 rows.

1 answer


You may try

library(stringi)
m1 <- stri_list2matrix(list, byrow=TRUE)
colnames(m1) <- c("kingdom","phylum","class","order","family","genus","species")
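
If stringi is not available, the same padding idea can be sketched in base R. This is only an illustration under a few assumptions: the sample data is a subset of the vector from the question, the names parts, pad_to and m_base are made up for this example, and trimws is used to drop the leading space left behind by the "; " separator.

```r
# Sample input: a subset of the vector shown in the question.
data <- c(
  "k__Fungi; p__Ascomycota; c__Eurotiomycetes; o__unidentified; f__unidentified; g__unidentified; s__Eurotiomycetes sp",
  "k__Fungi; p__Ascomycota",
  "None",
  "k__Fungi; p__Basidiomycota; c__Agaricomycetes; o__Agaricales; f__Inocybaceae; g__Inocybe"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Split on ';' and strip the whitespace around each piece.
parts <- lapply(strsplit(data, split = ";", fixed = TRUE), trimws)

# Hypothetical helper: extending a vector with `length<-` pads it with NA.
pad_to <- function(x, n) {
  length(x) <- n
  x
}

# vapply returns a 7 x length(data) matrix, so transpose to get one row per input.
m_base <- t(vapply(parts, pad_to, character(length(ranks)), n = length(ranks)))
colnames(m_base) <- ranks
```

Note that pad_to would silently truncate a vector with more than 7 elements, so it is worth checking max(lengths(parts)) first if the input is not known to be clean.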

      

Or, instead of using strsplit, we can read it directly with read.table



read.table(text=data, sep=";", fill=TRUE, stringsAsFactors=FALSE, na.strings='')
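
One thing to watch with the read.table route: according to its documentation, the number of columns is determined from the first five lines of input unless col.names is supplied and is longer. The sketch below is a hedged variant that passes col.names to fix the width at 7; the sample data is rearranged from the question so that the only 7-field line comes sixth, and the extra arguments (strip.white, quote, comment.char, colClasses) are my additions, not part of the call above.

```r
# Sample input rearranged from the question: the only 7-field line is the sixth.
data <- c(
  "k__Fungi; p__Ascomycota",
  "None",
  "k__Fungi; p__Basidiomycota; c__Agaricomycetes; o__Agaricales; f__Inocybaceae; g__Inocybe",
  "k__Fungi; p__Ascomycota",
  "None",
  "k__Fungi; p__Glomeromycota; c__Glomeromycetes; o__Glomerales; f__Glomeraceae; g__Glomus; s__Glomus macrocarpum"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

df <- read.table(
  text = data,
  sep = ";",
  fill = TRUE,               # pad short lines with empty fields
  col.names = ranks,         # 7 names, so 7 columns regardless of the first lines
  colClasses = "character",  # keep every column as character
  strip.white = TRUE,        # drop the space after each ';'
  quote = "",                # treat quote characters in names literally
  comment.char = "",         # treat '#' literally
  na.strings = ""            # empty fields become NA
)

# Convert to a character matrix if a matrix rather than a data frame is needed.
m_rt <- as.matrix(df)
```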

      

Or using the devel version of data.table

library(data.table)#v1.9.5+
setDT(list(data))[,tstrsplit(V1, '; ')]
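
If the goal is to fill the pre-allocated out matrix from the question rather than build a new object, one possible base R sketch uses a two-column index matrix of (row, column) pairs, so no explicit loop is needed. The names parts, n and idx are made up for this example, the sample data is a subset of the question's vector, and the code assumes no element splits into more than 7 pieces.

```r
# Sample input: a subset of the vector shown in the question.
data <- c(
  "k__Fungi; p__Ascomycota",
  "None",
  "k__Fungi; p__Glomeromycota; c__Glomeromycetes; o__Glomerales; f__Glomeraceae; g__Glomus; s__Glomus macrocarpum"
)
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Split on the literal "; " separator so no leading spaces remain.
parts <- strsplit(data, split = "; ", fixed = TRUE)
n <- lengths(parts)

# Pre-allocated matrix of NA, as in the question.
out <- matrix(NA_character_, nrow = length(data), ncol = length(ranks),
              dimnames = list(NULL, ranks))

# One (row, column) pair per split element; assumes every n is at most 7.
idx <- cbind(rep(seq_along(parts), n), sequence(n))
out[idx] <- unlist(parts)
```

Cells that are not addressed by idx are left untouched, which is why the short rows keep their NA entries.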

      
