R: Using a for loop to partially match factor levels to a character string?

I have a dataset of plots and subplots in which I measured the presence of tree species. I am trying to loop through the data and determine which species are present in each plot-subplot combination.

I was able to create a data frame that identifies which species are present in each plot-subplot combination, but now I am trying to add a column for each species with an indicator (a value of 1) that marks its presence.

The source code and the resulting data frame look like this:

f = aggregate(Species ~ Subplot + Plot, data = live.trees, 
              FUN=function(x) paste(unique(x), collapse=', '))

a=rep(0, 35)
b=cbind(a,a,a,a,a,a,a,a,a,a,a,a)
colnames(b) = levels(live.trees$Species)
freq = as.data.frame(cbind(f, b))

Species = as.factor(live.trees$Species)

#Only showing 2 of 7 plots here...

freq[1:10,]
   Subplot Plot                    Species AA AM AO BC BG BP EA RA RM SH XG XM
1        1    1                         RA  0  0  0  0  0  0  0  0  0  0  0  0
2        2    1 EA, BP, XM, BC, AA, XG, RA  0  0  0  0  0  0  0  0  0  0  0  0
3        3    1         EA, XG, AA, AM, RA  0  0  0  0  0  0  0  0  0  0  0  0
4        4    1             AA, XM, RA, EA  0  0  0  0  0  0  0  0  0  0  0  0
5        5    1             EA, BC, RA, AA  0  0  0  0  0  0  0  0  0  0  0  0
6        1    2             XM, BC, RA, AM  0  0  0  0  0  0  0  0  0  0  0  0
7        2    2                     RM, RA  0  0  0  0  0  0  0  0  0  0  0  0
8        3    2                 XM, BC, RA  0  0  0  0  0  0  0  0  0  0  0  0
9        4    2                     RA, XM  0  0  0  0  0  0  0  0  0  0  0  0
10       5    2     XM, XG, AA, BC, BG, RA  0  0  0  0  0  0  0  0  0  0  0  0

      

Now I'm trying to write a for loop that walks through the table and inserts a "1" into each of the species columns (AA, AM, AO, etc.) whenever that two-character species code appears in the freq$Species column. The loop I have written so far:

#Manually going through and assigning a 1 value for each species 
#using a partial string match with grepl()

    for(k in 1:nrow(freq))
  if(grepl("AA", freq$Species[[k]]) == "TRUE")
    (freq$AA[k] = 1) else
    if(grepl("AM", freq$Species[[k]]) == "TRUE")
      (freq$AM[k] = 1) else
        if(grepl("AO", freq$Species[[k]]) == "TRUE")
          (freq$AO[k] = 1) else
            if(grepl("BC", freq$Species[[k]]) == "TRUE")
              (freq$BC[k] = 1)
                  #.... etc. (cutting off here to save space)

      

The code works to some extent, but each row ends up with a 1 in only one species column (the first one that matches), and it is also pretty clunky.

   Subplot Plot                    Species AA AM AO BC BG BP EA RA RM SH XG XM
1        1    1                         RA  0  0  0  0  0  0  0  0  0  0  0  0
2        2    1 EA, BP, XM, BC, AA, XG, RA  1  0  0  0  0  0  0  0  0  0  0  0
3        3    1         EA, XG, AA, AM, RA  1  0  0  0  0  0  0  0  0  0  0  0
4        4    1             AA, XM, RA, EA  1  0  0  0  0  0  0  0  0  0  0  0
5        5    1             EA, BC, RA, AA  1  0  0  0  0  0  0  0  0  0  0  0
6        1    2             XM, BC, RA, AM  0  1  0  0  0  0  0  0  0  0  0  0
7        2    2                     RM, RA  0  0  0  0  0  0  0  0  0  0  0  0
8        3    2                 XM, BC, RA  0  0  0  1  0  0  0  0  0  0  0  0
9        4    2                     RA, XM  0  0  0  0  0  0  0  0  0  0  0  0
10       5    2     XM, XG, AA, BC, BG, RA  1  0  0  0  0  0  0  0  0  0  0  0

      

How can I:

1) Get the for loop to stop skipping the presence indicators for the other species columns?

2) Write the for loop in a more elegant way? I thought I could create a factor variable called Species and iterate over its levels (in the first loop) ... but that is where my inexperience started to show.

Any help or suggestions would be much appreciated!

I know this is not a reproducible example, but I'm looking for general suggestions or tips that might point me in the right direction. In the meantime, I will try to find a built-in R dataset with which I can reproduce the problem.

Thank you in advance!

Note: the Species column in freq was created as a string and therefore has class character.


1 answer


Try

library(qdapTools)
res <- cbind(freq[1:3], mtabulate(strsplit(freq$Species, ', ')))
rowsum(res[,4:ncol(res)], group= res$Plot)
#  AA AM BC BG BP EA RA RM XG XM
#1  4  1  2  0  1  4  5  0  2  2
#2  1  1  3  1  0  0  5  1  1  4
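
If you would rather stay in base R, the indicator columns can also be filled with one short loop over the species codes. This is only a sketch: it assumes the codes are always separated by ", " as in the example, and `species_codes` and `present` are names made up for the illustration. It splits each string and tests membership with `%in%` instead of partial matching with grepl(), so one code can never match inside another.

```r
# Rebuild the first three columns of the example data
freq <- data.frame(
  Subplot = rep(1:5, times = 2),
  Plot    = rep(1:2, each = 5),
  Species = c("RA", "EA, BP, XM, BC, AA, XG, RA", "EA, XG, AA, AM, RA",
              "AA, XM, RA, EA", "EA, BC, RA, AA", "XM, BC, RA, AM",
              "RM, RA", "XM, BC, RA", "RA, XM", "XM, XG, AA, BC, BG, RA"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of all species codes (the factor levels in the question)
species_codes <- c("AA", "AM", "AO", "BC", "BG", "BP",
                   "EA", "RA", "RM", "SH", "XG", "XM")

# Split every string into its individual codes once
present <- strsplit(freq$Species, ", ", fixed = TRUE)

# One independent test per species, so no column blocks another
for (sp in species_codes) {
  freq[[sp]] <- as.integer(vapply(present, function(x) sp %in% x, logical(1)))
}

# Per-plot counts, as in the rowsum() call above
rowsum(freq[species_codes], group = freq$Plot)
```

Unlike mtabulate(), this keeps a column of zeros for codes that never occur (AO and SH here), because the columns come from `species_codes` rather than from the strings themselves.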

      

or



aggregate(.~Plot, res[c(2,4:ncol(res))], FUN=sum)
#   Plot AA AM BC BG BP EA RA RM XG XM
#1    1  4  1  2  0  1  4  5  0  2  2
#2    2  1  1  3  1  0  0  5  1  1  4

      

or



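library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr;
# across() (dplyr >= 1.0.0) does the same job
res %>%
   group_by(Plot) %>%
   summarise(across(-c(Subplot, Species), sum))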

      

or



library(data.table)
setDT(res)[, lapply(.SD, sum), by =Plot, .SDcols=4:ncol(res)]

      

data

freq <- structure(list(Subplot = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 
5L), Plot = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), Species = c("RA", 
"EA, BP, XM, BC, AA, XG, RA", "EA, XG, AA, AM, RA", "AA, XM, RA, EA", 
"EA, BC, RA, AA", "XM, BC, RA, AM", "RM, RA", "XM, BC, RA", "RA, XM", 
"XM, XG, AA, BC, BG, RA"), AA = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), AM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    AO = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BC = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BG = c(0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), BP = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), EA = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), RA = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), RM = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), SH = c(0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), XG = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), XM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    )), .Names = c("Subplot", "Plot", "Species", "AA", "AM", 
"AO", "BC", "BG", "BP", "EA", "RA", "RM", "SH", "XG", "XM"), 
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
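
As for the first part of the question, the loop in the post most likely fills only one column per row because the tests are chained with else: once one branch matches, the remaining ones are skipped. A minimal sketch of the difference, using row 3 of the data (`chained` and `independent` are names invented for this illustration):

```r
# Species string from row 3 of the example data
s <- "EA, XG, AA, AM, RA"

# Chained tests: only the first matching branch runs
chained <- c(AA = 0, AM = 0)
if (grepl("AA", s, fixed = TRUE)) {
  chained["AA"] <- 1
} else if (grepl("AM", s, fixed = TRUE)) {
  chained["AM"] <- 1
}

# Independent tests: every match is recorded
independent <- c(AA = 0, AM = 0)
if (grepl("AA", s, fixed = TRUE)) independent["AA"] <- 1
if (grepl("AM", s, fixed = TRUE)) independent["AM"] <- 1

print(chained)
print(independent)
```

Note also that grepl() already returns a logical value, so the comparison with "TRUE" in the original loop is not needed.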

      
