Keep a running count of how many times each string has appeared in a vector in R

My problem is best explained with a simple example:

my_strings = c("apple", "banana", "carrot", "apple", "apple", "dairy", "banana", "eggplant", "flowers", "flowers", "apple", "banana")

my_repeats = c(0, 0, 0, 1, 2, 0, 1, 0, 0, 1, 3, 2)

      

The my_repeats vector is best understood by traversing my_strings from start to finish, one element at a time. Since apple, banana and carrot have not appeared earlier in the vector when first visited, each is assigned 0. Apple then appears a 2nd and 3rd time (its 1st and 2nd repeats), so it gets 1 and 2. Next comes 0, because dairy has not appeared before, then 1, because banana is repeated for the first time, and so on.
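The traversal described above can be written out as a plain loop. This is only a reference sketch to pin down the expected result, not a fast or vectorized solution; the helper name count_repeats is made up for illustration, and it assumes the input contains no NA or empty strings.

```r
# Hypothetical helper: walk the vector once and record, for each element,
# how many times the same string has already been seen.
count_repeats <- function(x) {
  seen <- list()                 # per-string tally of earlier occurrences
  out <- integer(length(x))
  for (i in seq_along(x)) {
    key <- x[i]
    prev <- if (is.null(seen[[key]])) 0L else seen[[key]]
    out[i] <- prev               # number of earlier occurrences of this string
    seen[[key]] <- prev + 1L     # this occurrence now counts as seen
  }
  out
}

my_strings <- c("apple", "banana", "carrot", "apple", "apple", "dairy",
                "banana", "eggplant", "flowers", "flowers", "apple", "banana")
count_repeats(my_strings)
##  [1] 0 0 0 1 2 0 1 0 0 1 3 2
```

A loop like this is mainly useful for checking faster approaches against a known-correct answer.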

Being able to count the repeats of each string and store that data in a vector of the same length would help a ton with what I'm working on. But I'm not sure there is a quick, vectorized way to do this. Any thoughts are appreciated!

EDIT: Basically I need a cumulative count function. I'm now checking whether one exists for strings.



3 answers


To do this, you can use the function ave with seq_along:

as.numeric(ave(my_strings, my_strings, FUN = seq_along)) - 1
##  [1] 0 0 0 1 2 0 1 0 0 1 3 2

      



There is also the function rowid from "data.table":

library(data.table)
rowid(my_strings) - 1
##  [1] 0 0 0 1 2 0 1 0 0 1 3 2

      



Here's a dplyr solution for strings in a column of a data frame:



library(dplyr)
df1 <- data.frame(words = c("apple", "banana", "carrot", "apple", "apple", "dairy", 
                            "banana", "eggplant", "flowers", "flowers", "apple", "banana"), 
                  stringsAsFactors = FALSE)

df1 %>% 
  group_by(words) %>% 
  mutate(count = sequence(n()) - 1)

      



Not the easiest way, but if you want to dig into the internals, you can program it yourself as:

mat <- apply(sapply(unique(my_strings), function(x) x == my_strings), 2, cumsum) - 1L
diag(mat[, my_strings])
#>  [1] 0 0 0 1 2 0 1 0 0 1 3 2
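To see what the matrix approach above is doing, here is the same idea broken into steps on a tiny made-up input (the vector x and the intermediate names are illustrative only). As a variation, the last step picks one entry per row with matrix indexing instead of diag(mat[, my_strings]), which avoids building an n-by-n matrix; it assumes x has at least two elements so that sapply returns a matrix.

```r
x <- c("a", "b", "a")                           # hypothetical tiny input

# One logical column per distinct string: TRUE where x equals that string.
hits <- sapply(unique(x), function(u) u == x)

# Running count of matches down each column, minus one so the first hit is 0.
running <- apply(hits, 2, cumsum) - 1L

# For row i, take the column belonging to x[i].
picked <- running[cbind(seq_along(x), match(x, unique(x)))]
picked
## [1] 0 0 1
```

The intermediate matrix has one column per distinct string, so memory grows with length(x) times the number of unique values, which is why the ave or rowid approaches are likely the better choice for long vectors.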

      
