How to use a numeric vector as variable input in a gsub pattern?

I would like to keep only the first half of each string. The imported data duplicates each name, and the names sit in a column of a larger data frame:

fname: TimmyTimmy, PopPop, AdnanAdnan, KobeKobe.

My first idea was to count the characters in each name, divide by 2, and then use gsub to remove that many characters from the beginning of each line, using fn_len as the variable count in the pattern.

fn_len: 5, 6, 5, 4

df$fname <- 
    gsub("^[[:alpha:]]{df$fn_len}", "", df$fname)

      

Returns error: invalid regular expression; reason 'Invalid contents of {}'

The code works if I use a single number (like 1, 2, 3, 4 or 5), but I obviously don't understand some of the pattern rules here.

On the other hand, maybe there is a better way to do this from the start?

+3




2 answers


It really looks like a substring operation would be a better fit here:



fname<-c("TimmyTimmy", "PopPop", "AdnanAdnan", "KobeKobe")
substr(fname, 1, nchar(fname)/2)
# [1] "Timmy" "Pop"   "Adnan" "Kobe" 
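
As for why the original call fails: the pattern is an ordinary string, so `df$fn_len` inside the braces is never replaced by its value, and the pattern argument of gsub is not vectorised anyway (given several patterns, R uses only the first and warns). The following is a minimal sketch, assuming a data frame `df` with the `fname` column from the question and `fn_len` recomputed as half the length of each name. It shows `substr` applied to the column, plus one possible way to make the regex idea work by building one pattern per row.

```r
# Example data frame mirroring the question; df, fname and fn_len are the
# question's names, the values here are only illustrative.
df <- data.frame(
  fname = c("TimmyTimmy", "PopPop", "AdnanAdnan", "KobeKobe"),
  stringsAsFactors = FALSE
)

# Half the length of each name (%/% is integer division).
df$fn_len <- nchar(df$fname) %/% 2L

# Option 1: substr() is vectorised over start and stop, so no regex is needed.
half_substr <- substr(df$fname, 1, df$fn_len)

# Option 2: build one pattern per row with sprintf(), then call sub()
# once per (pattern, string) pair via mapply().
patterns <- sprintf("^[[:alpha:]]{%d}", df$fn_len)
half_regex <- mapply(sub, patterns, "", df$fname, USE.NAMES = FALSE)

print(half_substr)
print(half_regex)
```

Option 2 removes the first half and keeps the second, which gives the same result only because both halves are identical; `substr` is the simpler choice when that holds.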

      

+4




If the pattern is similar to the one shown in the example, a backreference can collapse the repeated name:

 gsub("([A-Za-z]+)\\1+", "\\1", str1)
 #[1] "Timmy" "Pop"   "Adnan" "Kobe" 

      

or

 scan(text=sub('(?<=[a-z])(?=[A-Z])', ' ', str1, perl=TRUE),
                            what='', quiet=TRUE)[c(TRUE, FALSE)]
 #[1] "Timmy" "Pop"   "Adnan" "Kobe" 

      

or

 sapply(strsplit(str1, '(?<=[a-z])(?=[A-Z])', perl=TRUE), `[`,1)
 #[1] "Timmy" "Pop"   "Adnan" "Kobe" 

      



Update

The same pattern should also work for names that start with a lowercase letter:

  gsub('([A-Za-z]+)\\1+', '\\1', str2)
  #[1] "Timmy" "Pop"   "Adnan" "Kobe"  "tim"  
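
One caveat with the unanchored backreference pattern is that it collapses any repeated run of letters, including a doubled letter inside a name that is not duplicated. A possible variant, sketched here on made-up input (`names_in` is a hypothetical vector, not the asker's data), anchors the pattern so that only strings consisting of exactly one name written twice are changed:

```r
# Hypothetical input: two duplicated names and one ordinary name
# that happens to contain a doubled letter.
names_in <- c("TimmyTimmy", "PopPop", "Bobby")

# Unanchored: also collapses the "bb" inside "Bobby".
unanchored <- gsub("([A-Za-z]+)\\1+", "\\1", names_in)

# Anchored: the whole string must be one group followed by itself,
# so "Bobby" is left untouched.
anchored <- sub("^([A-Za-z]+)\\1$", "\\1", names_in)

print(unanchored)
print(anchored)
```

Whether the anchored form is preferable depends on the data: it assumes each value is either a clean duplicate or should be left alone.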

      

data

 str1 <- c("TimmyTimmy", "PopPop", "AdnanAdnan", "KobeKobe")
 str2 <- c(str1, 'timtim')

      

+2








