How to use a numeric vector as variable input in a gsub pattern?
I would like to keep only the first half of each string. The imported data has every name duplicated, in one column of a larger data frame:
fname: TimmyTimmy, PopPop, AdnanAdnan, KobeKobe.
My first idea was to count the characters in each string, divide by 2, and then use gsub to remove that many characters from the beginning of each string, with fn_len as the variable in the pattern.
fn_len: 5, 3, 5, 4
df$fname <-
gsub("^[[:alpha:]]{df$fn_len}", "", df$fname)
This returns the error: invalid regular expression, reason 'Invalid contents of {}'
The code works if I put a single literal number inside the braces (like 1, 2, 3, 4 or 5), so I obviously don't understand some of the pattern rules here.
Alternatively, is there a better way to do this from the start?
If the strings follow the pattern shown in the example (a name immediately repeated), a backreference can collapse the repetition:
gsub("([A-Za-z]+)\\1+", "\\1", str1)
#[1] "Timmy" "Pop" "Adnan" "Kobe"
or
scan(text=sub('(?<=[a-z])(?=[A-Z])', ' ', str1, perl=TRUE),
what='', quiet=TRUE)[c(TRUE, FALSE)]
#[1] "Timmy" "Pop" "Adnan" "Kobe"
or
sapply(strsplit(str1, '(?<=[a-z])(?=[A-Z])', perl=TRUE), `[`,1)
#[1] "Timmy" "Pop" "Adnan" "Kobe"
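Since the goal in the question is just "the first half of each string", a regex-free variant may also be worth considering. The sketch below assumes every element really is a name written exactly twice, so its length is even; `x`, `half` and `first_half` are illustrative names, not from the question.

```r
# Illustrative input, same values as str1 in the data section
x <- c("TimmyTimmy", "PopPop", "AdnanAdnan", "KobeKobe")

# Assumption: each string is a name written exactly twice,
# so half the character count is the length of one name
half <- nchar(x) %/% 2

# substr() is vectorized over start/stop, so each element
# is cut at its own length
first_half <- substr(x, 1, half)
first_half
```

Unlike the backreference approach, this does not check that the two halves are actually equal, so it only fits data known to be doubled.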
Update
This should also work for names that start with a lowercase letter:
gsub('([A-Za-z]+)\\1+', '\\1', str2)
#[1] "Timmy" "Pop" "Adnan" "Kobe" "tim"
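As for why the attempt in the question fails: R does not interpolate `df$fn_len` inside a string literal, so the regex engine receives the literal text `{df$fn_len}` and rejects it as a malformed quantifier. In addition, gsub and sub accept only a single pattern. If the length-based idea is still preferred, one possible sketch is to build one pattern per row with sprintf and pair each pattern with its string via mapply. The data frame below is a hypothetical reconstruction of the one described in the question, and `patterns` is an illustrative name.

```r
# Hypothetical data frame mirroring the description in the question
df <- data.frame(
  fname  = c("TimmyTimmy", "PopPop", "AdnanAdnan", "KobeKobe"),
  fn_len = c(5, 3, 5, 4),
  stringsAsFactors = FALSE
)

# Build one pattern per row, e.g. "^[[:alpha:]]{5}"
patterns <- sprintf("^[[:alpha:]]{%d}", as.integer(df$fn_len))

# sub() takes one pattern at a time, so apply each pattern
# to its own string; USE.NAMES = FALSE keeps the result unnamed
df$fname <- mapply(function(p, s) sub(p, "", s),
                   patterns, df$fname, USE.NAMES = FALSE)
df$fname
```

This is more roundabout than the backreference solution above, but it shows how a numeric column can drive the quantifier.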
data
str1 <- c("TimmyTimmy", "PopPop", "AdnanAdnan", "KobeKobe")
str2 <- c(str1, 'timtim')