R wildcard for searching text
I have a data.frame column with values like the ones below. I want to split each cell into two new columns, num1 and num2, where num1 is everything before the "-" and num2 is everything between the "-" and the ".".
I am thinking of using the gregexpr function and writing a for loop to iterate over each row. Is there a faster way to do this?
60-150.PNG
300-12.PNG
employee <- c('60-150.PNG','300-12.PNG')
employ.data <- data.frame(employee)
Try
library(tidyr)
extract(employ.data, employee, into=c('num1', 'num2'),
'([^-]*)-([^.]*)\\..*', convert=TRUE)
# num1 num2
#1 60 150
#2 300 12
or
library(data.table)#v1.9.5+
setDT(employ.data)[, tstrsplit(employee, '[-.]', type.convert=TRUE)[-3]]
# V1 V2
#1: 60 150
#2: 300 12
Or, based on @rawr's comment (with the dot escaped so it matches only a literal "."):
read.table(text=gsub('-|\\.PNG', ' ', employ.data$employee),
col.names=c('num1', 'num2'))
# num1 num2
#1 60 150
#2 300 12
Update
To keep the original column
extract(employ.data, employee, into=c('num1', 'num2'), remove=FALSE,
'([^-]*)-([^.]*)\\..*', convert=TRUE)
# employee num1 num2
#1 60-150.PNG 60 150
#2 300-12.PNG 300 12
or
setDT(employ.data)[, paste0('num', 1:2) := tstrsplit(employee,
'[-.]', type.convert=TRUE)[-3]]
# employee num1 num2
#1: 60-150.PNG 60 150
#2: 300-12.PNG 300 12
or
cbind(employ.data, read.table(text=gsub('-|\\.PNG', ' ',
employ.data$employee), col.names=c('num1', 'num2')))
# employee num1 num2
#1 60-150.PNG 60 150
#2 300-12.PNG 300 12
You can try cSplit from my splitstackshape package:
library(splitstackshape)
cSplit(employ.data, "employee", "-|.PNG", fixed = FALSE)
# employee_1 employee_2
# 1: 60 150
# 2: 300 12
Since you mention gregexpr, you can probably try something like:
do.call(rbind,
regmatches(as.character(employ.data$employee),
gregexpr("-|\\.PNG", employ.data$employee),
invert = TRUE))[, -3]
[,1] [,2]
[1,] "60" "150"
[2,] "300" "12"
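The result above is a character matrix, while the question asks for two named columns. A minimal sketch of one way to finish the job in base R follows; the names m and res are only illustrative, and the dot in the pattern is escaped here so that it matches a literal ".".

```r
employee <- c('60-150.PNG', '300-12.PNG')

# Split on "-" or a literal ".PNG"; invert = TRUE keeps the text between matches.
# Each element has a trailing empty string, which becomes the third column.
m <- do.call(rbind,
             regmatches(employee,
                        gregexpr("-|\\.PNG", employee),
                        invert = TRUE))[, -3]

# Convert the character matrix to integers in place, keeping its shape
storage.mode(m) <- "integer"

# Build a data.frame with the column names requested in the question
res <- setNames(as.data.frame(m), c("num1", "num2"))
res
```

If the values may contain decimals, "numeric" could be used in place of "integer" for storage.mode.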
The strsplit function will give you what you are looking for, returned as a list.
employee <- c('60-150.PNG','300-12.PNG')
strsplit(employee, "[-]")
##Output:
[[1]]
[1] "60" "150.PNG"
[[2]]
[1] "300" "12.PNG"
Note that the second argument of strsplit is a regular expression, not just a literal character to split on, so a more complex pattern can be used.
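As a rough illustration of that point, the character class "[-.]" splits on either "-" or ".", which separates both numbers from the extension in one call. The sketch below is one possible way to turn the resulting list into the num1 and num2 columns from the question; the names parts, num and res are only illustrative.

```r
employee <- c('60-150.PNG', '300-12.PNG')

# Split on either "-" or "." (inside a character class the dot is literal)
parts <- strsplit(employee, "[-.]")

# Keep the first two pieces of each element and convert them to integers
num <- do.call(rbind, lapply(parts, function(p) as.integer(p[1:2])))

# Assemble the two requested columns
res <- data.frame(num1 = num[, 1], num2 = num[, 2])
res
```

strsplit is vectorised over its input, so no explicit for loop over the rows is needed.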