R: extract a list of matching parts of a string through a regular expression
Let's say that I need to extract different parts of a string as a list. For example, I would like to split the string "aaa12xxx"
into three parts.
One possibility is to make three calls to gsub:
parts = c()
parts[1] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\1', "aaa12xxx")
parts[2] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\2', "aaa12xxx")
parts[3] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\3', "aaa12xxx")
Of course, this seems like quite a waste of time (even if it's inside a for loop). Isn't there a function that just returns a list of parts from a regex and a test string?
Just split the input string with strsplit and pick out the parts you want.
> x <- "aaa12xxx"
> strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE)
[[1]]
[1] "aaa" "12"  "xxx"
Get the individual parts by indexing into the result.
> m <- unlist(strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE))
> m[1]
[1] "aaa"
> m[2]
[1] "12"
> m[3]
[1] "xxx"
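Since the question mentions doing this inside a for loop, here is a minimal sketch of the same split applied to several strings at once. strsplit is vectorized over its first argument and returns one list element per input string. The second input string and the variable names are made up for illustration.

```r
# Hypothetical inputs; only "aaa12xxx" comes from the question.
xs <- c("aaa12xxx", "b7cd")

# Same boundary pattern as above: split between a letter and a digit,
# or between a digit and a letter.
boundary <- "(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])"

# One call handles every string; parts[[i]] holds the pieces of xs[i].
parts <- strsplit(xs, boundary, perl = TRUE)

# Pull the second piece (the digits) out of each result.
digits <- vapply(parts, function(p) p[2], character(1))
print(digits)
```

This avoids an explicit loop, assuming every input has the letters-digits-letters shape.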
- (?<=[[:alpha:]])(?=\\d) matches every boundary that is preceded by a letter and followed by a digit.
- | means OR.
- (?<=\\d)(?=[[:alpha:]]) matches every boundary that is preceded by a digit and followed by a letter.
Splitting your input at the matched boundaries gives you the output you want.
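If you would rather keep the capture groups from the question, base R's regexec combined with regmatches returns the full match and every group in a single call. This is an alternative sketch, not part of the strsplit approach above; the variable names are illustrative.

```r
x <- "aaa12xxx"
pattern <- "([[:alpha:]]+)([0-9]+)([[:alpha:]]+)"

# regexec() records the position of the full match and of each capture group;
# regmatches() turns those positions into substrings, one list element per input.
m <- regmatches(x, regexec(pattern, x))

# The first entry is the whole match, so drop it to keep only the three groups.
parts <- m[[1]][-1]
print(parts)
```

Unlike the split, this returns an empty result for a string that does not match the whole pattern, which can be useful for validating the input.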