R: extract a list of matching parts of a string through a regular expression

Question


Let's say that I need to extract different parts of a string as a list. For example, I would like to split the string "aaa12xxx" into three parts.

One possibility is to make three calls to gsub:

parts = c()
parts[1] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\1', "aaa12xxx")
parts[2] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\2', "aaa12xxx")
parts[3] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\3', "aaa12xxx")

Of course, this seems like quite a waste of time (even if it's done inside a for loop). Isn't there a function that simply returns a list of the matched parts, given a regex and a test string?
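A minimal base-R sketch of such a single call (an illustration, not taken from the answers below): regexec() records the position of the whole match and of each capture group, and regmatches() turns those positions into substrings. It reuses the pattern from the gsub calls above; the variable names are illustrative.

```r
x <- "aaa12xxx"
pattern <- "([[:alpha:]]+)([0-9]+)([[:alpha:]]+)"

# regexec() returns, for each input string, the start of the full match
# followed by the start of each capture group
hits <- regexec(pattern, x)

# regmatches() extracts those substrings: element 1 is the full match,
# the remaining elements are the three capture groups
m <- regmatches(x, hits)[[1]]
parts <- m[-1]
print(parts)  # "aaa" "12" "xxx"
```

Because regexec() is vectorized over its text argument, passing a character vector returns one such vector per string, so no explicit loop is needed.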


Tags: string, substring, string-matching, regex, r

Asked by fstab, 13 Jan '15 at 12:25


2 answers

Avinash Raj · Answer 1 · 2015-01-13T12:29:13+0000

Just split the input string with strsplit and pick out the parts you need.

> x <- "aaa12xxx"
> strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE)
[[1]]
[1] "aaa" "12"  "xxx"

Then get the individual parts by index:

> m <- unlist(strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE))
> m[1]
[1] "aaa"
> m[2]
[1] "12"
> m[3]
[1] "xxx"
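strsplit() is vectorized over its input, so the same pattern also handles several strings at once without an explicit loop. A sketch, assuming every string has the same letters-digits-letters shape (the second example string is made up for illustration), that stacks the pieces into a matrix:

```r
x <- c("aaa12xxx", "bb7yy")
pattern <- "(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])"

# one character vector of pieces per input string
pieces <- strsplit(x, pattern, perl = TRUE)

# bind the pieces row by row; this only works if every string
# splits into the same number of parts (three here)
parts_mat <- do.call(rbind, pieces)
print(parts_mat)
print(parts_mat[, 2])  # the digit part of every string
```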

(?<=[[:alpha:]])(?=\\d)

matches every boundary that is preceded by a letter and followed by a digit.

|

OR

(?<=\\d)(?=[[:alpha:]])

matches every boundary that is preceded by a digit and followed by a letter.

Splitting the input at these boundaries gives the output you want.
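To see what each half of the alternation contributes, you can split with each lookaround pair on its own. A small illustrative sketch (the variable names are hypothetical):

```r
x <- "aaa12xxx"

# letter-then-digit boundary only: splits between "aaa" and "12"
letter_digit <- strsplit(x, "(?<=[[:alpha:]])(?=\\d)", perl = TRUE)[[1]]

# digit-then-letter boundary only: splits between "12" and "xxx"
digit_letter <- strsplit(x, "(?<=\\d)(?=[[:alpha:]])", perl = TRUE)[[1]]

print(letter_digit)  # "aaa" "12xxx"
print(digit_letter)  # "aaa12" "xxx"
```

Each pattern alone finds one split point; joining them with | gives both, which is why the combined pattern yields three parts.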

vks · Answer 2 · 2015-01-13T12:41:34+0000

(\\d+)|([a-zA-Z]+)

or

([[:alpha:]]+)|([0-9]+)

You can simply grab the captures: use str_match_all() from library(stringr). See the demo:

https://regex101.com/r/fA6wE2/8
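If you would rather not depend on stringr, base R's gregexpr() and regmatches() can extract every match of the alternation pattern. This is a swapped-in base-R technique, not the stringr code the answer refers to; it returns the matched runs as a plain character vector rather than a matrix of capture groups.

```r
x <- "aaa12xxx"

# gregexpr() finds every non-overlapping run of letters or of digits;
# regmatches() pulls out the matched text (one vector per input string)
runs <- regmatches(x, gregexpr("[[:alpha:]]+|[0-9]+", x))[[1]]
print(runs)  # "aaa" "12" "xxx"
```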
