Extract number and name from string [r]
POSIX regular expressions are giving me a headache.
Let's say we have a line:
a = "[question(37), question_pipe(\"Person10\")]"
and ultimately I would like to be able to:
b = c("37", "Person10")
I have looked at the package stringr but cannot figure out how to extract information from regex and str_split.
Any help would be greatly appreciated.
Cameron
So, if I understood correctly, you want to extract the items in parentheses.
You can extract these elements first, including the parentheses, using str_extract_all:
library(stringr)
b1 <- str_extract_all(string = a, pattern = "\\(.*?\\)")
b1
# [[1]]
# [1] "(37)" "(\"Person10\")"
Since str_extract_all returns a list, convert it to a vector:
b2 <- unlist(b1)
b2
# [1] "(37)" "(\"Person10\")"
Finally, you can remove the parentheses (the first and last character of each element) with str_sub:
b3 <- str_sub(string = b2, start = 2L, end = -2L)
b3
# [1] "37" "\"Person10\""
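Note that b3 still carries the double quotes around Person10, whereas the question asks for c("37", "Person10"). A minimal sketch of one more step that strips them, using base R's gsub (stringr's str_remove_all is an alternative); the name b is taken from the question:

```r
# b3 as produced in the previous step
b3 <- c("37", "\"Person10\"")

# Remove every literal double quote.
# fixed = TRUE treats the pattern as plain text rather than as a regex.
b <- gsub("\"", "", b3, fixed = TRUE)

# b is now equal to c("37", "Person10")
b
```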
Edit: a few comments about the regex pattern: \\( and \\) match the literal opening and closing parentheses. .*? matches any sequence of characters, but non-greedily (lazily); otherwise you would get one long match from the first ( to the last ).
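To see the difference, here is a small sketch that runs the greedy and the non-greedy pattern on the same string. It uses base R's regmatches and gregexpr with perl = TRUE instead of stringr, only so that it runs without extra packages; the variable names greedy and lazy are made up for the illustration:

```r
a <- "[question(37), question_pipe(\"Person10\")]"

# Greedy: .* grabs as much as possible, so there is a single match
# running from the first "(" to the last ")"
greedy <- regmatches(a, gregexpr("\\(.*\\)", a, perl = TRUE))[[1]]

# Non-greedy: .*? stops at the first ")" it can, giving one match per pair
lazy <- regmatches(a, gregexpr("\\(.*?\\)", a, perl = TRUE))[[1]]

length(greedy)  # 1
length(lazy)    # 2
```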
This should work in your specific case:
a <- "[question(37), question_pipe(\"Person10\")]"
# First split into two parts
b <- strsplit(a, ",")[[1]]
# Extract the number (skip as.integer if you want it as character)
x <- as.integer(gsub("[^0-9]","", b[[1]])) # 37
# Extract the stuff in quotes
y <- gsub(".*\"(.*)\".*", "\\1", b[[2]]) # "Person10"
Alternative for extracting everything in parentheses from the first part:
x <- gsub(".*\\((.*)\\).*", "\\1", b[[1]]) # "37"
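Because sub and gsub are vectorized, the same pattern can be applied to all parts at once instead of indexing b[[1]] and b[[2]] separately. A rough sketch, assuming the items are separated by commas and that no comma appears inside the parentheses; the names parts, inner and result are only for illustration:

```r
a <- "[question(37), question_pipe(\"Person10\")]"

# Split at the commas; this assumes no commas occur inside the parentheses
parts <- strsplit(a, ",")[[1]]

# Keep only what sits between "(" and ")" in each part
inner <- sub(".*\\((.*)\\).*", "\\1", parts)

# Drop the literal double quotes around Person10
result <- gsub("\"", "", inner, fixed = TRUE)

# result is now equal to c("37", "Person10")
result
```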