Extract number and name from string [r]

POSIX regular expressions are giving me a headache.

Let's say we have a line:

a = "[question(37), question_pipe(\"Person10\")]"

      

and ultimately I would like to be able to:

b = c("37", "Person10")

      

I have looked at the stringr package but cannot figure out how to extract the information using a regex and str_split.

Any help would be greatly appreciated.

Cameron



4 answers


So, if I understood correctly, you want to extract the items in parentheses.

You can first extract these elements, including the parentheses, using str_extract_all:

library(stringr)

b1 <- str_extract_all(string = a, pattern = "\\(.*?\\)")
b1
# [[1]]
# [1] "(37)"           "(\"Person10\")"

      

Since str_extract_all returns a list, flatten it into a character vector:



b2 <- unlist(b1)
b2
# [1] "(37)"           "(\"Person10\")"

      

Finally, you can remove the parentheses (the first and last character of each string) with str_sub:

b3 <- str_sub(string = b2, start = 2L, end = -2L) 
b3
# [1] "37"           "\"Person10\""
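
Note that the second element still carries its literal double quotes, whereas the question asks for c("37", "Person10"). A minimal sketch of one way to finish the job, assuming the quotes never need to be preserved (the variable name inner is only illustrative):

```r
library(stringr)

a <- "[question(37), question_pipe(\"Person10\")]"

# Same three steps as above, chained: extract "(...)", flatten, trim the parentheses
inner <- str_sub(unlist(str_extract_all(a, "\\(.*?\\)")), start = 2L, end = -2L)

# Drop the literal double quotes; fixed = TRUE treats the pattern as plain text
b <- gsub("\"", "", inner, fixed = TRUE)
b
# [1] "37"       "Person10"
```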

      

Edit: a few comments about the regex pattern: \\( and \\) match a literal opening and closing parenthesis. .*? matches any run of characters, but lazily (non-greedy); with a greedy .* you would get one long match from the first ( to the last ).
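
To see the difference between the greedy and the lazy quantifier, here is a small sketch on the string from the question (the variable names greedy and lazy are only illustrative):

```r
library(stringr)

a <- "[question(37), question_pipe(\"Person10\")]"

# Greedy: .* runs on to the LAST closing parenthesis, giving a single long match
greedy <- str_extract_all(a, "\\(.*\\)")[[1]]
greedy
# [1] "(37), question_pipe(\"Person10\")"

# Lazy: .*? stops at the FIRST closing parenthesis, giving one match per pair
lazy <- str_extract_all(a, "\\(.*?\\)")[[1]]
lazy
# [1] "(37)"           "(\"Person10\")"
```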



This should work in your specific case:

a <- "[question(37), question_pipe(\"Person10\")]"

# First split into two parts
b <- strsplit(a, ",")[[1]]

# Extract the number (skip as.integer if you want it as character)
x <- as.integer(gsub("[^0-9]","", b[[1]])) # 37

# Extract the stuff in quotes
y <- gsub(".*\"(.*)\".*", "\\1", b[[2]])   # "Person10"

      



Alternative for extracting everything in brackets from the first part:

x <- gsub(".*\\((.*)\\).*", "\\1", b[[1]]) # "37"
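
If the number of items is not fixed at two, splitting on the comma becomes awkward. As a hedged base-R sketch (not part of the answer above), lookarounds with perl = TRUE can pull out whatever sits between each pair of parentheses in one pass; it assumes the parentheses are never nested, and the name inner is only illustrative:

```r
a <- "[question(37), question_pipe(\"Person10\")]"

# (?<=\\() requires a "(" just before the match, (?=\\)) a ")" just after it;
# [^()]* matches the content in between, which may not contain parentheses
inner <- regmatches(a, gregexpr("(?<=\\()[^()]*(?=\\))", a, perl = TRUE))[[1]]

# Remove the literal double quotes around Person10
b <- gsub("\"", "", inner, fixed = TRUE)
b
# [1] "37"       "Person10"
```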

      



I would do it like this:

a <- "[question(37), question_pipe(\"Person10\")]"
b <- unlist(strsplit(gsub("\"","",gsub(".*question\\((.*)\\).*question_pipe\\((.*)\\).*","\\1,\\2",a)),","))
print(b)
# [1] "37"       "Person10"

      



Expanding on flop's answer, I think this would be the shortest solution:

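library(stringr)

a <- "[question(37), question_pipe(\"Person10\")]"
# Backslashes must be doubled inside an R string literal ("\(" is a parse error)
b1 <- unlist(str_extract_all(string = a, pattern = "\\(.*?\\)"))
# Strip all punctuation: removes both the parentheses and the quotes
b <- gsub("[[:punct:]]", "", b1)
# b is now c("37", "Person10")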

      
