Split a string using multiple delimiters in R
I'm new to R. I need to split a sentence on phrase separators. I know strsplit can split a string on a single delimiter, but I want to split on any of several delimiters, such as , . : and ;, in one step. Is there a regular expression for this?
For example:
my_string = "This is a sentence. This is a question, right? Yes! It is."
Expected output:
"This is a sentence", "This is a question", "right", "Yes", "It is"
You can use this:
strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence" " This is a question" " right"
#[4] " Yes" " It is"
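The title asks about a delimiter array. As a minimal sketch, you can build the alternation from a character vector instead of typing it by hand. The helper name split_on_delims is hypothetical (it is not part of base R), and it assumes every delimiter is a single non-alphanumeric character. It uses perl = TRUE because in PCRE a backslash before any non-alphanumeric character always means that character literally, so escaping every delimiter is safe.

```r
# Hypothetical helper: split x on any of the single-character delimiters in delims.
# Assumes each delimiter is one non-alphanumeric character.
split_on_delims <- function(x, delims) {
  # In PCRE a backslash before a non-alphanumeric character makes it literal,
  # so ".", "?", "|" and similar metacharacters are all safe to escape this way
  escaped <- paste0("\\", delims)
  # Builds (?:\.|\,|...)\s* : any one delimiter plus the whitespace after it
  pattern <- paste0("(?:", paste(escaped, collapse = "|"), ")\\s*")
  strsplit(x, pattern, perl = TRUE)
}

my_string <- "This is a sentence. This is a question, right? Yes! It is."
delims <- c(".", ",", ":", ";", "?", "!")
split_on_delims(my_string, delims)[[1]]
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"
```

Because strsplit is vectorized, the helper also works on a character vector of sentences and returns one list element per input string.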
To get rid of the leading spaces, let each delimiter also consume the spaces that follow it:
strsplit("This is a sentence. This is a question, right? Yes! It is.",
"\\. *|, |\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"
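Alternatively, you can split on the bare delimiters and strip the whitespace afterwards with base R's trimws() (available since R 3.2.0). This keeps the pattern simple at the cost of a second step:

```r
my_string <- "This is a sentence. This is a question, right? Yes! It is."
# Split on the delimiters alone; pieces keep their leading spaces
parts <- strsplit(my_string, "\\.|,|\\?|!")[[1]]
# Remove whitespace from both ends of every piece
trimws(parts)
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"
```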
As stated in this article, it's even easier with a character class:
strsplit("This is a sentence. This is a question, right? Yes! It is.",
"[,.:;?!]\\s*") # \\s* matches zero or more whitespace characters
#[[1]]
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"
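You need to escape characters that would otherwise be interpreted as regular-expression metacharacters. That's why you see \\ before . and ? in the first two patterns, and | means "or". Inside a character class such as [,.:;?!], . and ? are already literal, so they need no escaping there.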
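A quick way to see why escaping matters: an unescaped . matches any character, so every character becomes a separator. Escaping it, or passing fixed = TRUE to treat the pattern as a literal string instead of a regex, gives the intended result:

```r
# Unescaped "." is a regex wildcard: every character is a separator
strsplit("a.b.c", ".")[[1]]
#[1] "" "" "" "" ""
# Escaped "\\." matches only a literal dot
strsplit("a.b.c", "\\.")[[1]]
#[1] "a" "b" "c"
# fixed = TRUE skips regex interpretation entirely
strsplit("a.b.c", ".", fixed = TRUE)[[1]]
#[1] "a" "b" "c"
```

Note that fixed = TRUE only handles a single literal delimiter, so for several delimiters at once you still need a regular expression.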
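If you are working in C# rather than R, Regex.Split takes the same kind of pattern. Leave the space out of the character class (otherwise it splits on every space) and consume the whitespace after each delimiter with \s* instead. As a complete program using C# 9 top-level statements:
using System;
using System.Text.RegularExpressions;

string input = @"This is a sentence. This is a question, right? Yes! It is.";
string pattern = @"[,.:;?!]\s*"; // any delimiter plus the whitespace after it
foreach (string result in Regex.Split(input, pattern))
{
    if (result.Length == 0) continue; // skip the empty piece after the final "."
    Console.WriteLine("'{0}'", result);
}
Check the console output: it should print each phrase on its own line, in single quotes.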