Split a string using an array of delimiters in R

I'm new to R. I need to split a sentence on phrase separators. strsplit can split a string on a single delimiter, but I want to split on any of several delimiters, such as [,.:; ]. How can I do this in one step? Is there a regular expression for this?

For example:

my_string = "This is a sentence.  This is a question, right?  Yes!  It is."

      

Expected output:

"This is a sentence", "This is a question", "right", "Yes", "It is"

      



2 answers


You can use this:

strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence"  " This is a question" " right"             
#[4] " Yes"                " It is"

      

To get rid of the extra spaces, you can do this:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
         "\\. *|, *|\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"             
#[4] "Yes"                "It is"
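
Another way to drop the stray spaces, shown here only as a sketch, is to split on the punctuation alone and trim each piece afterwards with trimws() (base R, available from R 3.2.0). The variable names x and pieces are illustrative and not part of the original answer.

```r
x <- "This is a sentence. This is a question, right? Yes! It is."

# Split on any single punctuation mark, then strip the leading and
# trailing whitespace from every piece.
pieces <- trimws(strsplit(x, "[.,?!]")[[1]])
print(pieces)
```

This keeps the pattern simple at the cost of a second pass over the result.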

      



As stated in this article, it's even easier:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
     "[,.:;?!]\\s*")  # \\s* matches zero or more whitespace characters

      
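Since the question asks about an array of delimiters, the character-class pattern can also be built from a vector. The sketch below assumes every delimiter is a single character with no special meaning inside [...] (so none of ] ^ - \); split_on_any is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: split `x` on any of the single-character delimiters
# in `delims`, also consuming whitespace that follows a delimiter.
# Assumes no delimiter is special inside [...] (none of ] ^ - \).
split_on_any <- function(x, delims) {
  pattern <- paste0("[", paste(delims, collapse = ""), "]\\s*")
  strsplit(x, pattern)[[1]]
}

# The string from the question, including its double spaces.
my_string <- "This is a sentence.  This is a question, right?  Yes!  It is."
parts <- split_on_any(my_string, c(",", ".", ":", ";", "?", "!"))
print(parts)
```

Because \\s* consumes any run of whitespace, the double spaces in the original string need no special handling.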

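You need to escape certain characters that are otherwise interpreted as regex metacharacters. That's why you see \\ before . and ? in the first two patterns. The | acts as "or". Inside a character class such as [,.:;?!], these characters are literal and need no escaping.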
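
To illustrate the escaping rules, here is a small sketch on a made-up input string (x and the result names are illustrative). It shows three ways to match a literal dot and how to match a literal | as well.

```r
x <- "a.b|c"

# Three equivalent ways to treat "." as a literal dot rather than
# "any character": backslash escape, character class, fixed matching.
escaped <- strsplit(x, "\\.")[[1]]
bracket <- strsplit(x, "[.]")[[1]]
literal <- strsplit(x, ".", fixed = TRUE)[[1]]

# "|" means "or" in a regex, so it must be escaped to match itself.
both <- strsplit(x, "\\.|\\|")[[1]]
print(both)
```

With fixed = TRUE the pattern is taken as a plain string, which suits a single literal delimiter but cannot express alternatives.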



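In C# (rather than R), you can use a similar pattern with Regex.Split:

        // C#, not R. Requires: using System; using System.Text.RegularExpressions;
        string input = @"This is a sentence. This is a question, right? Yes! It is.";
        string pattern = @"[,.:;?!]\s*";  // punctuation followed by optional whitespace

        foreach (string result in Regex.Split(input, pattern))
        {
            if (result.Length > 0)  // skip the empty string after the final "."
                Console.WriteLine("'{0}'", result);
        }

Check the console output to confirm the result.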
