Regex Split at the beginning of a line containing a word

I am trying to split text into paragraphs every time a line contains a specific word. I have managed to break the text at the beginning of the word itself, but not at the beginning of the line that contains it. What is the correct expression?

This is what I have:

 string[] paragraphs = Regex.Split(text, @"(?=INT.|EXT.)");

I also want to drop any empty paragraphs from the array.

This is the input:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION INT. - NIGHT

I want to split it into paragraphs while keeping the same layout.

This is the result I currently get:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - 

EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION 

INT. - NIGHT

The new paragraphs start at the word, not at the beginning of its line.

This is the desired result:

Paragraph 1

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

Paragraph 2

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

Paragraph 3

LOCATION INT. - NIGHT

A paragraph must always start at the beginning of a line that contains INT. or EXT., not at the word itself.
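For the empty-paragraph part: Regex.Split has no equivalent of StringSplitOptions.RemoveEmptyEntries, so one common approach is to filter the array with LINQ afterwards. The sketch below is a minimal illustration using a shortened copy of the sample input (the names raw and paragraphs are illustrative only). It also shows where the empty entry comes from: the lookahead matches at index 0, and Regex.Split then returns an empty string as the first element.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened copy of the sample input (assumption: \n line endings).
string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
              "\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n";

// The question's pattern: the lookahead matches at index 0, so Split
// puts an empty string at the start of the returned array.
string[] raw = Regex.Split(text, @"(?=INT.|EXT.)");

// Regex.Split cannot drop empty entries itself, so filter afterwards.
string[] paragraphs = raw.Where(p => !string.IsNullOrWhiteSpace(p)).ToArray();

Console.WriteLine(raw.Length);        // 3
Console.WriteLine(paragraphs.Length); // 2
```

Filtering only removes the empty entries; with this pattern the split still happens mid-line, so "LOCATION - " stays at the end of the first paragraph.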

1 answer


Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

Here is a test script:

using System;
using System.Text.RegularExpressions;

string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
              "LOCATION INT. - NIGHT\n";

// With Multiline, ^ matches at the start of every line. The lookahead splits
// before each line that has at least one character followed by INT or EXT.
string[] res = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

for (int i = 0; i < res.Length; i++)
{
    int paragraphNumber = i + 1;
    Console.WriteLine("paragraph " + paragraphNumber + "\n" + res[i]);
}

Output:

#paragraph 1
#INT. LOCATION - DAY
#Lorem ipsum dolor sit amet, consectetur adipiscing elit.

#paragraph 2
#LOCATION - EXT.
#Morbi cursus dictum tempor. Phasellus mattis at massa non porta.

#paragraph 3
#LOCATION INT. - NIGHT
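A possible refinement, offered as a sketch rather than the answerer's code: ^.+?INT requires at least one character before INT, so a line that starts with INT. later in the text is not split, and the bare INT also matches inside words such as PRINT. The version below assumes INT. and EXT. always appear as whole words followed by a literal dot. It uses .* instead of .+? and filters out the empty entry that Split returns when the match is at index 0, which also covers the request to drop empty paragraphs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The question's input, including its blank lines.
string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
              "\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
              "\n" +
              "LOCATION INT. - NIGHT\n";

// ^ (Multiline)      : start of any line
// .*                 : rest of that line; . does not match \n, so the search stays on one line
// \b(?:INT|EXT)\.    : the whole word INT or EXT followed by a literal dot
const string pattern = @"(?=^.*\b(?:INT|EXT)\.)";

string[] paragraphs = Regex.Split(text, pattern, RegexOptions.Multiline)
                           .Where(p => !string.IsNullOrWhiteSpace(p)) // drops the empty leading entry
                           .ToArray();

for (int i = 0; i < paragraphs.Length; i++)
{
    Console.WriteLine("Paragraph " + (i + 1));
    Console.Write(paragraphs[i]);
}
```

The group is written as (?:INT|EXT) rather than (INT|EXT) because Regex.Split adds any text captured by capturing groups to the returned array.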
