Regex Split at the beginning of a line containing a word
I am trying to split text into paragraphs every time a line contains a specific word. I have already managed to break the text at the beginning of the word, but not at the beginning of the line that contains it. What is the correct expression?
This is what I have:
string[] paragraphs = Regex.Split(text, @"(?=INT.|EXT.)");
I also want to drop any empty paragraphs from the array.
This is the input:
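On the "empty paragraphs" part: `Regex.Split` has no equivalent of `StringSplitOptions.RemoveEmptyEntries`, so one option is to filter the array with LINQ afterwards. A minimal sketch, assuming a short made-up input; it uses the pattern from the question with the dots escaped (an unescaped `.` matches any character). It still splits at the word rather than at the start of the line; it only shows the filtering and the empty first element that appears when the text starts with `INT.`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input (top-level statements, C# 9 or later).
string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet.\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor.\n";

// "INT\." matches the literal text "INT.", not "INT" followed by any character.
// The lookahead also matches at position 0, so Regex.Split returns "" as the first element.
string[] paragraphs = Regex.Split(text, @"(?=INT\.|EXT\.)")
    // Drop empty (or whitespace-only) pieces left over by the split.
    .Where(p => !string.IsNullOrWhiteSpace(p))
    .ToArray();

foreach (string p in paragraphs)
    Console.WriteLine("[" + p + "]");
```

The second piece starts at `EXT.` and the first one ends with `LOCATION - `, which is exactly the mid-line split described in the question.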
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
LOCATION INT. - NIGHT
I want to split it into paragraphs while keeping the same layout.
What I get instead:
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
LOCATION -
EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
LOCATION
INT. - NIGHT
The new paragraphs start at the word, not at the beginning of the line that contains it.
This is the desired result:
Paragraph 1
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Paragraph 2
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
Paragraph 3
LOCATION INT. - NIGHT
A paragraph must always start at the beginning of the line that contains INT. or EXT., not at the word itself.
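One way to express "start of a line that contains INT. or EXT." is to anchor the split point with `^` in multiline mode and put the whole-line check inside a lookahead. A minimal sketch, assuming `\n` or `\r\n` line endings and that INT./EXT. always appear as whole words followed by a dot; the pattern is my own suggestion, not taken from the question or the answer below.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input copied from the question (top-level statements, C# 9 or later).
string text =
    "INT. LOCATION - DAY\n" +
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
    "LOCATION - EXT.\n" +
    "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
    "LOCATION INT. - NIGHT\n";

// (?m)                   multiline mode: ^ matches at the start of every line
// ^                      the (zero-width) split point is a line start...
// (?=.*\b(?:INT|EXT)\.)  ...followed, on the same line, by the word INT. or EXT.
// '.' does not match '\n', so the lookahead never looks past the current line,
// and \b keeps words such as "POINT." from matching.
const string pattern = @"(?m)^(?=.*\b(?:INT|EXT)\.)";

string[] paragraphs = Regex.Split(text, pattern)
    .Where(p => p.Length > 0) // a match at position 0 yields an empty first element
    .ToArray();

for (int i = 0; i < paragraphs.Length; i++)
    Console.WriteLine("Paragraph " + (i + 1) + "\n" + paragraphs[i].TrimEnd());
```

Because the split point is zero-width, each heading line stays at the top of its own paragraph and nothing is removed from the text.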
1 answer
Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);
Try it with this test script:
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later).
string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
              "LOCATION INT. - NIGHT\n";

// Split at the start of every line on which at least one character precedes INT or EXT.
string[] res = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

for (int i = 0; i < res.Length; i++)
{
    int paragraphNumber = i + 1;
    Console.WriteLine("paragraph " + paragraphNumber + "\n" + res[i]);
}
Output:
paragraph 1
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
paragraph 2
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
paragraph 3
LOCATION INT. - NIGHT
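A caveat about this pattern: `.+?` requires at least one character between the start of the line and INT/EXT, so it never splits before a line that *begins* with INT. or EXT. In the test above that goes unnoticed only because such a line is the very first one. A minimal sketch of the difference, using a hypothetical "FADE IN:" preamble line; changing `.+?` to `.*?` is my own tweak, not part of the answer, and it produces an empty first element whenever the text itself starts with INT./EXT., hence the filter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: a preamble line, then a heading that starts with INT.
string text =
    "FADE IN:\n" +
    "INT. LOCATION - DAY\n" +
    "Lorem ipsum dolor sit amet.\n";

// Pattern from the answer: no split before "INT. LOCATION - DAY".
string[] original = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

// Variant: .*? also allows zero characters before INT/EXT; empty pieces are dropped.
string[] variant = Regex.Split(text, "(?=^.*?INT|^.*?EXT)", RegexOptions.Multiline)
    .Where(p => p.Length > 0)
    .ToArray();

Console.WriteLine(original.Length); // 1
Console.WriteLine(variant.Length);  // 2
```

Both versions still match INT or EXT inside longer words (for example POINT) and do not require the trailing dot, so adding `\b` and `\.` to the pattern may be worth it for real screenplay text.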