Joining two broken lines with Notepad++
I am preparing WhatsApp chat logs for rendering statistics and word clouds. However, my data sometimes contains double line breaks that conflict with the formatting of the log, and I am wondering how I can automate the fix.
13 Mar 18:51 - nicola: mainly he crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Finding and removing the blank lines is an easy fix. However, I am still left with lines that break the date and time formatting:
13 Mar 18:51 - nicola: mainly he crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Target format:
13 Mar 18:51 - nicola: mainly he crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Maybe the solution is to use this rule: The line breaks I need to keep follow the pattern:
TEXT *linebreak*
NUMBER (beginning of the date column)
The line breaks I need to remove follow the pattern:
TEXT *linebreak*
TEXT
How can I do this in Notepad++?
In the find and replace dialog, you can search for this pattern
\r\n(?!\d)
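with the "Regular expression" search mode enabled, and replace it with nothing (or with a single space, if the joined parts should stay separated as in your target format).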
\r\n
searches for a line break consisting of CR and LF. Turn on the display of control characters in Notepad++ to see which kind of line break your file uses.
(?!\d)
is a negative lookahead that succeeds when the next character is not a digit. This works for your example, but it may fail in some corner cases, for instance when a continuation line itself starts with a digit. You can extend it to a stricter pattern, for example (?!\d{2}\s) when the date always starts with a two-digit day.
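If you would rather script the fix than do it in the editor, the same idea can be sketched in Python. This is only an illustration, not part of Notepad++: the function name join_broken_lines and the sample string are made up for the example, the pattern uses \r?\n so that both CRLF and LF files are handled, and the replacement is a single space to match your target format.

```python
import re

# Stricter lookahead variant: a line break is kept only when the next line
# starts with a two-digit day followed by whitespace (assumed log format).
BROKEN_BREAK = re.compile(r"\r?\n(?!\d{2}\s)")


def join_broken_lines(log: str) -> str:
    """Replace every line break that is not followed by a date with a space."""
    return BROKEN_BREAK.sub(" ", log)


# Hypothetical sample built from the lines in the question.
sample = (
    "13 Mar 18:52 - nicola: and he has no natural grace like most cats\r\n"
    "well no i didn't lol\r\n"
    "13 Mar 18:52 - nicola: you saw the last video"
)

print(join_broken_lines(sample))
```

Note that a line break at the very end of the file is not followed by a date either, so it would also be replaced; strip trailing whitespace afterwards if that matters.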