Join lines falling between and not including repeating pattern with sed

Question

I have a file with a number of lines containing timestamps and multiple lines in between. For example,

TIMESTAMP MESSAGE
TRAIL 1
TRAIL 2
TIMESTAMP MESSAGE2
TRAIL 21
TRAIL 22 ...

I want to append all the trail messages to one line, or better yet, join all the lines between two timestamps onto the same line, so that my output looks something like

TIMESTAMP MESSAGE TRAIL 1 TRAIL 2
TIMESTAMP MESSAGE2 TRAIL 21 TRAIL 22 ...

I have looked at several questions that are similar, but none worked in my case. I tried to use

sed -i '/pattern_for_timestamp/{n;:l N;/pattern_for_timestamp/b ; s/\n// ; bl}'

but this only changes every alternate occurrence of the pattern. The trail messages should not contain the timestamp pattern. I would prefer to use sed or awk in this case.

+3

regex awk sed

T-32 May 10 '17 at 11:19

4 answers

With gawk, you can use a regex as the record separator and then use the built-in RT to restore its value in the output:

$ cat file
20170102 MESSAGE
TRAIL 1
TRAIL 2
20170312 MESSAGE2
TRAIL 21
TRAIL 22
20170527 MESSAGE3
TRAIL 31
TRAIL 32

$ gawk -v RS="[0-9]{8}" 'NR>1{gsub("\n", " "); print ts $0} {ts=RT}' file
20170102 MESSAGE TRAIL 1 TRAIL 2 
20170312 MESSAGE2 TRAIL 21 TRAIL 22 
20170527 MESSAGE3 TRAIL 31 TRAIL 32

+3

jas May 10 '17 at 11:48

Here's my attempt at awk:

awk '/^TIMESTAMP/{ if (NR > 1){ ORS = ""; print "\n"} ORS = " " };1' file

Output:

TIMESTAMP MESSAGE TRAIL 1 TRAIL 2
TIMESTAMP MESSAGE2 TRAIL 21 TRAIL 22
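One side effect of switching ORS this way is that every output line ends in a space and the last line has no terminating newline. If that matters, a minimal sketch of a variant that buffers each record in a variable and prints it whole might look like the following. The variable name rec and the temporary file are assumptions made for illustration, not part of the answer above.

```shell
# Sample input from the question, written to a temporary file (hypothetical path).
tmp=$(mktemp)
printf '%s\n' 'TIMESTAMP MESSAGE' 'TRAIL 1' 'TRAIL 2' \
              'TIMESTAMP MESSAGE2' 'TRAIL 21' 'TRAIL 22' > "$tmp"

# Buffer the current record in "rec"; print it whenever a new TIMESTAMP
# line starts, and once more at end of input.
joined=$(awk '
  /^TIMESTAMP/ { if (rec != "") print rec; rec = $0; next }
               { rec = (rec == "" ? $0 : rec " " $0) }
  END          { if (rec != "") print rec }
' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Because each record is printed with the default ORS, every line ends in a newline and carries no trailing space.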

+1

JFS31 May 10 '17 at 11:36

This might work for you (GNU sed):

sed ':a;N;/\nTIMESTAMP/!s/\n/ /;ta;P;D' file

Collect lines in the pattern space, replacing newlines with spaces, then print the first line when the next TIMESTAMP is encountered.

N.B. This expects the first line to be a TIMESTAMP line. If it is not, use:

sed '/^TIMESTAMP/!b;:a;N;/\nTIMESTAMP/!s/\n/ /;ta;P;D' file
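To see what the guard changes, here is a small sketch that feeds the second command some input with two leading lines that belong to no record. The HEADER lines are made up for illustration, and GNU sed is assumed, as in the answer.

```shell
# Sample input with two leading lines that are not part of any record.
input='HEADER 1
HEADER 2
TIMESTAMP MESSAGE
TRAIL 1
TRAIL 2
TIMESTAMP MESSAGE2
TRAIL 21
TRAIL 22'

# Guarded variant from above (GNU sed assumed): lines before the first
# TIMESTAMP are passed through unchanged, later records are joined.
joined=$(printf '%s\n' "$input" | sed '/^TIMESTAMP/!b;:a;N;/\nTIMESTAMP/!s/\n/ /;ta;P;D')

printf '%s\n' "$joined"
```

Without the leading /^TIMESTAMP/!b, the two HEADER lines would be joined into a single line, because the loop treats everything before the first TIMESTAMP as one record.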

+1

potong May 11 '17 at 12:19

Thor · Accepted Answer · 2017-05-10T11:52:04+0000

I would collect lines in hold space until a record is complete, and then modify and print it, like this:

parse.sed

/^TIMESTAMP/ b prn            # Run the prn subroutine
H                             # Anything else is appended to hold-space
$ b prn                       # Also run prn at end-of-input
b                             # Process next line

:prn
x                             # Swap pattern-space and hold-space
s/\n/ /g                      # Replace \n with space
1!p                           # Print the result if not on the first line

Run it like this:

sed -nf parse.sed infile

Or as a one-liner:

sed -n '/^TIMESTAMP/bp;H;$bp;b;:p;x;s/\n/ /g;1!p' infile

Output:

TIMESTAMP MESSAGE TRAIL 1 TRAIL 2
TIMESTAMP MESSAGE2 TRAIL 21 TRAIL 22 ...
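In a real log, the literal TIMESTAMP would be replaced by a regex for the actual timestamp format. As a rough sketch, assuming lines start with a date like 2017-05-10 followed by a space (this format and the sample data are assumptions, not taken from the question), the same script could be adapted as follows:

```shell
# Hypothetical log excerpt; the date format is an assumption for illustration.
log='2017-05-10 11:19:00 MESSAGE
TRAIL 1
TRAIL 2
2017-05-10 11:20:30 MESSAGE2
TRAIL 21
TRAIL 22'

joined=$(printf '%s\n' "$log" | sed -n '
# A line starting with a date begins a new record: flush the previous one.
/^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} /b prn
# Any other line is appended to the record kept in hold space.
H
# The last line of input also flushes the record.
$b prn
b
:prn
x
s/\n/ /g
1!p
')

printf '%s\n' "$joined"
```

The comments sit on their own lines and each label ends at a newline, so this form should not depend on GNU-specific handling of semicolons after labels.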
