Join lines falling between and not including repeating pattern with sed

I have a file with a number of lines containing timestamps and multiple lines in between. For example,

TIMESTAMP MESSAGE
TRAIL 1
TRAIL 2
TIMESTAMP MESSAGE2
TRAIL 21
TRAIL 22 ...

      

I want to append all the trail messages to one line, or better yet, join all the lines between two timestamps onto the same line, so that my output looks something like

TIMESTAMP MESSAGE TRAIL 1 TRAIL 2
TIMESTAMP MESSAGE2 TRAIL 21 TRAIL 22 ...

      

I have looked at several questions that are similar, but none worked in my case. I tried to use

sed -i '/pattern_for_timestamp/{n;:l N;/pattern_for_timestamp/b ; s/\n// ; bl}'

      

but this only changes every alternate occurrence of the pattern. The trail messages do not necessarily contain any pattern. I would prefer to use sed or awk in this case.

+3




4 answers


I would collect lines in hold space until a record is complete, then modify and print it, like this:

parse.sed

/^TIMESTAMP/ b prn            # Run the prn subroutine
H                             # Anything else is appended to hold-space
$ b prn                       # Also run prn at end-of-input
b                             # Process next line

:prn
x                             # Swap pattern-space and hold-space
s/\n/ /g                      # Replace \n with space
1!p                           # Print the result if not on the first line

      

Run it like this:

sed -nf parse.sed infile

      



Or as a one-liner:

sed -n '/^TIMESTAMP/bp;H;$bp;b;:p;x;s/\n/ /g;1!p' infile

      

Output:

TIMESTAMP MESSAGE TRAIL 1 TRAIL 2
TIMESTAMP MESSAGE2 TRAIL 21 TRAIL 22 ...
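
One edge case worth noting: if the input ends with a TIMESTAMP line, the script above leaves that final line in hold space and never prints it. A minimal sketch of one way to cover that case, assuming GNU sed and timestamp lines that start with the literal TIMESTAMP; the wrapper name join_trails is hypothetical and not part of the original answer:

```shell
# Hypothetical wrapper around the hold-space script, with one extra
# block so that a TIMESTAMP line at the very end of the input is not lost.
join_trails() {
  sed -n '
    /^TIMESTAMP/ b prn
    H
    $ b prn
    b
    :prn
    x
    s/\n/ /g
    1!p
    # On the last line, swap back: if the final input line was itself a
    # TIMESTAMP line it is still unprinted, so print it now.
    ${
      x
      /^TIMESTAMP/p
    }
  ' "$@"
}
```

It can be called as `join_trails infile`, or fed from a pipe. For input that ends with a trail line, the extra block prints nothing, so the output is the same as that of the script above.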

      

+2




With gawk, you can use a regex as the record separator and then use the built-in RT variable to restore its value in the output:



$ cat file
20170102 MESSAGE
TRAIL 1
TRAIL 2
20170312 MESSAGE2
TRAIL 21
TRAIL 22
20170527 MESSAGE3
TRAIL 31
TRAIL 32

$ gawk -v RS="[0-9]{8}" 'NR>1{gsub("\n", " "); print ts $0} {ts=RT}' file
20170102 MESSAGE TRAIL 1 TRAIL 2 
20170312 MESSAGE2 TRAIL 21 TRAIL 22 
20170527 MESSAGE3 TRAIL 31 TRAIL 32 
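
RT is a gawk extension, so the command above will not work as written under other awk implementations. As a hedged alternative, here is a minimal POSIX awk sketch that buffers each record and flushes it when the next timestamp line arrives. It assumes every timestamp sits at the start of a line; the helper name join_by_stamp and the variable pat are hypothetical:

```shell
# Hypothetical helper: join_by_stamp REGEX [FILE...]
# Lines matching REGEX start a new record; every other line is appended
# to the current record, separated by a single space.
join_by_stamp() {
  pat=$1
  shift
  awk -v pat="$pat" '
    $0 ~ pat {
      if (have) print buf      # flush the previous record
      buf = $0                 # start a new record with this line
      have = 1
      next
    }
    { buf = (have ? buf " " $0 : $0); have = 1 }   # append a trail line
    END { if (have) print buf }                    # flush the last record
  ' "$@"
}
```

For the sample file it could be called as `join_by_stamp '^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] ' file`. The digit class is spelled out eight times because some older awk implementations do not support interval expressions such as `{8}`. Unlike the gawk command, this sketch does not leave a trailing space on each output line.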

      

+3




Here's my attempt at awk:

awk '/^TIMESTAMP/{ if (NR > 1){ ORS = ""; print "\n"} ORS = " " };1' file

      

Output:

TIMESTAMP MESSAGE TRAIL 1 TRAIL 2
TIMESTAMP MESSAGE2 TRAIL 21 TRAIL 22
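
Note that the command above leaves a trailing space on each output line and does not end the output with a newline, because ORS is a space whenever a line is printed. A small sketch of a printf-based variant that avoids both, assuming the first input line is a TIMESTAMP line; the function name join_stream is hypothetical:

```shell
# Hypothetical helper: stream the input, starting a new output line at
# each TIMESTAMP line and prefixing every other line with one space.
join_stream() {
  awk '
    /^TIMESTAMP/ {
      if (NR > 1) printf "\n"   # close the previous record
      printf "%s", $0           # start the new record
      next
    }
    { printf " %s", $0 }        # append a trail line
    END { if (NR > 0) printf "\n" }   # terminate the last record
  ' "$@"
}
```

It can be used as `join_stream file`. Because nothing is buffered, memory use stays constant even when a record spans many lines.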

      

+1




This might work for you (GNU sed):

sed ':a;N;/\nTIMESTAMP/!s/\n/ /;ta;P;D' file

      

Collect lines in the pattern space, replacing newlines with spaces, then print the first line when the next TIMESTAMP is encountered.

N.B. This expects the first line to be a TIMESTAMP line; if it is not, use:

sed '/^TIMESTAMP/!b;:a;N;/\nTIMESTAMP/!s/\n/ /;ta;P;D' file

      

+1

