Create multi-line record on one line if records are not delimited

Question

I need to process records that span multiple lines. For example, I need to join a multi-line entry into a single string and then extract what I need from it. The entries are not delimited, so I can't just set RS to \n\n.

cat input
constant_string bla bla1
bla bla bal
fooo foooooo baaar          #End of record 1
constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar #End of record 2
constant_string bla bla3
random data is present      #End of record 3

To achieve this, I converted the non-demarcated data into demarcated data by inserting a blank line before each record, for example:

awk '{gsub(/^constant_string/,"\n&")}1' input

constant_string bla bla1
bla bla bal
fooo foooooo baaar

constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar

constant_string bla bla3
random data is present

Once I have the demarcated entries, I can set RS to split on blank lines (\n\n) and do whatever I want.

awk '{gsub(/^constant_string/,"\n&")}1' input |awk -v RS= '{$1=$1}1'
constant_string bla bla1 bla bla bal fooo foooooo baaar
constant_string bla1 bla2 abcd cdfe fghi jkhil foo bar bar bar bar bar bar
constant_string bla bla3 random data is present

Question:

I can get the result in two steps; is it possible to do it in one step in awk?

I tried the following, but it didn't work:

awk  -v RS="" '{gsub(/^constant_string/,"\n&")}1'  input
awk  -v RS="" '{$0=gensub(/^constant_string/,"\n&",$0)}1'  input
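
A likely reason neither attempt splits the records, sketched on a made-up four-line sample (`sample.txt` is a hypothetical file name): with RS="" awk ends a record only at a blank line, and the input has none, so the whole file arrives as a single record and ^constant_string can match just once, at its very start. The second attempt also passes $0 where gensub expects its "how" argument; the target string is the fourth argument.

```shell
# sample.txt is a hypothetical file holding two undelimited records.
printf '%s\n' 'constant_string a' 'b' 'constant_string c' 'd' > sample.txt

# In paragraph mode (RS="") records end at blank lines; there are none here,
# so awk reads the whole file as one record.
count=$(awk -v RS= 'END { print NR }' sample.txt)
echo "records seen: $count"
```

This should report a single record, which is why the newline has to be inserted before paragraph mode can be of any use.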

+3

awk

PS. 28 Mar 17 at 8:40

3 answers

awk 'BEGIN { RS = "(^|\n)constant_string" }

   # Skip the empty record that precedes the first constant_string
   /./ {
      # The constant string is consumed as the record separator, so it is
      # no longer part of $0; $1 is the first word AFTER it (default FS).
      # Each record is now a multi-line record.

      # ... process the record as needed, for example:
      print " -- " NR " : [" $0 "]"
      for (i = 1; i <= NF; i++) print NR "." i " : " $i

      }
   ' YourFile

Note:

This depends on the awk version: POSIX leaves the behavior of a multi-character RS unspecified (many implementations use only its first character), whereas gawk treats the whole string as a regular expression, as used here.
Check your constant string for regex metacharacters and escape them if necessary.
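
If the constant string may contain regex metacharacters, or the script has to run on an awk without regex RS, one option is to compare the start of each line literally with index() instead. A minimal sketch, assuming the question's sample data saved as `input.txt` (with the end-of-line comments removed) and a variable named `prefix`; both names are chosen here for illustration:

```shell
# Sample data from the question, without the "#End of record" comments.
cat > input.txt <<'EOF'
constant_string bla bla1
bla bla bal
fooo foooooo baaar
constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar
constant_string bla bla3
random data is present
EOF

# index() does a literal substring search, so regex metacharacters
# in the prefix need no escaping.
result=$(awk -v prefix='constant_string' '
    index($0, prefix) == 1 {          # this line starts a new record
        if (rec != "") print rec      # flush the previous record
        rec = $0
        next
    }
    { rec = rec " " $0 }              # continuation line: append it
    END { if (rec != "") print rec }  # flush the last record
' input.txt)

printf '%s\n' "$result"
```

Note that -v still interprets backslash escapes in the assigned value, so a prefix containing backslashes would need them doubled.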

+1

NeronLeVelu 28 Mar 17 at 8:54

Try this if you have GNU awk:

awk 'NR>1{gsub(/\n/," "); print RS$0}' RS='constant_string' f
constant_string bla bla1 bla bla bal fooo foooooo baaar
constant_string bla1 bla2 abcd cdfe fghi jkhil foo bar bar bar bar bar bar
constant_string bla bla3 random data is present

0

VIPIN KUMAR 28 Mar 17 at 10:38

James brown · Accepted Answer · 2017-03-28T09:31:45+0000

How about buffering the lines in b and processing the buffer at the next constant_string and at END? Using a function:

$ awk '
function process(str) { if(str!="") print str }
   /^constant_string/ { process(b); b=$0; next }
                      { b=b OFS $0 }
                  END { process(b) }
' file
constant_string bla bla1 bla bla bal fooo foooooo baaar
constant_string bla1 bla2 abcd cdfe fghi jkhil foo bar bar bar bar bar bar
constant_string bla bla3 random data is present
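
The body of process() is where any per-record work would go. As an illustration only, the sketch below counts the words of each joined record with split(); the file name `input.txt` and the "count: record" output format are assumptions, not part of the answer:

```shell
# Sample data from the question, without the "#End of record" comments.
cat > input.txt <<'EOF'
constant_string bla bla1
bla bla bal
fooo foooooo baaar
constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar
constant_string bla bla3
random data is present
EOF

result=$(awk '
# parts and n are extra parameters used as local variables
function process(str,    parts, n) {
    if (str == "") return           # nothing buffered yet
    n = split(str, parts, " ")      # split the joined record on whitespace
    print n ": " str
}
/^constant_string/ { process(b); b = $0; next }  # new record: flush the buffer
                   { b = b OFS $0 }              # continuation: append
               END { process(b) }                # flush the last record
' input.txt)

printf '%s\n' "$result"
```

Because the record reaches the function as one string, anything that works on a single line (split, match, substr) can be applied to it directly.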
