Join a multi-line record into one line when records are not delimited

I need to process records that span multiple lines: I want to convert each multi-line record into a single string and then extract what I need from it. The records are not delimited, so I can't just set RS to \n\n.

cat input
constant_string bla bla1
bla bla bal
fooo foooooo baaar          #End of record 1
constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar #End of record 2
constant_string bla bla3
random data is present      #End of record 3

      

To achieve this, I converted the undelimited data into delimited data by adding a blank line between records, for example:

awk '{gsub(/^constant_string/,"\n&")}1' input

constant_string bla bla1
bla bla bal
fooo foooooo baaar

constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar

constant_string bla bla3
random data is present 

      

Once I have the delimited records, I can set RS to \n\n and do whatever I want.

awk '{gsub(/^constant_string/,"\n&")}1' input |awk -v RS= '{$1=$1}1'
constant_string bla bla1 bla bla bal fooo foooooo baaar
constant_string bla1 bla2 abcd cdfe fghi jkhil foo bar bar bar bar bar bar
constant_string bla bla3 random data is present

      

Question:

I can get a solution in two steps. Is it possible to do it in one step in awk?

I tried the following, but it didn't work:

awk  -v RS="" '{gsub(/^constant_string/,"\n&")}1'  input
awk  -v RS="" '{$0=gensub(/^constant_string/,"\n&",$0)}1'  input

      



3 answers


How about buffering the lines in b and processing the buffer at the next constant_string and in END? Using a function:



$ awk '
function process(str) { if(str!="") print str }
   /^constant_string/ { process(b); b=$0; next }
                      { b=b OFS $0 }
                  END { process(b) }
' file
constant_string bla bla1 bla bla bal fooo foooooo baaar
constant_string bla1 bla2 abcd cdfe fghi jkhil foo bar bar bar bar bar bar
constant_string bla bla3 random data is present
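
As a sketch of the "do whatever I want" step: once process() receives the joined record, split() gives access to its individual words. The file name input.txt and the output format (word count, then last word) are assumptions chosen only for illustration.

```shell
# Recreate the sample input from the question (the file name is an assumption).
cat > input.txt <<'EOF'
constant_string bla bla1
bla bla bal
fooo foooooo baaar
constant_string bla1 bla2
abcd cdfe fghi jkhil
foo bar bar bar bar bar bar
constant_string bla bla3
random data is present
EOF

# Buffer lines until the next constant_string, then work on the joined record.
result=$(awk '
function process(str,    n, f) {
    if (str == "") return           # nothing buffered yet (before record 1)
    n = split(str, f, " ")          # split the joined record into words
    print n ": " f[n]               # word count and last word, for illustration
}
/^constant_string/ { process(b); b = $0; next }
                   { b = b OFS $0 }
END                { process(b) }
' input.txt)
printf '%s\n' "$result"
```

Any per-record logic can replace the body of process(); the buffering rules around it stay the same.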

      



awk 'BEGIN{ RS="(^|\n)constant_string"}

   # skip "empty" records (e.g. the one before the first constant_string)
   /./ { 
      # constant_string is consumed as the record separator, so it is no
      # longer part of $0; $1 is the first word AFTER it (FS is default).
      # Note: each record is now multi-line.

      # ... process the record as needed
      print " -- " NR " : [" $0 "]"
      for (i=1;i<=NF;i++) print NR "." i " : " $i

      }
   ' YourFile

      

Note:



  • This depends on the awk version: POSIX leaves the behaviour of a multi-character RS unspecified (many traditional awks use only its first character), whereas gawk treats it as a regular expression.
  • Check that your constant string contains no regex metacharacters, or escape them.
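
A minimal portable sketch, assuming the marker should be matched literally: index($0, marker) == 1 tests for a literal prefix, so no regex RS is needed and metacharacters in the marker need no escaping. The marker value a.b*c and the file name meta_input.txt are hypothetical.

```shell
# Hypothetical input whose marker contains regex metacharacters.
cat > meta_input.txt <<'EOF'
a.b*c first
more data
a.b*c second
tail
EOF

# index() compares literal text, so "." and "*" in the marker are not special.
joined=$(awk -v marker='a.b*c' '
index($0, marker) == 1 { if (buf != "") print buf; buf = $0; next }
                       { buf = buf OFS $0 }
END                    { if (buf != "") print buf }
' meta_input.txt)
printf '%s\n' "$joined"
```

This uses only features available in any POSIX awk, at the cost of handling the marker in a rule instead of in RS.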


Try this if you have GNU awk:

awk 'NR>1{gsub(/\n/," "); print RS$0}' RS='constant_string' f
constant_string bla bla1 bla bla bal fooo foooooo baaar
constant_string bla1 bla2 abcd cdfe fghi jkhil foo bar bar bar bar bar bar
constant_string bla bla3 random data is present

      
