Extract lines between two patterns, including the line above the first and the line below the second

Given the following text file, I need to extract and print the lines between two patterns, and also include the line above the first pattern and the line below the second.

asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa

      

I found many solutions with sed and awk that extract the lines between two patterns, such as

sed -n '/FIRST/,/SECOND/p' FileName

      

but how do I also include the line before the first pattern and the line after the second?

Desired output:

line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern

      

+3




7 replies


Since you asked for a sed/awk solution (and everyone is scared of ed ;-), here's one way to do it in awk:

awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file

      

If the first pattern matches, print the previous line p and set the print flag f. When the second pattern matches, set c to 1. If f is 1 (true), the current line is printed. c--==0 is true for the line after the one that matched the second pattern, at which point f is reset to 0.

Another way you can do this is by reading the file twice:

awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file

      



The first pass through the file records the line numbers of the two patterns. The second pass prints the lines in that range, widened by one line on each side.

The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range by simply changing the numbers in the script.
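To illustrate that last point, here is a rough sketch of the two-pass script with the margins pulled out into awk variables. The names before and after, the temporary file and its contents are all made up for this example; they are not part of the original answer.

```shell
# Hypothetical sample input written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'l1' 'l2' 'l3' '***** FIRST *****' 'body' \
  '***** SECOND *****' 'l7' 'l8' 'l9' > "$sample"

# before/after are illustrative variable names: how many extra lines to
# print above the first pattern and below the second one.
result=$(awk -v before=2 -v after=1 '
  NR == FNR {                             # first pass: only record line numbers
    if (/FIRST/) s = NR; else if (/SECOND/) e = NR
    next
  }
  FNR >= s - before && FNR <= e + after   # second pass: print the widened range
' "$sample" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With before=2 and after=1 this should print l2 and l3, the two marker lines with body between them, and l7. As in the original one-liner, if a pattern occurs more than once only its last occurrence is remembered.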

To use shell variables instead of hard-coded patterns, you can pass variables like this:

awk -v first="$first" -v second="$second" '...' file

      

Then use $0 ~ first instead of /FIRST/.
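Put together, the first one-liner with shell variables might look roughly like this. The temporary file, its contents and the variable values are assumptions for the sake of the example. Also keep in mind that awk -v interprets backslash escapes in the assigned value, so patterns containing backslashes may need extra care.

```shell
# Hypothetical sample input written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'aaa' 'line before first pattern' '***** FIRST *****' \
  'dddd ffff cccc' '***** SECOND *****' 'line after second pattern' 'zzz' > "$sample"

first='FIRST'     # example values; any regular expressions would do
second='SECOND'

# Same logic as the first one-liner, with the patterns passed in via -v.
result=$(awk -v first="$first" -v second="$second" '
  $0 ~ first  { print p; f = 1 }   # print the saved previous line, start printing
              { p = $0 }           # remember the current line for the next record
  $0 ~ second { c = 1 }            # arm the one-line countdown
  f                                # print the current line while the flag is set
  c-- == 0    { f = 0 }            # clear the flag on the line after the second match
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The variables are matched as dynamic regular expressions, so the behaviour is the same as with the hard-coded /FIRST/ and /SECOND/.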

+3




I would say

sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename

      

I.e:

/FIRST/ {        # If a line matches FIRST
  x              # swap hold buffer and pattern space,
  G              # append hold buffer to pattern space.
                 # We saved the last line before the match in the hold
                 # buffer, so the pattern space now contains the previous
                 # and the matching line.
  :a             # jump label for looping
  n              # print pattern space, fetch next line.
  /SECOND/! ba   # unless it matches SECOND, go back to :a
  n              # fetch one more line after the match
  q              # quit (printing that last line in the process)
}
h                # If we get here, we are before the block. Hold the current
                 # line for later use.
d                # don't print anything.

      



Note that BSD sed (as it ships with Mac OS X and *BSD) is a little picky about the branching commands. If you are on one of these platforms,

sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename

      

should work.

+1




This will work regardless of whether multiple ranges exist in your file:

$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
    print
    if (gotEnd)   gotBeg=gotEnd=0
    if (/SECOND/) gotEnd=1
}
{ prev=$0 }

$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern

      

If you need to print more than 1 line before FIRST, change prev to an array. If you need to print more than 1 line after SECOND, change gotEnd to a counter.
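As a rough sketch of that idea (not the script above, just one way the array and the counter could be wired up), something like the following might do. The variable names B, A, inRange and left and the sample input are invented for this example.

```shell
# B = lines to print before FIRST, A = lines to print after SECOND
# (both names are made up for this sketch). Sample input comes from printf.
result=$(printf '%s\n' a b c '***** FIRST *****' x '***** SECOND *****' d e f g \
  '***** FIRST *****' y '***** SECOND *****' h i j |
awk -v B=2 -v A=1 '
/FIRST/ && !inRange {
    # print up to B buffered lines that precede this one
    for (i = NR - B; i < NR; i++) if (i in prev) print prev[i]
    inRange = 1
}
inRange { print }                              # lines inside the range, markers included
!inRange && left > 0 { print; left-- }         # trailing lines after SECOND
/SECOND/ && inRange { inRange = 0; left = A }  # close the range, start the countdown
{ prev[NR] = $0; delete prev[NR - B] }         # keep only the last B lines
')

printf '%s\n' "$result"
```

One caveat with this sketch: if a new FIRST appears within B lines of the previous SECOND, the lines already printed as trailing lines are printed again as leading lines.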

+1




sed '#n
   H;$!d
   x;s/\n/²/g
   /FIRST.*SECOND/!b
   s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
   s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
   ta
   s/²/\
/g
   p' YourFile

      

  • POSIX sed version (using GNU sed --posix)
  • Takes the first SECOND pattern that follows FIRST, even if both are on the same line; it is easy to adapt so that at least one newline is required in between
    • #n: do not print unless explicitly requested (for example by p)
    • H;$!d: append each line to the hold buffer; if it is not the last line, delete the current line and start the next cycle
    • x;s/\n/²/g: load the buffer and replace every newline with another character (I'm using ² here), because POSIX sed doesn't allow [^\n]
    • /FIRST.*SECOND/!b: if the patterns are not present, exit without output
    • s/.*²\([^²]*²[^²]*FIRST\)/\1/: remove everything before the line preceding the first pattern
    • :a: label for a goto (used later)
    • s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/: remove everything after the line following the second pattern. The match is greedy, so it anchors on the last occurrence of the pattern that still has text to remove
    • ta: if the last s/// made a substitution, jump to label a. This loops until only the first SECOND pattern in the file (after FIRST) remains
    • s/²/\ /g (a backslash followed by a literal newline in the script): restore the newlines
    • p: print the result
0




Based on Tom's comment: if the file is small, we can just store it in an array and then loop over the part we need:

awk '{a[++i]=$0} /FIRST/{s=NR} /SECOND/{e=NR} END {for(i=s-1;i<e+1;i++) print a[i]}' file

      

0




I would do it with Perl personally. We have a "range operator" that we can use to detect if we are between two patterns:

if ( m/FIRST/ .. /SECOND/ ) 

      

This is the easy part. What's a little less simple is "catching" the previous and next lines. So I keep the last line seen in $prev_line, so that when the range test first succeeds I know what to print. I then clear $prev_line, so that printing it again inside the range outputs nothing, and also so that I can detect the transition at the end of the range.

So something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $prev_line = " ";
while (<DATA>) {
    if ( m/FIRST/ .. /SECOND/ ) {
        print $prev_line;
        $prev_line = '';
        print;
    }
    else {
        if ( not $prev_line ) {
            print;
        }
        $prev_line = $_;
    }
}

__DATA__ 
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf 
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa

      

0




This might work for you (GNU sed):

sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file

      

If the current line does not match FIRST, store it in the hold space and delete it. If the line matches FIRST, append it to the stored line, then print both, followed by every further line up to SECOND, at which point one extra line is printed and the script exits.

0








