Extract lines between two patterns and include the line above the first and below the second
Given the following text file, I need to extract and print the lines between two patterns, and also include the line above the first pattern and the line below the second one:
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
I found many solutions with sed and awk that extract the lines between two tags, such as:
sed -n '/FIRST/,/SECOND/p' FileName
but how do I include the line before the first pattern and the line after the second?
Desired output:
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
Since you asked for a sed/awk solution (and everyone is scared of ed ;-), here's one way to do it in awk:
awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file
If the first pattern matches, print the previous line p and set the print flag f. When the second pattern matches, set c to 1. If f is 1 (true), the current line is printed. After the range has started, c--==0 is true only on the line after the one that matched the second pattern, so that line is still printed before f is cleared.
Another way to do this is to loop through the file twice:
awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file
The first pass records the line numbers of the two patterns. The second pass prints the lines in that range, plus one line on either side.
The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range, simply by changing the numbers in the script.
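As a rough illustration of that point, the two-pass script could be wrapped in a shell function with the context sizes passed in. The function name extract_range and the awk variables before and after are my own inventions, not part of the answer above. The s && e guard is also an addition, so that nothing is printed when either pattern is missing. Like the original, this sketch assumes a single FIRST/SECOND pair (with several, the last match of each wins).

```shell
# extract_range FILE BEFORE AFTER  (hypothetical helper, names are assumptions)
# Pass 1 records the line numbers of FIRST and SECOND; pass 2 prints that
# range plus BEFORE lines above it and AFTER lines below it.
extract_range() {
    awk -v before="$2" -v after="$3" '
        NR == FNR {                      # first pass: only record positions
            if (/FIRST/)       s = NR
            else if (/SECOND/) e = NR
            next
        }
        # second pass: print nothing unless both patterns were seen
        s && e && FNR >= s - before && FNR <= e + after
    ' "$1" "$1"
}
```

Called as extract_range file 2 3, this should print two lines before FIRST and three lines after SECOND; extract_range file 1 1 corresponds to the one-liner above.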
To use shell variables instead of hard-coded patterns, you can pass variables like this:
awk -v first="$first" -v second="$second" '...' file
Then use $0 ~ first instead of /FIRST/ (and likewise $0 ~ second instead of /SECOND/).
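Spelled out in full, the one-pass script with shell variables might look like the sketch below. The function name print_between is made up for illustration. Two caveats apply: the values are used as regular expressions, so a pattern such as ***** FIRST ***** would need escaping (or a literal test with index($0, first)), and awk -v interprets backslash escapes in the assigned value.

```shell
first='FIRST'
second='SECOND'

# print_between FILE  (hypothetical helper)
# Same logic as the one-pass script, but the patterns come from the
# shell variables $first and $second instead of being hard-coded.
print_between() {
    awk -v first="$first" -v second="$second" '
        $0 ~ first  { print p; f = 1 }   # range starts: print the saved line
                    { p = $0 }           # remember the current line
        $0 ~ second { c = 1 }            # range ends: one more line to print
        f                                # print while the flag is set
        c-- == 0    { f = 0 }            # clear the flag after that extra line
    ' "$1"
}
```

Usage would then be print_between file, after setting first and second to whatever patterns are needed.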
I would say
sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename
That is:
/FIRST/ {          # If a line matches FIRST
  x                # swap hold buffer and pattern space,
  G                # append hold buffer to pattern space.
                   # We saved the last line before the match in the hold
                   # buffer, so the pattern space now contains the previous
                   # and the matching line.
  :a               # jump label for looping
  n                # print pattern space, fetch next line.
  /SECOND/! ba     # unless it matches SECOND, go back to :a
  n                # fetch one more line after the match
  q                # quit (printing that last line in the process)
}
h                  # If we get here, we are before the block. Hold the
                   # current line for later use.
d                  # don't print anything.
Note that BSD sed (as it ships with Mac OS X and *BSD) is a little picky about the branching commands. If you are on one of these platforms,
sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename
should work.
This will work regardless of whether multiple ranges exist in your file:
$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
    print
    if (gotEnd) gotBeg=gotEnd=0
    if (/SECOND/) gotEnd=1
}
{ prev=$0 }
$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
If you need to print more than one line before FIRST, change prev to an array. If you need to print more than one line after SECOND, change gotEnd to a counter.
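One possible reading of that suggestion is sketched below. It is not the answerer's code. The function name print_context and the variables before, after, inRange, left and last are all assumptions of mine. prev becomes a sliding window indexed by line number, and left counts down the lines still owed after SECOND. The last variable is an extra I added so that context shared by two nearby ranges is not printed twice.

```shell
# print_context FILE BEFORE AFTER  (hypothetical helper)
# Print every FIRST..SECOND range plus BEFORE lines above and AFTER below.
print_context() {
    awk -v before="$2" -v after="$3" '
        /FIRST/ && !inRange {
            # replay the saved lines, oldest first, skipping any that were
            # already printed as trailing context of an earlier range
            for (i = NR - before; i < NR; i++)
                if ((i in prev) && i > last) print prev[i]
            inRange = 1; left = 0
        }
        inRange || left > 0 {            # inside a range or its trailing context
            print; last = NR
            if (!inRange) left--
        }
        inRange && /SECOND/ { inRange = 0; left = after }
        {
            prev[NR] = $0                # sliding window of the most recent lines
            delete prev[NR - before]
        }
    ' "$1"
}
```

For the sample file in the question, print_context file 1 1 should give the desired output; larger numbers widen the context on either side.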
sed '#n
H;$!d
x;s/\n/²/g
/FIRST.*SECOND/!b
s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
ta
s/²/\
/g
p' YourFile
- POSIX sed version (tested with GNU sed --posix).
- It stops at the first SECOND pattern that follows FIRST, even if both are on the same line; it is easy to adapt so that at least one newline is required in between.
- #n: do not print unless explicitly requested, for example with p (it must be the first two characters of the script).
- H;$!d: append each line to the hold buffer; if it is not the last line, delete the current line and start the next cycle.
- x;s/\n/²/g: load the buffer and replace every newline with another character (I use ² here), because POSIX sed does not allow [^\n].
- /FIRST.*SECOND/!b: if the patterns are not present, finish without output.
- s/.*²\([^²]*²[^²]*FIRST\)/\1/: remove everything before the line that precedes the first pattern.
- :a: label for a goto (used later).
- s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/: remove everything after the line that follows the second pattern. The match is the longest possible one, so the last occurrence of the pattern is used as the reference.
- ta: if the last s/// made a substitution, go to label a. This cycles until the first SECOND pattern in the file (after FIRST) is the one that remains.
- s/²/\ followed by a newline and /g: put the newlines back.
- p: print the result.
Personally, I would do it with Perl. It has a "range operator" that we can use to detect whether we are between two patterns:
if ( m/FIRST/ .. /SECOND/ )
That's the easy part. What's a little less simple is "catching" the previous and next lines. So I keep the previous line in $prev_line, so that when I first enter the range I know what to print. I then clear $prev_line, partly so that it is empty when I print it again inside the range, and partly so that I can detect the transition at the end of the range.
So something like this:
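#!/usr/bin/perl
use strict;
use warnings;

# Start with a true (non-empty) value so that the first line of input is
# not mistaken for the line after a range.
my $prev_line = " ";
while (<DATA>) {
    if ( m/FIRST/ .. /SECOND/ ) {
        print $prev_line;    # line before FIRST on entry, empty afterwards
        $prev_line = '';
        print;
    }
    else {
        if ( not $prev_line ) {
            print;           # first line after the range
        }
        $prev_line = $_;
    }
}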
__DATA__
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
This might work for you (GNU sed):
sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file
If the current line is not FIRST, save it in the hold space and delete the current line. If the line is FIRST, append it to the saved line, then print both, followed by every further line up to SECOND, at which point one extra line is printed and the script exits.