Grep matching a specific position in strings using words from another file
I have 2 files
file1:
12342015010198765hello
12342015010188765hello
12342015010178765hello
Each line contains fields at fixed positions; for example, positions 13-17 hold the account_id.
file2:
98765
88765
This file contains a list of account_ids.
In Korn shell, I want to print the lines of file1 whose positions 13-17 match one of the account_ids in file2.
I cannot do
grep -f file2 file1
because an account_id in file2 may also match other fields at different positions.
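A minimal sketch of the problem, run in a scratch directory. The third line of file1 here is a hypothetical record (not from the question): it has 98765 at positions 1-5 but 11111 in the account_id field at positions 13-17. Plain grep -f still reports it, because it looks for each ID anywhere in the line.

```shell
# Work in a scratch directory so no real file1/file2 are overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical data: lines 1-2 are from the question; line 3 is made up
# to have 98765 at positions 1-5 and 11111 at positions 13-17.
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    98765201501011111hello > file1
printf '%s\n' 98765 88765 > file2

# grep -f matches the IDs at any position, so all three lines are printed,
# including the hypothetical false positive on line 3.
grep -f file2 file1
```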
I tried putting a pattern like this in file2:
^.{12}98765.*
but it doesn't work.
Using awk
$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello
How it works
- NR==FNR{a[$1]=1;next;}
  FNR is the number of lines read so far from the current file, and NR is the total number of lines read so far across all files. Thus, when NR==FNR, we are reading the first file, which is file2. Each identifier in file2 is stored as a key in the array a. Then next skips the rest of the commands and moves on to the next line.
- substr($0,13,5) in a
  If we reach this command, we are working on the second file, file1. This condition is true if the 5-character substring starting at position 13 is a key in the array a. If the condition is true, awk performs the default action, which is to print the line.
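A small sketch that prints what awk sees for each input line, to make the NR==FNR test and the substr() offsets explained above visible. The sample files are recreated from the question in a scratch directory.

```shell
# Scratch directory with the sample files from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > file1
printf '%s\n' 98765 88765 > file2

# NR keeps counting across files while FNR restarts at 1 for each file,
# so NR==FNR holds only while the first file (file2) is being read.
awk '{ print FILENAME, NR, FNR, (NR==FNR ? "first" : "second") }' file2 file1

# substr(s, start, len) is 1-based: this prints characters 13..17 of each line,
# i.e. the account_id field (98765, 88765, 78765).
awk '{ print substr($0, 13, 5) }' file1
```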
Using grep
You mentioned trying the pattern
^.{12}98765.*
This uses extended regex syntax, so grep needs -E. Also, the trailing .* adds nothing: it always matches. Thus, try:
$ grep -E '^.{12}98765' file1
12342015010198765hello
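For comparison, a sketch of the same anchored pattern in basic regex syntax, where the interval has to be written \{12\}. It also shows that without -E the {12} form is read as literal braces and so matches nothing in the sample data; this is how GNU and BSD grep treat a basic regex, but treat it as an assumption for other implementations.

```shell
# Scratch directory with the sample file1 from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > file1

grep '^.\{12\}98765' file1    # basic regex with an escaped interval: matches line 1
grep '^.{12}98765' file1      # basic regex, {12} taken literally: prints nothing
grep -E '^.{12}98765' file1   # extended regex: same result as the first command
```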
To get both lines:
$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello
This works because [89]8765
happens to match exactly the IDs in file2. Of course, the awk solution is more flexible when it comes to matching identifiers.
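If the IDs cannot be folded into a single character class, one way to keep the grep approach is to generate one anchored pattern per line of file2 and pass them all with -E -f. This is a sketch of an alternative, not a command from the answers here; it assumes file2 holds one ID per line with no blank lines or trailing spaces, and patterns is a hypothetical scratch file name.

```shell
# Scratch directory with the sample files from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > file1
printf '%s\n' 98765 88765 > file2

# Prefix every ID with ^.{12} so it can only match at positions 13-17,
# e.g. 98765 becomes ^.{12}98765.
sed 's/^/^.{12}/' file2 > patterns

# Use the generated lines as extended regexes.
grep -E -f patterns file1
```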
Using sed
with extended regex:
sed -r 's@.*@/^.{12}&/p@' file2 | sed -nr -f- file1
Using a basic regex:
sed 's@.*@/^.\\{12\\}&/p@' file2 | sed -n -f- file1
Explanation:
sed -r 's@.*@/^.{12}&/p@' file2
generates the following output:
/^.{12}98765/p
/^.{12}88765/p
which is then used as a sed script by the second sed after the pipe. That sed runs with -n, so only the lines selected by p are printed:
12342015010198765hello
12342015010188765hello
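The generated script can also be written to a file first, which makes it easy to inspect and avoids relying on sed reading its script from standard input via -f- (not something to count on in every sed). A sketch using the basic-regex variant; select.sed is a hypothetical file name, and file2 is assumed to contain no blank lines, since an empty line would produce /^.\{12\}/p and select every line of 12 or more characters.

```shell
# Scratch directory with the sample files from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > file1
printf '%s\n' 98765 88765 > file2

# Step 1: turn each ID into a sed command such as /^.\{12\}98765/p
# (in the replacement, \\ yields a literal backslash).
sed 's@.*@/^.\\{12\\}&/p@' file2 > select.sed
cat select.sed

# Step 2: run the generated script; -n suppresses everything except p output.
sed -n -f select.sed file1
```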