Grep matching a specific position in strings using words from another file

I have 2 files

file1:

12342015010198765hello
12342015010188765hello
12342015010178765hello

      

Each line of it contains fields at fixed positions; for example, positions 13-17 hold the account_id.

file2:

98765
88765

      

which contains a list of account_ids.

In Korn shell, I want to print the lines from file1 whose positions 13-17 match one of the account_ids in file2.

I cannot simply run

grep -f file2 file1

      

because an account_id in file2 may also match other fields at different positions.

I tried using a pattern in file2:

^.{12}98765.*

      

but it doesn't work.


2 answers


Using awk

$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello

      

How it works

  • NR==FNR{a[$1]=1;next;}

    FNR is the number of lines read so far from the current file, and NR is the total number of lines read so far. Thus, if NR==FNR, we are still reading the first file, which is file2.

    Each identifier in file2 is stored as a key in the array a. The next statement then skips the rest of the commands and moves on to the next line.

  • substr($0,13,5) in a

    If we reach this command, we are working on the second file, file1.

    This condition is true if the 5-character substring starting at position 13 is a key in the array a. If the condition is true, awk performs the default action, which is to print the line.
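
The same two-pass logic can be written so that the field position is not hard-coded. This is a minimal sketch, assuming the sample data from the question; the awk variables start and len and the scratch directory are illustrative names, not part of the original answer:

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$tmp/file1"
printf '%s\n' 98765 88765 > "$tmp/file2"

# start and len are hypothetical variables describing the fixed-width field.
# First pass (file2): remember every ID as an array key.
# Second pass (file1): print lines whose field is one of those keys.
result=$(awk -v start=13 -v len=5 '
    NR == FNR { ids[$1] = 1; next }   # still reading file2
    substr($0, start, len) in ids     # reading file1; default action prints
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Passing the position with -v keeps the awk program unchanged if the record layout shifts. Note that the NR==FNR test assumes file2 is not empty.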

Using grep

You mentioned trying



grep '^.{12}98765.*' file2

      

This uses extended regex syntax, which means the -E option is required. Also, there is no value in matching .* at the end: it will always match. Thus, try:

$ grep -E '^.{12}98765' file1
12342015010198765hello

      

To get both lines:

$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello

      

This works because [89]8765 happens to match exactly the IDs of interest in file2. Of course, the awk solution provides more flexibility when it comes to matching IDs.
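
Hard-coding the IDs in the pattern does not scale to a longer file2. One possible sketch, assuming the IDs contain no regex metacharacters, is to prefix each ID with the position anchor and hand the resulting list to grep -E -f. The file name patterns and the scratch directory are illustrative:

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$tmp/file1"
printf '%s\n' 98765 88765 > "$tmp/file2"

# Turn each ID into an anchored pattern such as ^.{12}98765
sed 's/^/^.{12}/' "$tmp/file2" > "$tmp/patterns"

# -E enables the {12} interval syntax; -f reads one pattern per line.
result=$(grep -E -f "$tmp/patterns" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

This is essentially the pattern file the question attempted, with the missing -E supplied.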


Using sed with extended regex:

sed -r 's@.*@/^.{12}&/p@' file2 |sed -nr -f- file1

      

Using a basic regex:

sed 's@.*@/^.\\{12\\}&/p@' file2 | sed -n -f- file1

      

Explanation:



sed -r 's@.*@/^.{12}&/p@' file2

      

will generate the output:

/^.{12}98765/p
/^.{12}88765/p

      

which is then used as the script for the second sed after the pipe, which outputs:

12342015010198765hello
12342015010188765hello
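
To inspect the generated script before running it, the two stages can be separated. This is a sketch under a few assumptions: it uses -E in place of -r (both GNU and BSD sed accept -E), writes the script to an illustrative file script.sed instead of reading it with -f-, and recreates the question's sample data in a scratch directory:

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$tmp/file1"
printf '%s\n' 98765 88765 > "$tmp/file2"

# Stage 1: wrap each ID into a sed command of the form /^.{12}ID/p
sed -E 's@.*@/^.{12}&/p@' "$tmp/file2" > "$tmp/script.sed"
script=$(cat "$tmp/script.sed")
printf '%s\n' "$script"

# Stage 2: -n suppresses the default output, so only the p commands print.
result=$(sed -n -E -f "$tmp/script.sed" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

If file2 contained the same ID twice, the matching line would be printed twice, because each generated command prints independently.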

      
