Awk - concatenate data from 2 files and print to the 3rd file, if appropriate keys

Question

Awk - concatenate data from 2 files and print to the 3rd file, if appropriate keys

Scenario:

I am trying to write an Awk script.
I have two files. File1 (tab delimited), file2 (lines).
In File1, I have a combination of Field4 + Field3 + Field2 line 01 to make a reference key to field 1 of lines in file2.
I can match and extract information, but not in a good format

Demand

I want to print information in File3.txt if a reference key is matched. I need to print to File3 where the format in case is
KEY MATCHED: line 01 -07 from File1, followed by a match line from File2 with a prefix 77, etc.
KEY NOT MATCHED: If the keys do not match then set all unmatched records from file2 with prefix 99 only.

Script:

awk -F'\t' -v OFS'\t' 'FNR==NR{a[substr($0,1,8)]=$4$3$2}
     {if ($4$3$2 in a) printf ("77""\t"); else printf ("99""\t");print $0}' \
     File2.txt File1.txt > File3.txt

File1:

01  89  68  5000
02  89  11
03  89  00
06  89  00
07  89  19  RT  0428
01  87  23  5100
02  87  11
04  87  9   02
03  87  00
06  87  00
07  87  11  RT  0428
01  83  23  4900
02  83  11
04  83  9   02
03  83  00
06  83  00
07  83  11  RT  0428

File2:

50006889 CCARD /3010  /E     /C A87545457          /  //                ///11        ///

51002387 CCARD /3000  /E     /S N054896334IV          /  //                ///11        ///

51002390800666 CCARD /3000  /E     /S N0978898IV          /  //                ///11        ///

File 3: Current Output

99  50006889 CCARD /3010  /E     /C A87545457          /  //                ///11        ///
99
99  51002387 CCARD /3000  /E     /S N054896334IV          /  //                ///11        ///
99
99  51002390800666 CCARD /3000  /E     /S N0978898IV          /  //                ///11        ///
77  01  89  68  5000
99  02  89  11
99  03  89  00
99  06  89  00
99  07  89  19  RT
77  01  87  23  5100
99  02  87  11
99  04  87  9   02
99  03  87  00
99  06  87  00
99  07  87  11  RT  0428    
99  01  83  23  4900
99  83  11
99  83  9   02
99  83  00
99  83  00
99  83  11  RT  0428

Desired output:

01  89  68  5000
02  89  11
03  89  00
06  89  00
07  89  19  RT  0428
77  50006889 CCARD /3010  /E     /C A87545457          /  //                ///11        ///

01  87  23  5100
02  87  11
04  87  9   02
03  87  00
06  87  00
07  87  11  RT  0428
77  51002387 CCARD /3000  /E     /S N054896334IV          /  //                ///11        ///

99  51002390800666 CCARD /3000  /E     /S N0978898IV          /  //                ///11        ///

Actually for 77 and 99 lines I need the whole line, but 77 and 99 only at the beginning of the matching key. Currently, if the line is long and wraps to the second line, the script puts 77 and 99 before the second line I'm looking to put 77 and 99 before the matching code.

For example, the following is the output from Jonathan's corrected awk code:

$ awk -f awk.script File2.txt File1.txt

        01  89  68  5000
        02  89  11
        03  89  00
        06  89  00
        07  89  19  RT  0428
        77  50006889 CCARD /3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ///
        01  87  23  5100
        02  87  11
        04  87  9   02
        03  87  00
        06  87  00
        07  87  11  RT  0428
        77  51002387 CCARD /3000  /E     /S N054896334IV          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //      
77          ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ////3010  /E     /C A87545457          /  //                ///11        ///
        01  83  23  4900
        02  83  11
        04  83  9   02
        03  83  00
        06  83  00
        07  83  11  RT  0428
        99  51002390800666 CCARD /3000  /E     /S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV    
    99        /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///////S N0978898IV          /  //                ///11        ////S N09
    99  78898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ////S N0978898IV          /  //                ///11        ///
        $

+1

awk

HighTech 21 jan. '15 at 2:50

source to share

1 answer

Jonathan Leffler · Accepted Answer · 2015-01-21T04:32:03+0000

You read it right File2.txt

before reading File1.txt

. However, you need to ignore blank lines in File2.txt

.

FNR == NR && ! /^[[:space:]]*$/ { key = substr($1, 1, 8); a[key] = $0; next }

The first 8 characters of the first field are used as a key, and the entire string is used as a value. next

ensures that lines are not handled differently.

The next part is awkward. You need to define strings using 01

in $1

and build a key from that. The next time you get the line 01

, you need to print the line with the 77s prefix a

(and remove the entry from a

).

At the end, you need to print the line prefixed with 77 from a

(and remove the entry from a

). Then you need to process any records left in a

and give them a 99-prefix.

$1 == "01" { if (code != 0)
             {
                 if (code in a)
                 {
                     printf("77\t%s\n", a[code])
                     delete a[code]
                 }
             }
             code = $4$3$2
           }
{ print }
END {
         if (code in a)
         {
             printf("77\t%s\n", a[code])
             delete a[code]
         }
         for (code in a)
             printf("99\t%s\n", a[code])
    }

It's clear that you can use fewer spaces than I do, but you may need to add some semicolons. For testing, I put the code above into a file awk.script

and run:

$ awk -f awk.script File2.txt File1.txt
01  89  68  5000
02  89  11
03  89  00
06  89  00
07  89  19  RT  0428
77  50006889 CCARD /3010  /E     /C A87545457          /  //                ///11        ///
01  87  23  5100
02  87  11
04  87  9   02
03  87  00
06  87  00
07  87  11  RT  0428
77  51002387 CCARD /3000  /E     /S N054896334IV          /  //                ///11        ///
01  83  23  4900
02  83  11
04  83  9   02
03  83  00
06  83  00
07  83  11  RT  0428
99  51002390800666 CCARD /3000  /E     /S N0978898IV          /  //                ///11        ///
$

This looks like what you wanted. If you need a blank line after the previous block of output, add printf("\n")

after the blocks if

that print the lines prefixed with 77. You can write it to File3.txt

, if you like, with I / O redirection. You can embed the script in single quotes and add it to the command line instead -f awk.script

. You can take the whole script down one line if you like, but please don't do that; it's too big to make a good one-line font, and that's the name of the program awk

, not apl

.

Awk - concatenate data from 2 files and print to the 3rd file, if appropriate keys

More articles: