Awk - concatenate data from 2 files and print to the 3rd file, if appropriate keys
Scenario:
- I am trying to write an Awk script.
- I have two files. File1 (tab delimited), file2 (lines).
- In File1, I have a combination of Field4 + Field3 + Field2 line 01 to make a reference key to field 1 of lines in file2.
- I can match and extract information, but not in a good format
Demand
- I want to print information in File3.txt if a reference key is matched. I need to print to File3 where the format in case is
- KEY MATCHED: line 01 -07 from File1, followed by a match line from File2 with a prefix 77, etc.
- KEY NOT MATCHED: If the keys do not match then set all unmatched records from file2 with prefix 99 only.
Script:
awk -F'\t' -v OFS'\t' 'FNR==NR{a[substr($0,1,8)]=$4$3$2}
{if ($4$3$2 in a) printf ("77""\t"); else printf ("99""\t");print $0}' \
File2.txt File1.txt > File3.txt
File1:
01 89 68 5000
02 89 11
03 89 00
06 89 00
07 89 19 RT 0428
01 87 23 5100
02 87 11
04 87 9 02
03 87 00
06 87 00
07 87 11 RT 0428
01 83 23 4900
02 83 11
04 83 9 02
03 83 00
06 83 00
07 83 11 RT 0428
File2:
50006889 CCARD /3010 /E /C A87545457 / // ///11 ///
51002387 CCARD /3000 /E /S N054896334IV / // ///11 ///
51002390800666 CCARD /3000 /E /S N0978898IV / // ///11 ///
File 3: Current Output
99 50006889 CCARD /3010 /E /C A87545457 / // ///11 ///
99
99 51002387 CCARD /3000 /E /S N054896334IV / // ///11 ///
99
99 51002390800666 CCARD /3000 /E /S N0978898IV / // ///11 ///
77 01 89 68 5000
99 02 89 11
99 03 89 00
99 06 89 00
99 07 89 19 RT
77 01 87 23 5100
99 02 87 11
99 04 87 9 02
99 03 87 00
99 06 87 00
99 07 87 11 RT 0428
99 01 83 23 4900
99 83 11
99 83 9 02
99 83 00
99 83 00
99 83 11 RT 0428
Desired output:
01 89 68 5000
02 89 11
03 89 00
06 89 00
07 89 19 RT 0428
77 50006889 CCARD /3010 /E /C A87545457 / // ///11 ///
01 87 23 5100
02 87 11
04 87 9 02
03 87 00
06 87 00
07 87 11 RT 0428
77 51002387 CCARD /3000 /E /S N054896334IV / // ///11 ///
99 51002390800666 CCARD /3000 /E /S N0978898IV / // ///11 ///
Actually for 77 and 99 lines I need the whole line, but 77 and 99 only at the beginning of the matching key. Currently, if the line is long and wraps to the second line, the script puts 77 and 99 before the second line I'm looking to put 77 and 99 before the matching code.
For example, the following is the output from Jonathan's corrected awk code:
$ awk -f awk.script File2.txt File1.txt
01 89 68 5000
02 89 11
03 89 00
06 89 00
07 89 19 RT 0428
77 50006889 CCARD /3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ///
01 87 23 5100
02 87 11
04 87 9 02
03 87 00
06 87 00
07 87 11 RT 0428
77 51002387 CCARD /3000 /E /S N054896334IV / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / //
77 ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ////3010 /E /C A87545457 / // ///11 ///
01 83 23 4900
02 83 11
04 83 9 02
03 83 00
06 83 00
07 83 11 RT 0428
99 51002390800666 CCARD /3000 /E /S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV
99 / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///////S N0978898IV / // ///11 ////S N09
99 78898IV / // ///11 ////S N0978898IV / // ///11 ////S N0978898IV / // ///11 ///
$
source to share
You read it right File2.txt
before reading File1.txt
. However, you need to ignore blank lines in File2.txt
.
FNR == NR && ! /^[[:space:]]*$/ { key = substr($1, 1, 8); a[key] = $0; next }
The first 8 characters of the first field are used as a key, and the entire string is used as a value. next
ensures that lines are not handled differently.
The next part is awkward. You need to define strings using 01
in $1
and build a key from that. The next time you get the line 01
, you need to print the line with the 77s prefix a
(and remove the entry from a
).
At the end, you need to print the line prefixed with 77 from a
(and remove the entry from a
). Then you need to process any records left in a
and give them a 99-prefix.
$1 == "01" { if (code != 0)
{
if (code in a)
{
printf("77\t%s\n", a[code])
delete a[code]
}
}
code = $4$3$2
}
{ print }
END {
if (code in a)
{
printf("77\t%s\n", a[code])
delete a[code]
}
for (code in a)
printf("99\t%s\n", a[code])
}
It's clear that you can use fewer spaces than I do, but you may need to add some semicolons. For testing, I put the code above into a file awk.script
and run:
$ awk -f awk.script File2.txt File1.txt
01 89 68 5000
02 89 11
03 89 00
06 89 00
07 89 19 RT 0428
77 50006889 CCARD /3010 /E /C A87545457 / // ///11 ///
01 87 23 5100
02 87 11
04 87 9 02
03 87 00
06 87 00
07 87 11 RT 0428
77 51002387 CCARD /3000 /E /S N054896334IV / // ///11 ///
01 83 23 4900
02 83 11
04 83 9 02
03 83 00
06 83 00
07 83 11 RT 0428
99 51002390800666 CCARD /3000 /E /S N0978898IV / // ///11 ///
$
This looks like what you wanted. If you need a blank line after the previous block of output, add printf("\n")
after the blocks if
that print the lines prefixed with 77. You can write it to File3.txt
, if you like, with I / O redirection. You can embed the script in single quotes and add it to the command line instead -f awk.script
. You can take the whole script down one line if you like, but please don't do that; it's too big to make a good one-line font, and that's the name of the program awk
, not apl
.
source to share