Awk search column from one file if match print columns from both files
I am trying to compare column 1 to file 1 and column 3 from file 2 if they are the same, then print the first column from file1 and the first two columns from file2.
here's a sample from each file:
file1
Cre01.g000100
Cre01.g000500
Cre01.g000650
file2
chromosome_1 71569 |655|Cre01.g000500|protein_coding|CODING|PAC:26902937|1|1)
chromosome_1 93952 |765|Cre01.g000650|protein_coding|CODING|PAC:26903448|11|1)
chromosome_1 99034 |1027|Cre01.g000100 |protein_coding|CODING|PAC:26903318|9|1)
desired output
Cre01.g000100 chromosome_1 99034
Cre01.g000500 chromosome_1 71569
Cre01.g000650 chromosome_1 93952
I've looked at various streams that are somewhat similar, but I can't seem to get it to print columns from both files. Here are a few links that are somewhat related:
Get templates from file, compare against column of another file, print matching lines using awk
awk compare columns with two files, enter the values โโof another column
Get templates in one file from another using ack or awk, or is it better than grep?
Awk - concatenate data from 2 files and print to the 3rd file, if appropriate keys
I feel like I should have figured it out based on these threads, but there were two days where I tried different variations of the codes and I didn't get anything. Here is some code I tried to use in my files:
awk 'FNR==NR{a[$3]=$1;next;}{print $0 ($3 in a ? a[$3]:"NA")}' file1 file2
awk 'NR==FNR{ a[$1]; next} ($3 in a) {print $1 $2 a[$1]}' file1 file2
awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' file1 file2
I know that I need to create a temporary matrix containing the first column of file1 (or the third column of file2) and then compare it to another file. If there is a match, print the first column from file1 and column 1 and column 2 from file 2.
Thanks for the help!
source to share
Your average try of the three is the closest, but:
- You have not specified a field separator
|
. - You don't assign
a[$1]
. -
The fetch result does not match your desired result (the sample output shows column 1 from file 1 and column 1 from file 2, the desired result is columns 1 from file 1 and columns 1 and 2 from file 2, although this interpretation depends on the interpretation
$3
in file 2 which is the name between the two pipe characters).Referring to the question at the time this answer was created:
... compare column 1 to file 1 and column 3 from file 2 if they match, then print the first column from file1 and the first two columns from file2.
desired output Cre01.g000100 chromosome_1 99034 Cre01.g000500 chromosome_1 71569 Cre01.g000650 chromosome_1 93952
-
We can notice that if
$3
in file 2 is equal to the value from file 1, then it is also easy to print as the stored value$3
.
So the fix for this:
awk -F'|' 'NR==FNR { a[$1]=1; next } ($3 in a) { print $3, $1 }' file1 file2
The key change is the assignment a[$1]
(s -F'|'
); the rest is cosmetic and can be changed to suit your requirements (since the question is inconsistent, it is difficult to give a better answer).
source to share