Awk search column from one file if match print columns from both files

I am trying to compare column 1 to file 1 and column 3 from file 2 if they are the same, then print the first column from file1 and the first two columns from file2.

here's a sample from each file:

file1

Cre01.g000100   
Cre01.g000500  
Cre01.g000650  

      

file2

chromosome_1    71569  |655|Cre01.g000500|protein_coding|CODING|PAC:26902937|1|1)
chromosome_1    93952  |765|Cre01.g000650|protein_coding|CODING|PAC:26903448|11|1)
chromosome_1    99034  |1027|Cre01.g000100 |protein_coding|CODING|PAC:26903318|9|1)

      

desired output

Cre01.g000100  chromosome_1    99034        
Cre01.g000500  chromosome_1    71569   
Cre01.g000650  chromosome_1    93952

      

I've looked at various streams that are somewhat similar, but I can't seem to get it to print columns from both files. Here are a few links that are somewhat related:

awk compares 2 files, 2 fields of different order in a file, print or match matches and no lines match

Get templates from file, compare against column of another file, print matching lines using awk

awk compare columns with two files, enter the values โ€‹โ€‹of another column

Get templates in one file from another using ack or awk, or is it better than grep?

Awk - concatenate data from 2 files and print to the 3rd file, if appropriate keys

I feel like I should have figured it out based on these threads, but there were two days where I tried different variations of the codes and I didn't get anything. Here is some code I tried to use in my files:

awk 'FNR==NR{a[$3]=$1;next;}{print $0 ($3 in a ? a[$3]:"NA")}' file1 file2

awk 'NR==FNR{ a[$1]; next} ($3 in a) {print $1 $2 a[$1]}' file1 file2

awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' file1 file2

      

I know that I need to create a temporary matrix containing the first column of file1 (or the third column of file2) and then compare it to another file. If there is a match, print the first column from file1 and column 1 and column 2 from file 2.

Thanks for the help!

+3


source to share


2 answers


You can use this one awk

:



awk -F '[| ]+' -v OFS='\t' 'NR==FNR{a[$4]=$1 OFS $2; next}
       $1 in a{print $1, a[$1]}' file2 file1
Cre01.g000100   chromosome_1    99034
Cre01.g000500   chromosome_1    71569
Cre01.g000650   chromosome_1    93952

      

0


source


Your average try of the three is the closest, but:

  • You have not specified a field separator |

    .
  • You don't assign a[$1]

    .
  • The fetch result does not match your desired result (the sample output shows column 1 from file 1 and column 1 from file 2, the desired result is columns 1 from file 1 and columns 1 and 2 from file 2, although this interpretation depends on the interpretation $3

    in file 2 which is the name between the two pipe characters).

    Referring to the question at the time this answer was created:

    ... compare column 1 to file 1 and column 3 from file 2 if they match, then print the first column from file1 and the first two columns from file2.

    desired output
    Cre01.g000100  chromosome_1    99034
    Cre01.g000500  chromosome_1    71569
    Cre01.g000650  chromosome_1    93952
    
          

  • We can notice that if $3

    in file 2 is equal to the value from file 1, then it is also easy to print as the stored value $3

    .

So the fix for this:



awk -F'|' 'NR==FNR { a[$1]=1; next } ($3 in a) { print $3, $1 }' file1 file2

      

The key change is the assignment a[$1]

(s -F'|'

); the rest is cosmetic and can be changed to suit your requirements (since the question is inconsistent, it is difficult to give a better answer).

0


source







All Articles