Matching the first column of a file with awk, difficulty with quotes

Question


My input file looks like this:

Chr1 1
Chr1 2
Chr2 3

And I want to split the input file into multiple files according to the Chr in the first column.

There should be two output files. Output file 1 (named tmpChr1):

Chr1 1
Chr1 2

Output file 2 (named tmpChr2):

Chr2 3

Here's the code:

#!/bin/bash

for((chrom=1;chrom<30;chrom++)); do
echo Chr${chrom}
chr=Chr${chrom}
awk "\$1==$chr{print \$1}" input.txt > tmp$chr
done

The line awk "\$1==$chr{print \$1}"

is the problem: awk seems to require quotes around $chr in order to match $1 correctly.

awk '$1=="Chr1"{print $1}'

works, and tmpChr1 is created correctly.

awk '$1=="$chr"{print $1}'

doesn't work, and neither does

awk "$1=='$chr'{print $1}"

Really struggling with quotes, can anyone shed some light on what I should be doing?


bash awk

SonicProtein May 04 '15 at 21:18


1 answer

Ed Morton · Accepted Answer · 2015-05-04T21:22:16+0000

Never use double quotes around an awk script, and never allow shell variables to expand as part of the body of an awk script. See http://cfajohnson.com/shell/cus-faq-2.html#Q24
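To see why the question's version fails: inside double quotes the shell rewrites the script to $1==Chr1{print $1} before awk ever sees it, so awk treats Chr1 as an uninitialized awk variable (an empty string) and no line matches. If you did want to keep the loop, a minimal sketch (not part of the original answer) is to keep the awk script in single quotes and hand the shell value over with awk's -v option. It also prints the whole line ($0) rather than just $1, since the desired output files contain both columns. Note that this still reads the input 29 times and creates empty files for chromosomes that never occur, which is why the single awk script below is the better approach.

```shell
#!/bin/bash
# Sample input taken from the question
printf 'Chr1 1\nChr1 2\nChr2 3\n' > input.txt

for ((chrom=1; chrom<30; chrom++)); do
    chr="Chr${chrom}"
    # -v sets the awk variable chr from the shell value before the script runs.
    # The script itself stays in single quotes, so the shell never expands $1.
    awk -v chr="$chr" '$1 == chr { print }' input.txt > "tmp$chr"
done
```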

Your general approach is also the wrong way to do this. All you need is one awk script:

awk '{print > ("tmp"$1)}' file

For example:

$ ls
file
$ cat file
Chr1 1
Chr1 2
Chr2 3
$ awk '{print > ("tmp"$1)}' file
$ ls
file  tmpChr1  tmpChr2
$ cat tmpChr1
Chr1 1
Chr1 2
$ cat tmpChr2
Chr2 3
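The one-liner keeps every output file open until awk exits. Some awk implementations limit how many files can be open at once (GNU awk manages this itself), which only matters with many distinct keys. A sketch of a variant for that case, assuming the input is already grouped (e.g. sorted) by column 1, closes each file when the key changes. Because reopening with > truncates, running it on ungrouped input would lose data; sorting first, for example with sort -k1,1 file piped into the same awk script, avoids that.

```shell
#!/bin/bash
# Sample input taken from the question
printf 'Chr1 1\nChr1 2\nChr2 3\n' > file

# Assumes lines are grouped by column 1.
awk '
    NF == 0 { next }                  # skip blank lines (no key to route them by)
    $1 != prev {                      # a new key starts a new output file
        if (out != "") close(out)     # close the previous file so only one stays open
        out = "tmp" $1
        prev = $1
    }
    { print > out }
' file
```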

Any time you write a loop in the shell just to manipulate text, you have the wrong approach. The UNIX shell is an environment from which to call tools, with a language to sequence those calls. The standard UNIX tool for general text manipulation is awk. So if you need to manipulate text in UNIX, write an awk script and call it from the shell; that's all.
