Matching the first column of a file with awk, difficulty with quotes
My input file looks like
Chr1 1
Chr1 2
Chr2 3
And I want to split the input file into multiple files according to the Chr in the first column.
There should be two output files. Output file 1 (named tmpChr1):
Chr1 1
Chr1 2
Output file 2 (named tmpChr2):
Chr2 3
Here's the code:
#!/bin/bash
for((chrom=1;chrom<30;chrom++)); do
echo Chr${chrom}
chr=Chr${chrom}
awk "\$1==$chr{print \$1}" input.txt > tmp$chr
done
The line awk "\$1==$chr{print \$1}"
is the problem; awk seems to require quotes around $chr in order to match $1 correctly.
awk '$1=="Chr1"{print $1}'
works and tmpChr1 is created, but
awk '$1=="$chr"{print $1}'
doesn't work, and neither does
awk "$1=='$chr'{print $1}"
Really struggling with quotes, can anyone shed some light on what I should be doing?
Never use double quotes around an awk script, and never allow shell variables to expand as part of the body of an awk script. See http://cfajohnson.com/shell/cus-faq-2.html#Q24
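As a minimal sketch of that advice (not taken from the linked FAQ verbatim), the shell value can be handed to awk with the standard -v option instead of being spliced into the script. The scratch directory and the `seen` variable are only there to make the example self-contained; it prints whole lines, as the desired output in the question shows. Note that -v interprets backslash escapes in the value, which does not matter for names like Chr1.

```shell
# Scratch directory holding the sample input from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Chr1 1\nChr1 2\nChr2 3\n' > input.txt

chr=Chr1
# The text awk receives from the double-quoted attempt. Chr1 arrives as a
# bare word, which awk reads as an (empty) variable name, not a string.
seen=$(printf '%s' "\$1==$chr{print \$1}")
echo "$seen"

# -v copies each shell value into the awk variable "chr"; the script
# stays in single quotes, so the shell never rewrites it.
for chr in Chr1 Chr2; do
    awk -v chr="$chr" '$1 == chr' input.txt > "tmp$chr"
done
```

The single-quoted attempt from the question fails for the opposite reason: inside single quotes the shell leaves "$chr" alone, so awk compares $1 against the literal four-character string $chr.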
Your general approach is off target anyway, though. All you need is a single awk script:
awk '{print > ("tmp"$1)}' file
Look:
$ ls
file
$ cat file
Chr1 1
Chr1 2
Chr2 3
$ awk '{print > ("tmp"$1)}' file
$ ls
file tmpChr1 tmpChr2
$ cat tmpChr1
Chr1 1
Chr1 2
$ cat tmpChr2
Chr2 3
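One caveat, offered as an assumption about your data rather than something the question states: if the first column takes many distinct values, some awk implementations may run out of simultaneously open output files. A hedged variant closes each file after writing and appends on reopen. The scratch directory is only there so the sketch is self-contained; because it appends, any stale tmp* files from an earlier run should be removed first.

```shell
# Scratch directory with the sample rows, deliberately interleaved.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Chr1 1\nChr2 3\nChr1 2\n' > file

awk '
{
    out = "tmp" $1
    # Append rather than truncate, because the same file may be reopened later.
    print >> out
    # Release the file right away so only one output is open at a time.
    close(out)
}' file
```

Opening and closing a file for every line is slower than the one-line version, so this is only worth it when the number of distinct keys is large.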
Any time you write a loop in the shell just to manipulate text, you have the wrong approach. The UNIX shell is an environment from which to call tools, with a language to sequence those calls. The UNIX tool for general text manipulation is awk. So if you need to manipulate text on UNIX, write an awk script and call it from the shell; that's all.