Converting a matrix to pairs of values with awk
I have a data matrix with latitude, longitude, and temperature in the following format:
15W 14.5W 14W 13.5W 13W
30N 19.3 19.3 19.2 18.9 18.6
30.5N 19.1 19 19 18.9 18.4
31N 18.9 18.8 18.7 18.6 18.3
31.5N 18.9 18.7 18.7 18.6 18.1
32N 18.6 18.5 18.6 18.5 17.5
I would like to use awk to convert it to one line per cell, each holding longitude, latitude, and temperature. The result should look like this:
15W 30N 19.3
15W 30.5N 19.1
15W 31N 18.9
15W 31.5N 18.9
15W 32N 18.6
14.5W 30N 19.3
14.5W 30.5N 19
14.5W 31N 18.8
I assume you get the idea. I thought of awk because I have used it for something else and found it very powerful, but other tools might work here too.
The number of rows and columns is not always the same.
I will also need to convert latitude and longitude to decimal minutes, but I am taking it one step at a time.
awk one-liner (maybe a little long):
awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}END{for(i=1;i<=length(t);i++) for(j=2;j<=NR;j++) print t[i], r[j], v[t[i],r[j]]}' file
Better formatted, the one-liner becomes "three lines" :):
awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}
{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}
END{for(i=1;i<=length(t);i++)for(j=2;j<=NR;j++)print t[i], r[j], v[t[i],r[j]]} ' file
Test:
kent$ cat t
15W 14.5W 14W 13.5W 13W
30N 19.3 19.3 19.2 18.9 18.6
30.5N 19.1 19 19 18.9 18.4
31N 18.9 18.8 18.7 18.6 18.3
31.5N 18.9 18.7 18.7 18.6 18.1
32N 18.6 18.5 18.6 18.5 17.5
kent$ awk 'NR==1{for(i=1;i<=NF;i++)t[i]=$i}
{ r[NR]=$1; for(i=2;i<=NF;i++) v[t[i-1],$1]=$i}
END{for(i=1;i<=length(t);i++)for(j=2;j<=NR;j++)print t[i], r[j], v[t[i],r[j]]} ' t
15W 30N 19.3
15W 30.5N 19.1
15W 31N 18.9
15W 31.5N 18.9
15W 32N 18.6
14.5W 30N 19.3
14.5W 30.5N 19
14.5W 31N 18.8
14.5W 31.5N 18.7
14.5W 32N 18.5
14W 30N 19.2
14W 30.5N 19
14W 31N 18.7
14W 31.5N 18.7
14W 32N 18.6
13.5W 30N 18.9
13.5W 30.5N 18.9
13.5W 31N 18.6
13.5W 31.5N 18.6
13.5W 32N 18.5
13W 30N 18.6
13W 30.5N 18.4
13W 31N 18.3
13W 31.5N 18.1
13W 32N 17.5
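The `v[t[i-1],$1]` subscripts above rely on awk's simulated multidimensional arrays: comma-separated subscripts are joined with the built-in `SUBSEP` variable into a single string key. A minimal sketch of that behaviour follows; the sample values come from the test file, and the shell variable `out` is only there for illustration.

```shell
out=$(awk 'BEGIN {
    v["15W", "30N"] = 19.3               # stored under the single key "15W" SUBSEP "30N"
    if (("15W", "30N") in v)             # membership test for a composite key
        print "found", v["15W", "30N"]
    for (k in v) {                       # k is the joined key; split it back apart
        split(k, part, SUBSEP)
        print part[1], part[2], v[k]
    }
}')
printf '%s\n' "$out"
```

This is why the one-liner can look cells up again in the END block with `v[t[i],r[j]]`: the same pair of labels always produces the same joined key.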
The solution doesn't have to be hard. It is quite simple once you have chosen the right data structure. GNU awk (gawk) supports true multidimensional arrays, which fit this problem well. Run as:
awk -f script.awk file
Contents of script.awk:
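NR==1 {
    # header row: remember the longitude labels
    for (i=1;i<=NF;i++) {
        a[i]=$i
    }
    next
}
{
    # data row: $1 is the latitude, $2..$NF are the temperatures
    for (j=2;j<=NF;j++) {
        b[j-1][NR]["rec"] = a[j-1] FS $1 FS $j
        b[j-1][NR]["val"] = $j
    }
}
END {
    # walk column by column; rows start at 2 because row 1 is the header
    for (x=1;x<=length(b);x++) {
        for (y=2;y<=NR;y++) {
            # skip cells equal to 999.9 (presumably a missing-value marker)
            if (b[x][y]["val"] != "999.9") {
                print b[x][y]["rec"] | "column -t"
            }
        }
    }
}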
Results:
15W 30N 19.3
15W 30.5N 19.1
15W 31N 18.9
15W 31.5N 18.9
15W 32N 18.6
14.5W 30N 19.3
14.5W 30.5N 19
14.5W 31N 18.8
14.5W 31.5N 18.7
14.5W 32N 18.5
14W 30N 19.2
14W 30.5N 19
14W 31N 18.7
14W 31.5N 18.7
14W 32N 18.6
13.5W 30N 18.9
13.5W 30.5N 18.9
13.5W 31N 18.6
13.5W 31.5N 18.6
13.5W 32N 18.5
13W 30N 18.6
13W 30.5N 18.4
13W 31N 18.3
13W 31.5N 18.1
13W 32N 17.5
Alternatively, here's a one-liner:
awk 'NR==1 { for (i=1;i<=NF;i++) a[i]=$i; next } { for (j=2;j<=NF;j++) { b[j-1][NR]["rec"] = a[j-1] FS $1 FS $j; b[j-1][NR]["val"] = $j } } END { for (x=1;x<=length(b);x++) for (y=2;y<=NR;y++) if (b[x][y]["val"] != "999.9") print b[x][y]["rec"] | "column -t" }' file
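The `b[x][y][...]` arrays of arrays above need gawk. If only a POSIX awk is available, roughly the same idea can be sketched with a single array keyed by column and row number. This is an illustrative variant, not the answer's own code: the names `lon`, `lat`, `cell`, and `ncols` are made up here, the sample data is copied from the question, and the `column -t` alignment step is left out.

```shell
# Sample input copied from the question (the temporary file is arbitrary).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
15W 14.5W 14W 13.5W 13W
30N 19.3 19.3 19.2 18.9 18.6
30.5N 19.1 19 19 18.9 18.4
31N 18.9 18.8 18.7 18.6 18.3
31.5N 18.9 18.7 18.7 18.6 18.1
32N 18.6 18.5 18.6 18.5 17.5
EOF

out=$(awk '
NR == 1 {                        # header row: remember the longitude labels
    ncols = NF
    for (i = 1; i <= NF; i++) lon[i] = $i
    next
}
{                                # data row: $1 is the latitude
    lat[NR] = $1
    for (i = 2; i <= NF; i++) cell[i-1, NR] = $i
}
END {                            # emit column by column, in header order
    for (c = 1; c <= ncols; c++)
        for (r = 2; r <= NR; r++)
            if (((c, r) in cell) && cell[c, r] != "999.9")
                print lon[c], lat[r], cell[c, r]
}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Keeping the column count in `ncols` avoids calling `length()` on an array, which not every awk accepts, and the `in` test skips cells that a short row never supplied.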
awk 'NR==1{n=split($0,a," ")}NR!=1{for(i=1;i<=n;i++)x[a[i]" "$1]=$(i+1);}END{for(i in x){print i,x[i]}}' temp | sort
Tested below (with nawk). The output is in sort's lexical order, not the column order shown in the question:
> cat temp
15W 14.5W 14W 13.5W 13W
30N 19.3 19.3 19.2 18.9 18.6
30.5N 19.1 19 19 18.9 18.4
31N 18.9 18.8 18.7 18.6 18.3
31.5N 18.9 18.7 18.7 18.6 18.1
32N 18.6 18.5 18.6 18.5 17.5
phoenix.250> nawk 'NR==1{n=split($0,a," ")}NR!=1{for(i=1;i<=n;i++)x[a[i]" "$1]=$(i+1);}END{for(i in x){print i,x[i]}}' temp | sort
13.5W 30.5N 18.9
13.5W 30N 18.9
13.5W 31.5N 18.6
13.5W 31N 18.6
13.5W 32N 18.5
13W 30.5N 18.4
13W 30N 18.6
13W 31.5N 18.1
13W 31N 18.3
13W 32N 17.5
14.5W 30.5N 19
14.5W 30N 19.3
14.5W 31.5N 18.7
14.5W 31N 18.8
14.5W 32N 18.5
14W 30.5N 19
14W 30N 19.2
14W 31.5N 18.7
14W 31N 18.7
14W 32N 18.6
15W 30.5N 19.1
15W 30N 19.3
15W 31.5N 18.9
15W 31N 18.9
15W 32N 18.6
>
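Because `for (i in x)` visits keys in an unspecified order, the plain `sort` above yields lexical order (`30.5N` before `30N`, `13.5W` first) rather than the order requested in the question. One possible adjustment is to sort numerically on the leading number of each field. The sketch below assumes all longitudes are W and all latitudes are N, as in the sample, and that `sort -n` accepts decimal fractions as POSIX describes; the five input lines are just a shuffled excerpt of the output.

```shell
out=$(printf '%s\n' \
    '13W 30.5N 18.4' \
    '15W 31N 18.9' \
    '15W 30N 19.3' \
    '13W 30N 18.6' \
    '15W 30.5N 19.1' |
    LC_ALL=C sort -k1,1nr -k2,2n)   # longitude descending, then latitude ascending
printf '%s\n' "$out"
```

In the pipeline above this would replace the trailing `| sort` with `| LC_ALL=C sort -k1,1nr -k2,2n`. Mixed hemispheres would need the letters handled separately, which this sketch does not attempt.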