Summing a column based on matches in multiple columns (some partial)
I have a file as shown below.
7404920998 May 18 04:22 20161229.data
8775804219 May 18 04:23 20161230.data
11168788265 May 17 22:07 20170103.data
9374414428 May 17 22:03 20170104.data
I want to sum column 1, grouped by column 2 (month) and the first four characters of column 5 (year), then print the total for each month and year.
Output:
16180725217 May 2016
20543202693 May 2017
I believe I have figured out how to compute the sum for a single month and year that I pass in as input:
awk '{if($2 == "<month>" && $5 ~ /^<year>/) i+=$1} END {print i, $2, substr($5, 1, 4)}' <file>
But how do I create a conditional array that meets these two conditions and outputs the desired result?
pdna
1 answer
awk to the rescue!
$ awk '{a[$2 FS substr($5,1,4)]+=$1}
END {for(k in a) print a[k],k}' file | sort -k3n -k2,2M
16180725217 May 2016
20543202693 May 2017
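For anyone who wants to see the pieces spelled out, here is a minimal sketch of the same idea in long form. The file name `sizes.txt` and the function name `sum_by_month_year` are made up for illustration. The `printf "%.0f"` and `LC_ALL=C` are my own precautions rather than part of the one-liner above: the first keeps large sums from being printed in a shortened form by some awk implementations, the second makes the month sort recognise English month names. Note that `sort -M` is an extension (available in GNU sort), not POSIX.

```shell
# Recreate the sample listing from the question (hypothetical file name).
cat > sizes.txt <<'EOF'
7404920998 May 18 04:22 20161229.data
8775804219 May 18 04:23 20161230.data
11168788265 May 17 22:07 20170103.data
9374414428 May 17 22:03 20170104.data
EOF

# Hypothetical helper: sum column 1 per (month, year) key.
sum_by_month_year() {
  awk '
    {
      # Composite key such as "May 2016"; substr() is 1-indexed in awk.
      key = $2 FS substr($5, 1, 4)
      total[key] += $1
    }
    END {
      # "for (key in total)" visits keys in unspecified order,
      # so the ordering is left to sort below.
      for (key in total)
        printf "%.0f %s\n", total[key], key
    }
  ' "$1" | LC_ALL=C sort -k3,3n -k2,2M   # by year, then by month name
}

sum_by_month_year sizes.txt
```

The key point is that the array index is the concatenation of both match fields, so there is no need to test each month and year explicitly: every distinct month/year pair gets its own running total.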
karakfa