Summing a column based on matches in multiple columns (some partial)

I have a file as shown below.

7404920998 May 18 04:22 20161229.data
8775804219 May 18 04:23 20161230.data
11168788265 May 17 22:07 20170103.data
9374414428 May 17 22:03 20170104.data

      

I want to sum column 1, grouped by column 2 (the month) and the first four characters of column 5 (the year), and then print the total for each month and year.

Output:

16180725217 May 2016
20543202693 May 2017

      

I believe I have figured out how to sum for a single month and year by passing them in explicitly:

awk '{if($2 == "<month>" && $5 ~ /<year>/) i+=$1} END {print i, $2, substr($5, 1, 4)}' <file>

      

But how do I build an array keyed on both conditions, so that a single pass over the file produces the desired output?


1 answer


awk to the rescue!



$ awk   '{a[$2 FS substr($5,1,4)]+=$1} 
     END {for(k in a) print a[k],k}' file | sort -k3n -k2,2M

16180725217 May 2016
20543202693 May 2017
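
For readers who want to see what the one-liner is doing, here is a more spelled-out sketch of the same idea: the array key is built from the month and the first four characters of field 5, so each month/year pair gets its own running total. The file names sizes.txt and totals.txt and the variable names key and total are made up for this illustration. The printf "%.0f" is an extra precaution that is not in the answer above: some awk implementations (older mawk builds, as far as I know) may print large totals in exponential notation with a plain print.

```shell
# Recreate the sample input from the question (hypothetical file name).
cat > sizes.txt <<'EOF'
7404920998 May 18 04:22 20161229.data
8775804219 May 18 04:23 20161230.data
11168788265 May 17 22:07 20170103.data
9374414428 May 17 22:03 20170104.data
EOF

# Build a composite key "month year" and accumulate field 1 under it.
# substr() in awk is 1-indexed, so (1, 4) takes the four-digit year.
awk '{
    key = $2 " " substr($5, 1, 4)
    total[key] += $1
}
END {
    # Iteration order of "for (key in total)" is unspecified,
    # so the ordering is left to sort.
    for (key in total)
        printf "%.0f %s\n", total[key], key
}' sizes.txt |
    sort -k3,3n -k2,2M |   # by year numerically, then by month name
    tee totals.txt         # show the result and keep a copy
```

With the sample data this should print the same two lines as above. Note that the M (month) ordering in sort is an extension found in GNU and BSD sort rather than something POSIX requires.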

      
