Count the number of values of one group of columns by the value of another column

Question

I have a text file:

asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr

Is there a way to count the number of unique IPs per domain using a Linux Bash command, and get a result like this?

domain|count_ip
solusi-it.com|3
samtimes.ru|1
karosel-ind.com|1
es350.co.kr|1

+3

linux bash count

UserYmY 06 jan. 15 at 16:21

3 answers

Using awk:

~$ awk -F'|' 'NR>1{a[$NF]++}END{print "domain|count_ip";for (i in a){print i FS a[i]}}' f
domain|count_ip
karosel-ind.com|1
solusi-it.com|3
samtimes.ru|1
es350.co.kr|1

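The -F'|' option sets the field separator, so the fields are split on |. Note that this does not check whether the IP has already been counted in the array a, so duplicate rows inflate the count.

To handle that, you can use sort -u to keep only the unique combinations of the 3rd and 4th fields first: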

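~$ cat f f >f2
~$ tail -n +2 f2 | sort -t'|' -k3,4 -u | awk -F'|' '{a[$NF]++}END{print "domain|count_ip";for (i in a){print i FS a[i]}}'
domain|count_ip
karosel-ind.com|1
solusi-it.com|3
samtimes.ru|1
es350.co.kr|1

The header is removed with tail before sorting. If the whole file is passed to sort, the header line is no longer guaranteed to come first, so NR>1 can drop a data row and the header gets counted as a domain (domain|1). The order of the output lines may vary, since awk does not define the iteration order of for (i in a).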
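The uniqueness check can also be done inside awk itself, without sort. The following is only a minimal sketch, assuming the four-column layout from the question; count_ips and the seen/count array names are made up for illustration. A second array remembers every (domain, ip) pair already met, so each pair is counted once:

```shell
#!/usr/bin/env bash
# count_ips (hypothetical name): reads the pipe-delimited file on stdin
# and prints one "domain|count" line per domain, counting distinct IPs.
count_ips() {
  awk -F'|' '
    NR == 1 { next }                     # skip the header line
    !seen[$4 FS $3]++ { count[$4]++ }    # true only the first time a (domain, ip) pair appears
    END {
      print "domain|count_ip"
      for (d in count) print d FS count[d]   # iteration order is unspecified
    }
  '
}

# Sample rows taken from the question.
sample='asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr'

# Feed every data row twice to show that duplicates do not change the counts.
{ printf '%s\n' "$sample"; printf '%s\n' "$sample" | tail -n +2; } | count_ips
```

Because the header is skipped by line number before anything is sorted or reordered, it cannot end up in the counts.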

+2

fredtantini 06 jan. 15 at 16:27

This should do it:

tail -n +2 data | awk -F'|' '{print $4" "$3}' | sort -u | awk '{print $1}' | uniq -c | awk '{print $2"|"$1}'

It removes the header, prints the host and IP of each row, keeps only the unique {host, IP} pairs, counts them grouped by host, and formats the result as host|count.
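The same steps, one per line, might look like the sketch below. It also prints the header line the question asks for. The function name count_ips_pipeline is made up for illustration, and LC_ALL=C is an added assumption to keep the sort order predictable across locales:

```shell
#!/usr/bin/env bash
# count_ips_pipeline (hypothetical name): reads the pipe-delimited file on stdin.
count_ips_pipeline() {
  echo 'domain|count_ip'
  tail -n +2 |                           # drop the header line
    awk -F'|' '{ print $4 " " $3 }' |    # emit "domain ip" for every row
    LC_ALL=C sort -u |                   # keep one line per unique (domain, ip) pair
    awk '{ print $1 }' |                 # keep only the domain
    uniq -c |                            # count consecutive identical domains
    awk '{ print $2 "|" $1 }'            # reformat "count domain" as "domain|count"
}

# Sample rows taken from the question.
sample='asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr'

printf '%s\n' "$sample" | count_ips_pipeline
```

uniq -c only merges adjacent lines, so it relies on the earlier sort having grouped each domain together.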

edit: corrected formatting

+1

lared 06 jan. 15 at 16:27

Gilles quenot · Accepted Answer · 2015-01-06T16:30:13+0000

Using perl:

perl -F'\|' -lane '
    $. > 1 and $domains->{$F[3]}->{$F[2]}++;
    END{
        print "domain|count_ip";
        print $_, "|", scalar keys %{ $domains->{$_} } for keys %$domains;
    }
' file | tee new_file

The idea behind this is to use a hash of hashes:

$domains->{$F[3]}->{$F[2]}++

$F[3] is the domain and $F[2] is the IP. Uniqueness is guaranteed, because a hash key is always unique.

OUTPUT:

domain|count_ip
es350.co.kr|1
karosel-ind.com|1
samtimes.ru|1
solusi-it.com|3
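For comparison, the same "keys are unique" idea can be sketched in pure Bash with associative arrays. This assumes bash 4 or later; the function name count_ips_bash and the flat "domain|ip" key are illustrative choices, not part of the Perl answer:

```shell
#!/usr/bin/env bash
# count_ips_bash (hypothetical name): reads the pipe-delimited file on stdin.
# Needs bash >= 4 for associative arrays (local -A).
count_ips_bash() {
  local -A seen=() count=()
  local asn prefix ip domain key
  read -r _                                   # discard the header line
  while IFS='|' read -r asn prefix ip domain; do
    key="$domain|$ip"
    if [[ -z ${seen[$key]:-} ]]; then         # first time this (domain, ip) pair is met
      seen[$key]=1
      count[$domain]=$(( ${count[$domain]:-0} + 1 ))
    fi
  done
  echo 'domain|count_ip'
  for domain in "${!count[@]}"; do            # key order is unspecified
    printf '%s|%s\n' "$domain" "${count[$domain]}"
  done
}

# Sample rows taken from the question.
sample='asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr'

printf '%s\n' "$sample" | count_ips_bash
```

A shell loop like this is much slower than awk or perl on large files, so it is mainly useful to show the mechanism.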
