Count the number of values of one group of columns by the value of another column
I have a text file:
asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr
Is there a way to count the number of unique IPs per domain with a Linux Bash command, and get a result like this?
domain|count_ip
solusi-it.com|3
samtimes.ru|1
karosel-ind.com|1
es350.co.kr|1
Using Perl:
perl -F'\|' -lane '
$. > 1 and $domains->{$F[3]}->{$F[2]}++;
END{
print "domain|count_ip";
print $_, "|", scalar keys %{ $domains->{$_} } for keys %$domains;
}
' file | tee new_file
The idea is to use a hash of hashes: in $domains->{$F[3]}->{$F[2]}++, $F[3] is the domain and $F[2] is the IP.
Uniqueness is guaranteed because hash keys are always unique, so the number of keys stored under a domain is its number of distinct IPs.
OUTPUT:
domain|count_ip
es350.co.kr|1
karosel-ind.com|1
samtimes.ru|1
solusi-it.com|3
Using awk:
~$ awk -F'|' 'NR>1{a[$NF]++}END{print "domain|count_ip";for (i in a){print i FS a[i]}}' f
domain|count_ip
karosel-ind.com|1
solusi-it.com|3
samtimes.ru|1
es350.co.kr|1
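The -F'|' option sets the field separator to |.
Note that this counts the lines for each domain: the array a is keyed by domain only, so it does not check whether an IP has already been counted for that domain.
To handle that, you can run sort -u first to keep only the unique combinations of the 3rd and 4th fields: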
~$ cat f f >f2
~$ sort -t'|' -k3,4 -u f2 | awk -F'|' 'NR>1{a[$NF]++}END{print "domain|count_ip";for (i in a){print i FS a[i]}}'
domain|count_ip
solusi-it.com|3
samtimes.ru|1
es350.co.kr|1
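domain|1
Note that sorting moves the header away from the first line, so NR>1 drops a data line instead (here the karosel-ind.com one) and the header itself is counted as domain|1. Removing the header before sorting, for example with tail -n +2, avoids this.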
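The de-duplication can also be done inside awk, without a separate sort. The following is a minimal sketch, assuming the input layout from the question; the file name sample.txt, the function name count_unique_ips and the array names seen and count are made up for illustration. The header is skipped by line number before anything is reordered.

```shell
# Recreate the sample data from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr
EOF

# Skip the header (NR > 1). seen[] remembers each (domain, ip) pair, so a
# pair is tallied only the first time it appears; count[] holds the tally.
count_unique_ips() {
  awk -F'|' '
    NR > 1 && !seen[$4, $3]++ { count[$4]++ }
    END {
      print "domain|count_ip"
      for (d in count) print d FS count[d]
    }
  ' "$1"
}

count_unique_ips sample.txt
```

As with the other awk answers, the order of the data lines depends on how awk iterates over the array, so pipe the result through sort if a fixed order matters.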
This should do it:
tail -n +2 data | awk -F'|' '{print $4" "$3}' | sort -u | awk '{print $1}' | uniq -c | awk '{print $2"|"$1}'
It removes the header, prints the host and IP of each line, keeps only the unique {host, IP} pairs, counts them grouped by host, and formats the result.
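The same steps can also be written with cut and sort -u, adding the header line and putting the highest count first, as in the layout asked for in the question. This is only a sketch under assumptions: sample.txt and counts.txt are hypothetical file names, and the order of domains with equal counts is left to sort.

```shell
# Recreate the sample data from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
asn|prefix|ip|domain
25008|85.192.184.0/21|85.192.184.59|solusi-it.com
25008|85.192.184.0/21|85.192.184.59|samtimes.ru
131755|103.31.224.0/24|103.31.224.58|karosel-ind.com
131755|103.31.224.0/24|103.31.224.58|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|solusi-it.com
9318|1.232.0.0/13|1.234.91.168|es350.co.kr
EOF

{
  echo 'domain|count_ip'
  tail -n +2 sample.txt |      # drop the header line
    cut -d'|' -f3,4 |          # keep only ip|domain
    sort -u |                  # one line per unique (ip, domain) pair
    cut -d'|' -f2 |            # keep only the domain
    sort | uniq -c |           # count the pairs per domain
    sort -k1,1nr |             # highest count first
    awk '{ print $2 "|" $1 }'  # reformat as domain|count
} > counts.txt

cat counts.txt
```

The second sort is needed because uniq -c only merges adjacent lines, and after the first cut the domains are no longer grouped together.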