Probability Distribution of each unique number in the array (length unknown) after excluding zeros
Part of my data file looks like
ifile.txt
1
1
3
0
6
3
0
3
3
5
I would like to find the probability of every number, excluding zeros. For example, P(1) = 2/8, P(3) = 4/8, etc.
Desired output
ofile.txt
1 0.250
3 0.500
5 0.125
6 0.125
The 1st column shows the unique numbers except 0, and the 2nd column shows the probability. I tried the following, but it looks like a very long approach, and I ran into a problem with the for loop because there are so many unique numbers.
n=$(awk '$1 > 0 {print $0}' ifile.txt | wc -l)
for i in 1 3 5 6 .....
do
n1=$(awk -v i="$i" '$1 == i {print $0}' ifile.txt | wc -l)
p=$(echo "$n1/$n" | bc -l)
printf "%d %.3f\n" "$i" "$p" >> ofile.txt
done
Novelocrat's suggestion of sort|uniq -c can be used here:
sed '/^0/ d' ifile.txt|sort|uniq -c >i
awk 'FNR==NR{n+=$1;next;}{print $2,$1/n}' i i
Short description:
sed '/^0/ d' ifile.txt removes the lines that start with 0.
sort|uniq -c >i then gives you i:
2 1
4 3
1 5
1 6
awk is given the file i twice. On the first pass, FNR==NR{n+=$1;next;} totals col 1 of i into n (next skips the remaining command for that line). On the second pass, print $2,$1/n prints col 2 of i followed by col 1 divided by n.
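The same counting can also be sketched in a single awk pass, without the temporary file i. This is only an illustration, not part of the original answer: the function name nonzero_probabilities is hypothetical, it assumes one number per line in the first column, and it formats the probability to three decimals to match the desired ofile.txt.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print "value probability"
# for every non-zero value in the first column of the file given as $1.
nonzero_probabilities() {
    awk '
        # Skip blank lines and zeros; tally everything else.
        NF && $1 != 0 { count[$1]++; total++ }
        END {
            # The order of "for (v in count)" is unspecified in awk,
            # so the output is sorted numerically afterwards.
            for (v in count)
                printf "%s %.3f\n", v, count[v] / total
        }
    ' "$1" | sort -n
}
```

It could be called as nonzero_probabilities ifile.txt > ofile.txt. Because the unique values are collected as array keys, there is no need to list them by hand in a for loop.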