Probability Distribution of each unique number in the array (length unknown) after excluding zeros

Question

Part of my data file looks like

ifile.txt
1
1
3
0
6
3
0
3
3
5

I would like to find the probability of every number, excluding zeros. For example, P(1) = 2/8, P(3) = 4/8, etc.

Desired output

ofile.txt
1  0.250
3  0.500
5  0.125
6  0.125

The 1st column shows the unique numbers (excluding 0) and the 2nd column shows their probability. I tried the following, but it looks like a very long-winded approach. I also ran into a problem with the for loop, as there are so many unique numbers:

n=$(awk '$1 > 0' ifile.txt | wc -l)
for i in 1 3 5 6 .....
do
n1=$(awk -v i="$i" '$1 == i' ifile.txt | wc -l)
p=$(echo "$n1/$n" | bc -l)
printf "%d %.3f\n" "$i" "$p" >> ofile.txt
done

+3

linux unix shell awk probability-density

Kay Jul 17 15 at 2:22 am

3 answers

How about sort | uniq -c to get the count of each number in ~ n log n time instead of n^2, and then dividing each count by your total non-zero count from wc -l?
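A minimal sketch of that idea, assuming every zero entry is a line containing exactly 0 and the input is one integer per line. The printf line only recreates the sample data from the question; the file names are the ones used there.

```shell
# Recreate the sample input from the question.
printf '%s\n' 1 1 3 0 6 3 0 3 3 5 > ifile.txt

# Total number of non-zero lines (-v inverts the match, -x matches whole lines).
total=$(grep -vx '0' ifile.txt | wc -l)

# Count each distinct non-zero value, then divide every count by the total.
grep -vx '0' ifile.txt | sort -n | uniq -c |
    awk -v total="$total" '{ printf "%d %.3f\n", $2, $1 / total }' > ofile.txt

cat ofile.txt
```

The file is read twice here (once for the total, once for the counts), which keeps each step simple at the cost of an extra pass.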

+3

Novelocrat Jul 17 15 at 2:33

Novelocrat's suggestion of sort|uniq -c can be used here:

sed '/^0/ d' ifile.txt|sort|uniq -c >i
awk 'FNR==NR{n+=$1;next;}{print $2,$1/n}' i i

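Short description

sed '/^0/ d' ifile.txt removes the lines starting with 0.

sort|uniq -c >i writes the count of each remaining number to the file i, with the count in column 1 and the number in column 2.

In awk, the file i is read twice. On the first pass, FNR==NR{n+=$1;next;} totals column 1 of i into n (next skips the remaining commands for that line). On the second pass, print $2,$1/n prints column 2 of i followed by column 1 divided by n.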
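If the output should match the format in the question (numeric order, three decimals), a variant along these lines might work. It is only a sketch: the temporary file from mktemp stands in for i, the sed pattern is tightened to lines that are exactly 0, and the printf line just recreates the sample data.

```shell
# Recreate the sample input from the question.
printf '%s\n' 1 1 3 0 6 3 0 3 3 5 > ifile.txt

# Hypothetical temporary file instead of a file named i in the current directory.
counts=$(mktemp)

# /^0$/ deletes only lines that are exactly 0; sort -n orders the values numerically.
sed '/^0$/d' ifile.txt | sort -n | uniq -c > "$counts"

# First pass (FNR == NR) sums the counts; second pass prints value and count/total.
awk 'FNR == NR { n += $1; next } { printf "%d %.3f\n", $2, $1 / n }' \
    "$counts" "$counts" > ofile.txt

rm -f "$counts"
cat ofile.txt
```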

+3

snd Jul 17 15 at 6:08

Barmar · Accepted Answer · 2015-07-17T02:42:27+0000

Use an associative array in awk to get the count of each unique number in a single pass.

awk '$0 != "0" { count[$0]++; total++ } 
     END { for(i in count) printf("%d %.3f\n", i, count[i]/total) }' ifile.txt | sort -n > ofile.txt
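
For reference, here is a commented sketch of the same single-pass idea run against the sample data from the question. The guards are assumptions that go beyond the original command: NF skips blank lines, $1 != 0 compares numerically, and the total == 0 check avoids a division by zero when the file contains only zeros.

```shell
# Recreate the sample input from the question.
printf '%s\n' 1 1 3 0 6 3 0 3 3 5 > ifile.txt

awk '
    # Tally every non-blank, non-zero value and keep a running total.
    NF && $1 != 0 { count[$1]++; total++ }
    END {
        if (total == 0) exit              # nothing to report, avoid dividing by zero
        for (v in count)                  # iteration order is unspecified in awk
            printf "%d %.3f\n", v, count[v] / total
    }
' ifile.txt | sort -n > ofile.txt         # sort -n restores numeric order

cat ofile.txt
```

Because the for (v in count) loop visits keys in no guaranteed order, the trailing sort -n is what makes the output order predictable.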
