Can anyone explain how probabilistic counting works?

Specifically, the log log counting approach.

+3




2 answers


I will try to clarify the use of probabilistic counters, although note that I am not an expert on this subject.

The goal is to count to very large numbers using only a small amount of counter storage space (for example, using a 32 bit integer).

Morris came up with the idea of a "log counter": instead of storing n, the counter stores log₂(n). In other words, given the value c of the counter, the real number represented by the counter is 2ᶜ.

Since logarithms usually do not have an integer value, a problem arises when the counter c needs to be incremented, since we can only change c in steps of 1.

The idea here is to use a "probabilistic counter": on each call to Increment, we update the stored value of the counter only with some probability p. This works because it can be shown that the expected value represented by the counter under probabilistic updates is in fact n. In other words, on average our counter represents n after n calls to Increment (although at any given moment it probably carries some error)! We trade precision for the ability to count to very large numbers with very little memory (e.g. one register).

One scheme for achieving this, as described by Morris, is to have the counter value c represent an actual count of 2ᶜ (that is, the counter holds the log₂ of the actual count). We update this counter with probability 1/2ᶜ, where c is the current counter value.
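To make the scheme concrete, here is a minimal Python sketch of this base-2 counter. The class name `MorrisCounter` and the use of `2**c - 1` as the estimate (which makes the expectation come out to exactly n) are my own choices, not from Morris's paper:

```python
import random

class MorrisCounter:
    """Morris-style approximate counter: stores c ≈ log2(n) instead of n."""

    def __init__(self):
        self.c = 0  # the stored exponent; represents a count near 2**c

    def increment(self):
        # Bump the exponent with probability 1/2**c, as in the scheme above.
        if random.random() < 2 ** -self.c:
            self.c += 1

    def estimate(self):
        # 2**c - 1 is an unbiased estimate of the number of increments,
        # since E[2**c] = n + 1 after n increments.
        return 2 ** self.c - 1
```

After a billion increments, c is only around 30, so the counter fits comfortably in a handful of bits; the price is that any single estimate can be far off, even though the average is right.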



Note that choosing 2 as the "base" means the represented count is always a power of 2 (hence the term "order of magnitude"). It is also possible to choose a different base b > 1 (usually b < 2): a smaller base gives a smaller error, at the cost of a smaller maximum countable number.
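The base-b variant can be sketched the same way. The function names and the default b = 1.1 below are illustrative choices of mine; the estimate (bᶜ − 1)/(b − 1) is the standard unbiased one, since E[bᶜ] = 1 + n(b − 1) after n increments:

```python
import random

def increment_base_b(c, b=1.1):
    """One probabilistic update of counter c using base b (1 < b <= 2).

    A smaller base means finer steps between representable counts,
    hence lower error, but a smaller maximum countable value for a
    fixed number of counter bits.
    """
    if random.random() < b ** -c:
        c += 1
    return c

def estimate_base_b(c, b=1.1):
    # Unbiased estimate of the true count: (b**c - 1) / (b - 1).
    return (b ** c - 1) / (b - 1)
```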

The "log log" comes into play because the counter holds roughly log₂(n), and a number of size log₂(n) needs only about log₂(log₂(n)) bits to store. For example, counting to n = 2³² needs a counter value of about 32, which fits in 5 bits.

There are actually many other schemes for approximating counting, and if you need such a scheme, you should probably research which one makes sense for your application.

Literature:

See Philippe Flajolet's "Approximate Counting: A Detailed Analysis" for a proof of the expected value represented by the counter, or Problem 5-1 in Cormen et al.'s Introduction to Algorithms for a much simpler approach. Morris's original article is usually behind a paywall; I could not find a free version to link here.

+6




It's not specifically about the log log approach, but I think it can help you. In Morris's algorithm, the counter holds the "order of magnitude" of the actual count, and the approximation is mathematically unbiased. To increment the counter, a pseudo-random event is used, so each increment is probabilistic. To save space, only the exponent is stored: in base 2 the counter can only represent the counts 1, 2, 4, 8, 16, 32 and other powers of two, and the memory requirement is just the bits needed to hold the exponent. For example, to move from 4 to 8, a pseudo-random number is generated, and with probability 0.25 the counter is incremented; otherwise it stays at 4. (Adapted from the Wikipedia article on approximate counting.)
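The 4-to-8 step above is easy to check empirically. This small simulation (the seed and trial count are my own choices) just confirms that the increment fires about a quarter of the time when the exponent is 2, i.e. when the counter represents 4:

```python
import random

random.seed(42)  # fixed seed, chosen for reproducibility

# Counter exponent c = 2 represents a count of 2**2 = 4.
# Each increment attempt succeeds with probability 1/2**c = 0.25,
# which moves the represented count from 4 to 8.
c = 2
attempts = 100_000
successes = sum(random.random() < 2 ** -c for _ in range(attempts))
print(successes / attempts)  # close to 0.25
```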



0

