Use cases for Redis SETBIT, GETBIT, BITCOUNT?

After reading "Can anyone explain the redis setbit command?" and http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/ (linked from the Redis docs), I am still trying to figure out the use cases for SETBIT over SET. The above sources seem to cite, as a driving factor for using SETBIT, the use of binary data to store events and "countable" datasets, because this significantly reduces the amount of data that needs to be stored while keeping it easy to access.

For example, is storing daily unique visits to a website by user ID (identified by an offset from 0) in a bitmap such as 100000001, where users with IDs 0 and 8 are the only ones who visited, better than just setting a timestamp:userID key? Please explain. Thanks.
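To make the bitmap in the question concrete, here is a minimal pure-Python sketch that emulates how SETBIT, GETBIT and BITCOUNT behave on a Redis string, so it runs without a server. The `Bitmap` class and its `bits()` helper are hypothetical and for illustration only; the bit ordering (offset 0 is the most significant bit of the first byte) and the zero-padding when the string grows follow the documented Redis behavior.

```python
class Bitmap:
    """In-memory stand-in for a Redis string used as a bitmap (hypothetical helper).

    Offset 0 is the most significant bit of the first byte, as in Redis.
    """

    def __init__(self) -> None:
        self.data = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """Mimic SETBIT: grow the string as needed, set the bit, return the old bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            # Redis zero-pads the string up to the byte that holds `offset`.
            self.data.extend(bytes(byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit_index
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, offset: int) -> int:
        """Mimic GETBIT: offsets past the end of the string read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return 1 if self.data[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self) -> int:
        """Mimic BITCOUNT without a range: count the bits set to 1."""
        return sum(bin(b).count("1") for b in self.data)

    def bits(self) -> str:
        """Render the bitmap as a string of 0s and 1s, offset 0 first."""
        return "".join(f"{b:08b}" for b in self.data)


# Users 0 and 8 visited today; everyone else is implicitly 0.
visits = Bitmap()
visits.setbit(0, 1)
visits.setbit(8, 1)
print(visits.bits())      # 1000000010000000
print(visits.bitcount())  # 2
```

Setting offsets 0 and 8 leaves a 2-byte string whose first nine bits read `100000001`, and BITCOUNT gives the day's unique visitors directly. Against a real server the equivalent would be `SETBIT <key> <user_id> 1` per visit and `BITCOUNT <key>` for the total, with one key per day (the key naming scheme is up to you).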

Apologies if this is a neophyte question.



2 answers


Answer: it depends. In the example above, it depends on, for example, the number of logins you have per day (how many bits are set in the bitmask). If you only have, say, 2 logins, or the user IDs are random, it's better to just keep a list of logins.

But if you have an active user base and 60% of all users are active, it turns out that storing only 1 bit per user (actually slightly less on average, because Redis only allocates the bitmask up to the highest bit set to 1) is much more efficient than storing IDs in a list. Storing IDs in a list would mean using, for example, 32 bits (an integer) to represent 1 bit of information, which is wasteful. It could be even more if the list is implemented as a tree with explicit pointers between nodes. Because RAM is expensive and limited, and we want everything to be scalable, we should strive for minimal memory usage while still meeting all the query requirements.

So this is what I would base the decision on for a given use case.
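The trade-off in this answer can be put into numbers. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes 4 bytes (32 bits) per stored ID, as in the answer, and ignores Redis's per-element and per-key overhead, so the list figure is a lower bound. The function names are hypothetical.

```python
def bitmap_bytes(max_user_id: int) -> int:
    """Bytes for a Redis bitmap whose highest set offset is max_user_id.

    The string is allocated up to the byte holding the highest offset, so the
    cost depends on the largest ID, not on how many bits are set.
    """
    return max_user_id // 8 + 1


def id_list_bytes(active_users: int, bytes_per_id: int = 4) -> int:
    """Raw payload for storing each active ID as a fixed-width integer.

    Assumption: 4 bytes (32 bits) per ID; real Redis lists and sets add
    per-element overhead, so treat this as a lower bound.
    """
    return active_users * bytes_per_id


total_users = 1_000_000
print(bitmap_bytes(total_users - 1))              # 125000
print(id_list_bytes(total_users * 60 // 100))     # 2400000
print(id_list_bytes(2))                           # 8
```

With 1,000,000 users and 60% of them active, the bitmap needs 125,000 bytes while the ID list needs at least 2,400,000 bytes. With only 2 active users the list needs 8 bytes, whereas the bitmap can still need up to 125,000 bytes if one of the IDs is near the top of the range. Under these assumptions the list is smaller only while fewer than about 1 in 32 users are active (4k < N/8, i.e. k < N/32).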



In addition, bitmasks make it possible to crunch huge amounts of data very quickly. Let's say you store 2 bitmasks: one for loggedInToday and one for signedUpForNewsletter. Using a bitwise operation like AND (which processors perform very quickly), you can instantly extract all user IDs (represented by the positions of bits set to 1) that logged in today and signed up for the newsletter. Because bitmask intersections can be performed at least an order of magnitude faster than intersecting two sorted lists of IDs, you can run this operation over millions of users and still stay below 50 ms.
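The intersection described above corresponds to Redis's `BITOP AND destkey key1 key2`. Below is a minimal pure-Python emulation of that step plus extracting the matching user IDs; the helper names and sample IDs are hypothetical. Like BITOP, it treats the shorter input as zero-padded to the length of the longer one.

```python
from typing import Iterable, List


def from_offsets(ids: Iterable[int]) -> bytes:
    """Build a bitmap with the given offsets set, like repeated SETBIT key id 1."""
    ids = list(ids)
    data = bytearray(max(ids) // 8 + 1 if ids else 0)
    for i in ids:
        data[i // 8] |= 0x80 >> (i % 8)
    return bytes(data)


def bitop_and(a: bytes, b: bytes) -> bytes:
    """Mimic BITOP AND on two keys: the shorter input is treated as zero-padded."""
    length = max(len(a), len(b))
    a = a.ljust(length, b"\x00")
    b = b.ljust(length, b"\x00")
    return bytes(x & y for x, y in zip(a, b))


def set_offsets(bitmap: bytes) -> List[int]:
    """Return the offsets (user IDs) of all bits set to 1, MSB first in each byte."""
    ids = []
    for byte_index, byte in enumerate(bitmap):
        for bit_index in range(8):
            if byte & (0x80 >> bit_index):
                ids.append(byte_index * 8 + bit_index)
    return ids


# Hypothetical sample data.
logged_in_today = from_offsets([0, 3, 9])
signed_up_for_newsletter = from_offsets([3, 9, 12])
both = bitop_and(logged_in_today, signed_up_for_newsletter)
print(set_offsets(both))  # [3, 9]
```

If only the number of matching users is needed, BITCOUNT on the BITOP destination key gives it server-side. One way to get the individual IDs is to read the whole string back to the client and scan it, as `set_offsets` does here.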

To wrap up my answer: using bitmasks enables some real-time analytics that would not otherwise be feasible, and can save you a lot of memory if you expect many items in a list. Note that this is just one use; there are many others (such as Bloom filters).



Bits are the basic units of data that computers use, and Redis' BIT* commands make it easy to manipulate bit values. In the example the OP provided, using a bitstream would mainly save space.



It will cost (at least) about 10 bytes to store the key for each entry, whereas the bitstream would only need 1 bit per user.
