How do you find the size of a bucket in Riak? (in MB, ignoring backups)

I am building a node.js app that uses Riak as its data store. The app will allow some users to store data, and I want a way to keep track of how much space each user is using (1 user -> x buckets). I also want to count only a single copy of each object, ignoring the distributed replicas.

I haven't been able to find anything that calculates the approximate space used. Doing it from a node.js script is OK, although I would prefer a way to do it inside the database (in a distributed manner).

Does anyone have an idea of the best way to do this?





3 answers


As suggested in the other answers, there are two ways to do this:

  • A post-commit hook is the best option; in the hook you can call byte_size on the object's contents (see below).

  • Implement a map/reduce job; see https://github.com/whitenode/riak_mapreduce_utils and its map_datasize function.



Erlang post-commit hook:

%% Post-commit hooks run inside Riak, so the object is a riak_object
%% (server-side), not a riakc_obj (client-side).
update_bucket_size_hook(Object) ->
    my_hooks_utils:update_bucket_size(riak_object:key(Object),
        erlang:byte_size(riak_object:get_value(Object))).
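The my_hooks_utils:update_bucket_size/2 helper is not shown above; below is a hypothetical sketch of what it could look like. The <<"bucket_sizes">> tracking bucket and the Riak 1.x local-client style are assumptions, not part of the answer. It records each object's size keyed by the object's key, so rewriting a key replaces the old figure instead of double-counting:

%% Hypothetical helper, not part of the original answer: store the size of the
%% written object in a separate <<"bucket_sizes">> tracking bucket, keyed by
%% the object's key. The bucket total can then be obtained by summing that
%% tracking bucket (e.g. with a map/reduce job).
-module(my_hooks_utils).
-export([update_bucket_size/2]).

update_bucket_size(Key, ObjectSize) ->
    %% Hooks run inside Riak, so use the local client (Riak 1.x style shown).
    {ok, C} = riak:local_client(),
    SizeObj = riak_object:new(<<"bucket_sizes">>, Key,
                              list_to_binary(integer_to_list(ObjectSize))),
    C:put(SizeObj).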

      





I'm a Riak noob, but based on what I know, my first instinct would be to look at the post-commit hook, where you have access to the object and its properties, including its size, I believe. You could then update the figures in a separate bucket that tracks usage. I'm not sure whether pre- or post-commit hooks are limited to operating on the object that triggered the hook. Perhaps it is possible to add a secondary index holding the object's size to the object in question, which you might then be able to query via MapReduce later.
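For what it's worth, only a pre-commit hook can modify the object being stored (post-commit hooks run after the write and their return value is ignored), so the secondary-index idea would look roughly like the sketch below. This is an assumption, not code from the answer, and secondary indexes also require the LevelDB (eleveldb) backend:

%% Hypothetical pre-commit hook: tag each written object with a "size_int"
%% secondary index holding its size in bytes, so per-object sizes can later be
%% queried via 2i or summed in a MapReduce job.
precommit_add_size_index(Object) ->
    Size = erlang:byte_size(riak_object:get_value(Object)),
    MD0  = riak_object:get_update_metadata(Object),
    MD1  = dict:store(<<"index">>, [{<<"size_int">>, Size}], MD0),
    riak_object:update_metadata(Object, MD1).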

Sorry if I'm just thinking out loud here ... this seems like an interesting problem, so I'm interested to see how you solve it. I've been meaning to play with hooks myself, but haven't had a chance.



Commit hooks





The current total size of the data in a bucket (or for an arbitrary set of records) can be obtained through a MapReduce query. This gives you the size regardless of where the records are stored and of the number of replicas kept. Since I couldn't find a MapReduce function that actually returns the size of the data, I created one. It is called map_datasize and can be found in my GitHub repository.
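For reference, here is a minimal sketch of such a query from the official Erlang client (riakc). It assumes the riak_mapreduce_utils module from that repository is compiled and on the code path of every Riak node, and that map_datasize emits one byte count per object:

%% Sum the stored size of every object in <<"user_bucket">> and print it in MB.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
{ok, [{_Phase, [TotalBytes]}]} =
    riakc_pb_socket:mapred(Pid,
        <<"user_bucket">>,   %% full-bucket input: convenient, but slow on large buckets
        [{map,    {modfun, riak_mapreduce_utils, map_datasize}, none, false},
         {reduce, {modfun, riak_kv_mapreduce,    reduce_sum},   none, true}]),
io:format("Bucket size: ~.2f MB~n", [TotalBytes / (1024 * 1024)]).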

Running this MapReduce job over the contents of an entire bucket is likely to be quite slow and imposes some overhead on the system (full-bucket MapReduce jobs are generally not recommended), but it could work if the size only needs to be determined occasionally.

If you always need an up-to-date figure, I think the post-commit hook suggested in another answer is probably the best option, although it may be a bit tricky to keep it accurate, since I'm not sure you will have access to the size of the replaced record on updates in order to calculate the difference.









