Chunk size of Aerospike aggregation

We use a small, storage-only Aerospike server to store website analytics for the last hour. That is about 10 GB of data.

From a separate server (using the Java client), we ran aggregation queries against Aerospike, along these lines (in Lua):

stream : aggregate(map(), complex_aggregate_function) : reduce(simple_reduce_function)

According to the documentation, aggregations are performed on the Aerospike nodes (in our case, a single node) and then on the client.

It turns out that aggregate() processes only a small batch of data at a time, about 10-16 records. After that, the partial result is sent to the client to be processed by reduce().

Since reduce() runs on the client, the server has to send at least 1/16 of the data to the client, which means hundreds of megabytes for our data set. So much for performance.
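To make that cost concrete, here is a plain-Lua simulation. It does not use Aerospike's stream API: aggregate_fn, reduce_fn, aggregate_in_chunks and reduce_all are hypothetical helpers, and the chunk size of 16 is only the figure observed above. It shows that the final result does not depend on the chunk size, but the number of partial results handed to reduce grows as chunks shrink: 1000 records in chunks of 16 give 63 partials instead of 1.

```lua
-- Plain-Lua simulation, NOT Aerospike's stream runtime: records are
-- aggregated in fixed-size chunks and every partial result goes to reduce.

-- Hypothetical aggregate step: count page views per URL.
local function aggregate_fn(acc, rec)
  acc[rec.url] = (acc[rec.url] or 0) + 1
  return acc
end

-- Hypothetical reduce step: merge two per-URL count maps into a new map.
local function reduce_fn(a, b)
  local out = {}
  for k, v in pairs(a) do out[k] = v end
  for k, v in pairs(b) do out[k] = (out[k] or 0) + v end
  return out
end

-- Aggregate `records` chunk by chunk; each chunk yields one partial map.
local function aggregate_in_chunks(records, chunk_size)
  local partials = {}
  for i = 1, #records, chunk_size do
    local acc = {}
    for j = i, math.min(i + chunk_size - 1, #records) do
      acc = aggregate_fn(acc, records[j])
    end
    partials[#partials + 1] = acc
  end
  return partials
end

-- Fold all partial maps together with reduce_fn.
local function reduce_all(partials)
  local result = {}
  for _, p in ipairs(partials) do
    result = reduce_fn(result, p)
  end
  return result
end

-- Usage: 1000 synthetic page views spread over three URLs.
local records = {}
for i = 1, 1000 do
  records[i] = { url = "/p" .. (i % 3) }
end

print(#aggregate_in_chunks(records, 16))   -- 63 partial results
print(#aggregate_in_chunks(records, 1000)) -- 1 partial result
```

Either way reduce_all returns the same per-URL counts; only the amount of intermediate data differs.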

Can I change the "buffer size", the "queue size" or whatever size controls aggregation over streams of records? In other words, can Aerospike be tuned to call reduce() only once per node?

1 answer


There are two aspects to this problem: the query batch size (query-batch-size) and the query buffer size (query-buf-size).

The query batch size determines how many records are returned in a single batch. For example, if your query returns 1000 records and the query batch size is 1000, all results are returned in one response. If the query batch size is 100, it takes 10 batches to return the entire result set.
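The arithmetic in that example is a ceiling division. A minimal sketch (batches_needed is a hypothetical helper, not part of any Aerospike API):

```lua
-- Batches needed to return `total` records when one batch holds at most
-- `batch_size` records (ceiling division).
local function batches_needed(total, batch_size)
  assert(batch_size > 0, "batch_size must be positive")
  return math.ceil(total / batch_size)
end

print(batches_needed(1000, 1000)) -- 1
print(batches_needed(1000, 100))  -- 10
```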



See http://www.aerospike.com/docs/operations/manage/queries/ for details.

Similarly, you can increase query-buf-size to enlarge the query buffer. A larger buffer results in fewer batches.
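One way to picture how the two settings interact is the simplified model below. It is an assumption for illustration, not taken from Aerospike's source: a batch is assumed to be flushed when it reaches the batch size or when its buffer cannot fit another record, so whichever limit is smaller decides the batch count. batches_for and the record sizes are hypothetical.

```lua
-- Simplified, hypothetical model (not Aerospike's actual logic): a batch is
-- flushed when it holds `batch_size` records or when its buffer of
-- `buf_bytes` bytes cannot fit another `record_bytes`-sized record.
local function batches_for(total, record_bytes, batch_size, buf_bytes)
  local fits_in_buffer = math.floor(buf_bytes / record_bytes)
  assert(fits_in_buffer > 0, "buffer smaller than one record")
  local per_batch = math.min(batch_size, fits_in_buffer)
  return math.ceil(total / per_batch)
end

-- 10000 records of 1 KiB each, batch size 1000.
print(batches_for(10000, 1024, 1000, 512 * 1024))      -- 20 (buffer-bound)
print(batches_for(10000, 1024, 1000, 2 * 1024 * 1024)) -- 10 (batch-size-bound)
```

Under this model, growing the buffer only helps until the batch size becomes the limiting factor, so both settings may need raising together.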
