Cassandra batch query performance for tables with different partition keys

I have a test case where I get 150K requests per second from a client.

My test case requires UNLOGGED batch inserts into multiple tables with different partition keys:

BEGIN UNLOGGED BATCH
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Country' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('US');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='City' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Dallas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='State' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Texas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='SSN' and ptype='text' and date='2017-03-20' and pvalue=decimalAsBlob(000000000);
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Gender' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Female');
APPLY BATCH;

      

Is there a better way than my current approach?

I ask because I am currently batching inserts into multiple tables whose rows may live on different nodes, since they have different partition keys. As far as I know, batching queries against different tables with different partition keys carries an additional cost.



1 answer


First, it is important to understand what batches are meant for.

Batches are often misused in an attempt to optimize performance.

Batches are used to keep data consistent across multiple tables. If atomicity is required, use a logged batch. In your case this is a counter table, and if the counts do not need to stay consistent across tables, don't use a batch at all. If your cluster is healthy, Cassandra makes sure all writes succeed.

Unlogged batches require the coordinator to manage the inserts, which can put a heavy load on the coordinator node. If other nodes own the partition keys, the coordinator node has to deal with an extra network hop, resulting in inefficient delivery. Use unlogged batches only when making updates to the same partition key.
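
As a rough illustration of that last point, here is a minimal client-side sketch in Python that groups pending updates by partition key, so that only updates sharing a partition would go into one unlogged batch and everything else would be sent as individual statements. The question does not show the table schema, so the partition key columns below are an assumption, and the helper name `group_by_partition` and the extra 'Austin' row are made up for the example.

```python
from collections import defaultdict


def group_by_partition(updates, partition_key_columns):
    """Group update dicts by the values of their partition key columns."""
    groups = defaultdict(list)
    for update in updates:
        key = tuple(update[col] for col in partition_key_columns)
        groups[key].append(update)
    return dict(groups)


# Assumption: the partition key of count_table is (source_id, name, pname).
# The real schema is not shown in the question, so adjust this to match it.
PARTITION_KEY = ("source_id", "name", "pname")

# Hypothetical pending counter updates, modelled on the question's batch.
updates = [
    {"source_id": 1, "name": "source_name", "pname": "Country", "pvalue": "US"},
    {"source_id": 1, "name": "source_name", "pname": "City", "pvalue": "Dallas"},
    {"source_id": 1, "name": "source_name", "pname": "City", "pvalue": "Austin"},
]

groups = group_by_partition(updates, PARTITION_KEY)

for key, rows in groups.items():
    # Only updates that hit the same partition are candidates for one
    # unlogged batch; a lone update is better sent as a plain statement.
    mode = "unlogged batch" if len(rows) > 1 else "single statement"
    print(key, len(rows), mode)
```

With a real driver, each group would then be executed separately (ideally asynchronously), so that no single coordinator has to fan out writes to many partitions.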



For more details, see:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html

https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.npmx2cnsq
