Performance issues with DataStax Cassandra

I installed DataStax Cassandra on 2 independent computers (one with 16GB RAM and the other with 32GB RAM) and kept most of the default configuration.

I created a table with about 700 columns. When I insert records using Java, I can only insert 1000 records in 30 seconds, which seems too low to me, since it should be around 18000+ according to the DataStax test. To my surprise, the performance is the same on both the 32GB and the 16GB machine.

I am new to Cassandra. Can someone help me with this? I suspect I am doing something wrong in the cassandra.yaml configuration.



2 answers


I did some benchmarking and tuning on Cassandra some time ago and found some useful settings, listed below:

  • In Cassandra, data partitioning is based on strategies. The default is a combination of round robin and token-aware policy, which works well in most cases. If you want to customize data distribution, you can write your own distribution strategy, e.g. distribute data by location or by some other attribute, tailored to your needs.

  • Cassandra uses Bloom filters to determine whether an SSTable has data for a specific row. We used a Bloom filter value (false-positive chance) of 0.1 to balance efficiency against overhead.

  • Consistency level is a key parameter in NoSQL databases. Try QUORUM or ONE.

  • Other JVM tuning parameters, such as heap size and survivor ratio, should be tuned for maximum performance.

  • If more memory is available, the memtable size can be increased so that more data fits in memory, which improves performance. The memtable flush interval should be long enough that it does not cause unnecessary disk I/O.

  • Concurrency settings in Cassandra are important for scaling. Based on our tests and observations, Cassandra performs better when concurrency is set to the number of cores * 5 and native_transport_max_threads up to 256.

  • Follow the production settings recommended for Cassandra: disable swap, and adjust the ulimit and compaction settings.

  • The replication factor in Cassandra should be set to the number of nodes in the cluster to achieve maximum system throughput.



These settings mostly affect inserts and have little impact on reads. Hope this helps you :)
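
To make the "cores * 5" rule of thumb from the concurrency bullet concrete, here is a minimal Java sketch that prints the value for the machine it runs on. The class and method names are made up for illustration, and applying the result to concurrent_writes in cassandra.yaml is my assumption; the list above only says "concurrency".

```java
// Hypothetical helper, not part of Cassandra or any driver.
final class ConcurrencySizing {
    // Upper bound for native_transport_max_threads quoted in the list above.
    static final int MAX_NATIVE_TRANSPORT_THREADS = 256;

    // Rule of thumb from the list above: concurrency = number of cores * 5.
    static int suggestedConcurrency(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        return cores * 5;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumption: the value is applied to concurrent_writes in cassandra.yaml.
        System.out.println("cores = " + cores);
        System.out.println("concurrent_writes (suggested) = " + suggestedConcurrency(cores));
        System.out.println("native_transport_max_threads (upper bound) = "
                + MAX_NATIVE_TRANSPORT_THREADS);
    }
}
```

Treat the printed numbers as a starting point for your own benchmarking, not as a fixed recommendation.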



Are you using asynchronous writes?
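
In case it is not clear what I mean by that, here is a rough sketch of the pattern: fire writes without waiting for each one, but cap how many are in flight at once. It uses only the JDK; the `writeAsync` function is a hypothetical stand-in for whatever asynchronous execute call your driver provides, and the class name and the limit of 64 are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not driver code: bounds the number of pending async writes.
final class BoundedAsyncWriter {
    // Issues `count` writes through `writeAsync`, keeping at most `maxInFlight`
    // pending at any time. Returns the number of writes that completed without error.
    static int writeAll(int count, int maxInFlight,
                        Function<Integer, CompletableFuture<Void>> writeAsync) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger succeeded = new AtomicInteger();
        for (int i = 0; i < count; i++) {
            // Blocks while maxInFlight writes are still pending.
            permits.acquireUninterruptibly();
            writeAsync.apply(i).whenComplete((ignored, error) -> {
                if (error == null) {
                    succeeded.incrementAndGet();
                }
                permits.release();
            });
        }
        // Taking every permit back means all pending writes have finished.
        permits.acquireUninterruptibly(maxInFlight);
        return succeeded.get();
    }

    public static void main(String[] args) {
        // Stand-in for a real insert: completes on a background thread.
        Function<Integer, CompletableFuture<Void>> fakeWrite =
                id -> CompletableFuture.runAsync(() -> { /* pretend to insert row id */ });
        long start = System.nanoTime();
        int ok = writeAll(1000, 64, fakeWrite);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(ok + " writes completed in " + seconds + " s");
    }
}
```

If your current code waits for each insert to return before sending the next one, the client is probably the bottleneck rather than the server, which would also explain why both machines perform the same.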

Try running cassandra-stress so you can isolate client issues.

Another option is Brian's cassandra-loader:

https://github.com/brianmhess/cassandra-loader

Since you are writing in Java, use Brian's code as a best practice example.
