Out of memory error in Cassandra when querying large rows containing a collection (set)

I am using Cassandra 2.0.8 and I have a cql3 table defined like this:

CREATE TABLE search_scf_tdr (
  fieldname text,
  fieldvalue text,
  scalability int,
  timestamptdr bigint,
  tdrkeys set<blob>,
  PRIMARY KEY ((fieldname, fieldvalue, scalability), timestamptdr)
)

      

I am using a replication factor of 2 per datacenter for this keyspace. I am inserting into this table by adding items to the tdrkeys collection one at a time using the following update:

UPDATE search_scf_tdr SET tdrkeys = tdrkeys + {<new value>} WHERE fieldname = <...> AND fieldvalue = <...> AND scalability = <...> AND timestamptdr = <...>;

      
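In plain CQL the appended blob has to be written as a 0x... hex literal. A minimal, hypothetical sketch of building one such append statement (the key values below are invented placeholders, not from the question):

```java
// Hypothetical sketch: build the CQL set-append for one fixed-size 84-byte
// element. Key column values are made-up placeholders for illustration.
public class AppendToSet {
    // Encode raw bytes as a CQL blob literal (0x-prefixed hex).
    static String toHexLiteral(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static String appendCql(byte[] element, String fieldname, String fieldvalue,
                            int scalability, long timestamptdr) {
        return "UPDATE search_scf_tdr SET tdrkeys = tdrkeys + {" + toHexLiteral(element) + "}"
             + " WHERE fieldname='" + fieldname + "' AND fieldvalue='" + fieldvalue + "'"
             + " AND scalability=" + scalability
             + " AND timestamptdr=" + timestamptdr + ";";
    }

    public static void main(String[] args) {
        byte[] element = new byte[84]; // fixed 84-byte element, zeroed for the demo
        System.out.println(appendCql(element, "timestamp", "", 0, 1431950000000L));
    }
}
```

In practice one would prepare this statement once and bind the blob, rather than concatenating strings per insert.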

Each element in tdrkeys is 84 bytes (fixed size).

When querying this table, I am getting about 160 rows at a time per query (using a range on timestamptdr, an IN list on scalability, and fixed values for fieldname and fieldvalue). Rows contain several thousand items in the tdrkeys collection.
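A back-of-envelope estimate of what one such query drags onto the heap (assuming ~5,000 elements per row, since the question only says "several thousand"):

```java
// Rough estimate of raw payload per query: rows x set elements x element size.
// The 5,000 elements-per-row figure is an assumption; 160 rows and 84 bytes
// per element are from the question.
public class HeapEstimate {
    static long bytesPerQuery(long rows, long elementsPerRow, long bytesPerElement) {
        return rows * elementsPerRow * bytesPerElement;
    }

    public static void main(String[] args) {
        long raw = bytesPerQuery(160, 5_000, 84);
        System.out.println(raw / (1024 * 1024) + " MiB raw payload per query");
        // ~64 MiB of raw blob data per query, before per-cell overhead
        // (HeapByteBuffer headers, column metadata). With 24 threads per
        // datacenter issuing such queries concurrently, this alone puts
        // significant pressure on an 8 GB heap.
    }
}
```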

I have a cluster of 42 nodes split across two datacenters. In each datacenter I have separate servers using the DataStax Java driver 2.0.9.2, running 24 threads that issue this request (doing a lot of other work with the results between requests) at consistency level ONE:

SELECT tdrkeys FROM search_scf_tdr WHERE fieldname='timestamp' and fieldvalue='' and scalability IN (0,1,2,3,4,5,6,7,8,9,10) and timestamptdr >= begin and timestamptdr < end;

      

Each Cassandra node has 8 GB of Java heap and 16 GB of physical memory. We have tuned as many cassandra.yaml and JVM options as we could, but we are still getting out-of-memory errors.

The heap dumps we get from the out-of-memory errors show more than 6 GB of heap held by threads (between 200 and 300 of them), containing many instances of org.apache.cassandra.io.sstable.IndexHelper$IndexInfo, each holding two HeapByteBuffers with 84 bytes of data...

Cassandra system.log shows errors like this:

ERROR [Thread-388] 2015-05-18 12:11:10,147 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-388,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [ReadStage:321] 2015-05-18 12:11:10,147 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:321,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:146)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
    at org.apache.cassandra.io.sstable.IndexHelper$IndexInfo.deserialize(IndexHelper.java:187)
    at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:122)
    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:970)
    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:871)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:41)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:250)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

      


1 answer


You are using an IN query across multiple partitions, because scalability is part of the partition key. This forces Cassandra to coordinate the request across multiple nodes, and the coordinator has to hold the results from all of those partitions in memory at once.



The solution is either to run a separate query for each scalability value and combine the results client-side, or, if possible, to move scalability out of the partition key, i.e. PRIMARY KEY ((fieldname, fieldvalue), scalability, timestamptdr).
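A minimal sketch of the first option, keeping the question's placeholder bounds (begin, end) as parameters; with the DataStax 2.0 driver each of these statements would typically be submitted via session.executeAsync(...) and the resulting futures merged client-side:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested fix: instead of one IN query spanning 11 partitions,
// issue one single-partition query per scalability value and merge the results
// on the client. Table and column names are from the question's schema.
public class FanOutQueries {
    static List<String> perPartitionQueries(int maxScalability, long begin, long end) {
        List<String> queries = new ArrayList<>();
        for (int s = 0; s <= maxScalability; s++) {
            queries.add(
                "SELECT tdrkeys FROM search_scf_tdr"
                + " WHERE fieldname='timestamp' AND fieldvalue=''"
                + " AND scalability=" + s
                + " AND timestamptdr >= " + begin
                + " AND timestamptdr < " + end + ";");
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> queries = perPartitionQueries(10, 1_000L, 2_000L);
        // Each query now targets exactly one partition, so each can be served
        // by the replicas owning that partition instead of one coordinator
        // buffering all 11 partitions at once.
        System.out.println(queries.size() + " single-partition queries");
    }
}
```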
