Failed to write statements

I am using Spark with Cassandra and I want to write data to a Cassandra table:

CREATE TABLE IF NOT EXISTS MyTable (
 user TEXT,
 date TIMESTAMP,
 event TEXT,
 PRIMARY KEY ((user), date, event)
);
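The write itself goes through the Spark Cassandra Connector's saveToCassandra, roughly like this minimal sketch (the RDD events and the keyspace mykeyspace are placeholders):

    import com.datastax.spark.connector._

    // events: RDD[(String, java.util.Date, String)] holding (user, date, event)
    events.saveToCassandra("mykeyspace", "mytable",
      SomeColumns("user", "date", "event"))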


But I got this error:

java.io.IOException: Failed to write statements to KeySpace.MyTable.
    at    com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:145)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:120)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:100)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:99)
    at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:151)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:99)
    at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:120)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/28 17:57:47 WARN TaskSetManager: Lost task 13.2 in stage 1.0 (TID 43, dev2-cim.aid.fr): TaskKilled (killed intentionally)


and these warnings in my Cassandra log file:

WARN  [SharedPool-Worker-2] 2015-04-28 16:45:21,219 BatchStatement.java:243 - Batch of prepared statements for [*********] is of size 8158, exceeding specified threshold of 5120 by 3038


After some internet searching, I found this link, which explains how to fix the same issue: http://progexc.blogspot.fr/2015/03/write-batch-size-error-spark-cassandra.html

So I modified my Spark job to add:

conf.set("spark.cassandra.output.batch.grouping.key", "None")
conf.set("spark.cassandra.output.batch.size.rows", "10")
conf.set("spark.cassandra.output.batch.size.bytes", "2048")


These values removed the warning messages I was getting in the Cassandra logs, but I still got the same error: Failed to write statements.

In my Spark log I came across this error:

Failed to execute: 
    com.datastax.spark.connector.writer.RichBatchStatement@67827d57
    com.datastax.driver.core.exceptions.InvalidQueryException: Key may not be empty
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:103)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:293)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:455)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:734)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at  org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)



4 answers


I had the same problem and found the solution in the comments above (from Amine CHERIFI and maasg).

The column corresponding to the primary key was not always filled with a valid value (in my case it was sometimes an empty string, "").

This caused the following error:



ERROR QueryExecutor: Failed to execute:
com.datastax.spark.connector.writer.RichBatchStatement@26ad2668
com.datastax.driver.core.exceptions.InvalidQueryException: Key may not be empty


The solution was to provide a non-empty default string.
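For illustration, a minimal sketch of that fix, assuming the data is an RDD of (user, date, event) tuples named events and using a hypothetical default value "unknown":

    import com.datastax.spark.connector._

    // Replace empty partition-key values with a non-empty default
    // before saving, so Cassandra never sees an empty key.
    val cleaned = events.map { case (user, date, event) =>
      val safeUser = if (user == null || user.isEmpty) "unknown" else user
      (safeUser, date, event)
    }
    cleaned.saveToCassandra("mykeyspace", "mytable",
      SomeColumns("user", "date", "event"))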



If you are running in YARN cluster mode, be sure to check the complete YARN logs using yarn logs -applicationId <appId> --appOwner <appOwner>. This gave me a more detailed failure reason than the logs in the YARN web UI:

Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_QUORUM (2 required but only 1 alive)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:50)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:266)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:246)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
... 11 more




The solution is to set spark.cassandra.output.consistency.level=ANY in your spark-defaults.conf.
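Equivalently, the same setting can be applied in code; a minimal sketch, assuming you build your own SparkConf:

    import org.apache.spark.SparkConf

    // ANY lets a write succeed as long as any node (even via a
    // hinted handoff) has stored it, so one live replica is enough.
    val conf = new SparkConf()
      .set("spark.cassandra.output.consistency.level", "ANY")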



I solved the problem by restarting the cluster, including all the nodes. I was facing the same issue and tried all the options mentioned above and in the blog, but nothing worked.

My data size is 174 GB in total. My cluster has 3 nodes, each with 16 cores and 48 GB of RAM. I first tried to load all 174 GB in one shot and hit this problem. Then I split the 174 GB into 109 files of 1.6 GB each and tried to load them; this time I faced the same problem again after loading 100 files. I suspected a data problem in the 101st file, so I tried loading the first file again, moving it to a new table, and merging the new data into a new table, but all of these cases hit the same problem.

Then I concluded that it was a problem with the Cassandra cluster itself, restarted the cluster and the nodes, and the problem went away.



Add a breakpoint at com/datastax/spark/connector/writer/AsyncExecutor.scala:45, and you might see the real exception.

In my case, the replication_factor of my keyspace was 2, but only one node was alive.
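If that is the cause, one possible remedy (a sketch, assuming a single-datacenter keyspace named mykeyspace and an already-connected Java driver Session called session, both hypothetical here) is to lower the replication factor to match the number of live nodes; the alternative is to bring the dead replica back online:

    // Lower the replication factor so queries can be satisfied by
    // the single node that is actually alive.
    session.execute(
      "ALTER KEYSPACE mykeyspace WITH replication = " +
      "{'class': 'SimpleStrategy', 'replication_factor': 1}")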
