I have a Spark master and a worker running in Docker containers with Spark 2.0.2 and Hadoop 2.7. I am trying to submit a job from pyspark in another container (same network) by running

df ="/data/test.json")


But I am getting this error:

java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;


It doesn't matter whether I try interactively or with spark-submit. These are the packages Spark downloaded:

com.databricks#spark-avro_2.11;3.2.0 from central in [default]
com.thoughtworks.paranamer#paranamer;2.7 from central in [default]
org.apache.avro#avro;1.8.1 from central in [default]
org.apache.commons#commons-compress;1.8.1 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
org.slf4j#slf4j-api;1.7.7 from central in [default]
org.tukaani#xz;1.5 from central in [default]
org.xerial.snappy#snappy-java; from central in [default]


spark-submit --version


scala version - 2.11.8

My pyspark command:

PYSPARK_PYTHON=ipython /usr/spark-2.0.2/bin/pyspark --master spark://master:7077 --packages com.databricks:spark-avro_2.11:3.2.0,org.apache.avro:avro:1.8.1


My spark-submit command:

spark-submit --master spark://master:7077 --packages com.databricks:spark-avro_2.11:3.2.0,org.apache.avro:avro:1.8.1


I read here that it might be caused by an "old version of avro in use", so I tried using 1.8.1, but I still get the same error. Reading Avro works fine. Any help?


The reason for this error is that Apache Avro 1.7.4 is bundled with Hadoop by default. If the SPARK_DIST_CLASSPATH environment variable lists the Hadoop common libraries ($HADOOP_HOME/share/common/lib/) ahead of the ivy2 jars, that older version may be used instead of the version required by spark-avro (>= 1.7.6) and installed in ivy2.

To check whether this is the case, open spark-shell and look up where the org.apache.avro.generic.GenericData class is being loaded from.

This should show the location of the class, like this:

jar:file:/lib/ivy/jars/org.apache.avro_avro-1.7.6.jar!/org/apache/avro/generic/GenericData.class


If this class points to $HADOOP_HOME/share/common/lib/, then you just need to list your ivy2 jars before the Hadoop common libraries in the SPARK_DIST_CLASSPATH environment variable.

For example, in the Dockerfile:

ENV SPARK_DIST_CLASSPATH="/home/root/.ivy2/*:$HADOOP_HOME/etc/hadoop/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/tools/lib/*"


Note: /home/root/.ivy2 is the default location for ivy2 jars. You can control this by setting spark.jars.ivy in spark-defaults.conf, which is probably a good idea.
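
The classpath-order check described above can also be approximated outside the JVM. The following is a rough sketch in Python, not part of the original answer: it walks the entries of SPARK_DIST_CLASSPATH in order and lists every jar that contains the GenericData class, so you can see whether a Hadoop copy of Avro comes before the ivy2 one. The helper names expand_entry and jars_providing are made up for this illustration, and it assumes the jars are readable from wherever the script runs.

```python
import glob
import os
import zipfile

# Class file named in the NoSuchMethodError from the question.
AVRO_CLASS = "org/apache/avro/generic/GenericData.class"


def expand_entry(entry):
    """Expand one classpath entry into the jar files it contributes.

    An entry ending in '*' is treated like the JVM wildcard (every .jar in
    that directory). Sorting is only for stable output; the JVM does not
    promise any particular order inside a wildcard.
    """
    if entry.endswith("*"):
        return sorted(glob.glob(os.path.join(os.path.dirname(entry), "*.jar")))
    if entry.endswith(".jar") and os.path.isfile(entry):
        return [entry]
    return []  # plain directories and empty entries contribute no jars here


def jars_providing(classpath, class_file=AVRO_CLASS):
    """Return, in classpath order, the jars that contain class_file."""
    hits = []
    for entry in classpath.split(os.pathsep):
        for jar in expand_entry(entry):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if class_file in zf.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                continue  # unreadable file or not a real jar: skip it
    return hits


if __name__ == "__main__":
    for jar in jars_providing(os.environ.get("SPARK_DIST_CLASSPATH", "")):
        print(jar)
```

Run it inside the container where SPARK_DIST_CLASSPATH is set. If a jar under $HADOOP_HOME is printed before the one under ivy2, that matches the situation described above. Treat the result as a hint only, since the class loader Spark actually uses may combine this classpath with other sources.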



I faced a similar problem before. Try using the --jars {path to spark-avro_2.11-3.2.0.jar} option with spark-submit.



