PySpark: reading Caffe models from HDFS
I am using the Caffe library for image detection with PySpark. The Spark program runs fine in local mode, where the model is present on the local filesystem.
But when I want to deploy it in cluster mode, I don't know how to do it correctly. I've tried the following approach:
- Add the file to HDFS and use addFile or --files when submitting the job:
sc.addFile("hdfs:///caffe-public/dataset/test.caffemodel")
- Read the model on each worker node with:
model_weight = SparkFiles.get('test.caffemodel')
net = caffe.Net(model_define, model_weight, caffe.TEST)
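To make the second step concrete, here is a minimal sketch of how the per-worker load could be wrapped so that the model is built once per partition rather than once per record. The helper name make_partition_scorer and its parameters resolve_path, load_net and score are hypothetical names used only for illustration; they are not part of Spark or Caffe. The PySpark and Caffe calls are shown only in the trailing comment and assume the file was registered with sc.addFile as in the first step.

```python
def make_partition_scorer(resolve_path, load_net, score):
    """Build a function suitable for rdd.mapPartitions.

    resolve_path: zero-argument callable returning the local model path
                  (hypothetical; on a worker this would wrap SparkFiles.get).
    load_net:     callable taking that path and returning a loaded model
                  (hypothetical; this would wrap caffe.Net).
    score:        callable taking (net, record) and returning a result.
    """
    def score_partition(records):
        # Resolve the path and load the model once per partition,
        # on the worker, not on the driver.
        net = load_net(resolve_path())
        for record in records:
            yield score(net, record)
    return score_partition


# Assumed wiring on a real cluster (not executed here):
#
#   from pyspark import SparkFiles
#   import caffe
#
#   scorer = make_partition_scorer(
#       resolve_path=lambda: SparkFiles.get('test.caffemodel'),
#       load_net=lambda path: caffe.Net(model_define, path, caffe.TEST),
#       score=my_score_function,  # hypothetical per-record function
#   )
#   results = rdd.mapPartitions(scorer)
```

Passing the path lookup in as a callable keeps SparkFiles.get() from being evaluated on the driver, where the returned path would not be valid for the workers.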
Since SparkFiles.get() returns the file's local location on the worker node (not on HDFS), I can restore my model using the path it returns. This approach also works fine in local mode; however, in distributed mode it results in the following error:
ERROR server.TransportRequestHandler: Error sending result StreamResponse{streamId=/files/xxx, byteCount=xxx, body=FileSegmentManagedBuffer{file=xxx, offset=0,length=xxxx}} to /192.168.100.40:37690; closing connection
io.netty.handler.codec.EncoderException: java.lang.NoSuchMethodError: io.netty.channel.DefaultFileRegion.<init>(Ljava/io/File;JJ)V
It looks like the data is too big to be shuffled, as discussed in "Apache Spark: Networking Errors Between Executors". However, the model size is only about 1 MB.
Update:
I found that if the path passed to sc.addFile(path) is on HDFS, no error appears. However, when the path is on the local filesystem, the error appears.
My questions:
- Is there any cause other than the file size that could throw the above exception? (Spark runs on YARN, and I am using the default shuffle service, not the external shuffle service.)
- If I don't add the file when submitting the job, how can I read the model file from HDFS using PySpark, so that I can restore the model using the Caffe API? Or is there a way to get the path other than via SparkFiles.get()?
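For the second question, one option I can think of is to copy the model from HDFS to a local directory on each worker with the hdfs command-line client, then hand that local path to Caffe. The sketch below is untested on my cluster and assumes the hdfs binary is on the PATH of every worker; the function names hdfs_get_command and fetch_from_hdfs and the run parameter are hypothetical helpers I made up for illustration.

```python
import os
import subprocess
import tempfile


def hdfs_get_command(hdfs_path, local_dir):
    """Return the `hdfs dfs -get` argument list and the local target path."""
    local_path = os.path.join(local_dir, os.path.basename(hdfs_path))
    return ["hdfs", "dfs", "-get", hdfs_path, local_path], local_path


def fetch_from_hdfs(hdfs_path, local_dir=None, run=subprocess.check_call):
    """Copy a file from HDFS to the worker's local disk and return its path.

    `run` is injectable so the command can be inspected without a cluster;
    by default it executes the hdfs client and raises on a non-zero exit.
    """
    if local_dir is None:
        local_dir = tempfile.mkdtemp()
    cmd, local_path = hdfs_get_command(hdfs_path, local_dir)
    # `hdfs dfs -get` refuses to overwrite, so skip the copy if it is there.
    if not os.path.exists(local_path):
        run(cmd)
    return local_path


# Assumed use inside a worker-side function (not executed here):
#
#   model_weight = fetch_from_hdfs("hdfs:///caffe-public/dataset/test.caffemodel")
#   net = caffe.Net(model_define, model_weight, caffe.TEST)
```

This avoids sc.addFile entirely, at the cost of each worker contacting HDFS itself, so I am not sure it is the right approach.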
Any suggestions would be appreciated!