Importing modules for code that runs on the workers
You can submit a job as a batch operation, together with its dependent modules, using the command-line interface spark-submit. According to the Spark 1.6.1 documentation, it has the following signature:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
If your Python script is called python_job.py and the module it depends on is other_module.py, you call
./bin/spark-submit --py-files other_module.py python_job.py
Note that --py-files must come before the application script; anything after python_job.py is passed to the script as an application argument.
This ensures that other_module.py is available on the worker nodes. More often than not, you will want to ship a full package, e.g. other_module_library.egg or even a .zip archive; all of these are acceptable values for --py-files, as in the sketch below.
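For instance, a minimal sketch of that packaging workflow, assuming a hypothetical package directory named other_module_library/:
zip -r other_module_library.zip other_module_library/   # bundle the dependency into an archive
./bin/spark-submit --py-files other_module_library.zip python_job.py   # ship the archive alongside the driver script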
If you want to work in an interactive shell, I believe you will need to import the module inside the function that is shipped to the workers.
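As a minimal sketch of that approach (not covered above), one option in the pyspark shell is SparkContext.addPyFile, which distributes a file to the executors; the import then happens inside the function that runs on the workers. Here other_module.transform is a hypothetical function used only for illustration:
# In the pyspark shell, sc is the pre-created SparkContext
sc.addPyFile("other_module.py")      # distribute the module file to the executors

def process(record):
    import other_module              # imported on the worker, inside the shipped function
    return other_module.transform(record)   # transform() is a hypothetical function

result = sc.parallelize([1, 2, 3]).map(process).collect()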