Add Spark to the Oozie shared lib

By default, the Oozie shared lib directory provides libraries for Hive, Pig and MapReduce. If I want to run a Spark job on Oozie, it might be better to add the Spark lib jars to the Oozie shared lib rather than copy them into the application's lib directory.
How can I add the Spark lib jars (including Spark core and its dependencies) to the Oozie shared lib? Any comments or responses are appreciated.



1 answer


The Spark action is slated to be released with Oozie 4.2.0, although the doc seems to be a little behind. See the relevant JIRA here: Oozie JIRA - Add Spark Action Executor

Cloudera's CDH 5.4 release already includes it, though; see the official doc: CDH 5.4 oozie doc - Oozie Spark Action Extension

With an older version of Oozie, the jars can be shared using several approaches. The first approach might work best. Here is the complete list anyway:



Below are the different ways to include a jar with your workflow:

Set oozie.libpath=/path/to/jars,another/path/to/jars in job.properties.

This is useful if you have many workflows that all need the same jars; you can put them in one place in HDFS and use them with many workflows. The jars will be available to all actions in that workflow. There is no need to ever point this at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties.
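As a rough illustration of this first approach, a job.properties might look like the sketch below. Only oozie.libpath and oozie.use.system.libpath come from the text above; the host name, user name and HDFS directories are placeholders invented for the example, so adjust them to your cluster.

```properties
# job.properties (sketch; host name and paths are hypothetical placeholders)
nameNode=hdfs://namenode.example.com:8020

# Comma-separated HDFS directories; every jar in them is made available
# to all actions of the workflow
oozie.libpath=${nameNode}/user/alice/spark-jars,${nameNode}/user/alice/extra-jars

# Let Oozie add its own ShareLib as well; do not list the ShareLib in oozie.libpath
oozie.use.system.libpath=true

# HDFS directory that contains workflow.xml
oozie.wf.application.path=${nameNode}/user/alice/apps/spark-wf
```

With this layout, the Spark core jar and its dependencies would be uploaded once to one of the oozie.libpath directories and reused by every workflow that points at it.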

Create a directory named "lib" next to your workflow.xml document in HDFS and place the jars there.

This is useful if you have some jars that you only need for one workflow. Oozie will automatically make those jars available to all actions in that workflow.

Specify the tag in an action with the path to a single jar; you can have multiple tags.

This is useful if you want some jars only for a specific action and not for all actions in a workflow. The downside is that you have to specify them in your workflow.xml, so if you ever need to add or remove some jars, you have to change your workflow.xml.
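The per-action approach could look roughly like the workflow.xml sketch below. The tag name is not given in the text above; the example assumes it is Oozie's <file> element, which ships a file to the action through the distributed cache. The workflow name, main class and jar paths are hypothetical, and a java action is used only because it works on Oozie versions that predate the Spark action.

```xml
<!-- workflow.xml (sketch; names and paths are hypothetical placeholders) -->
<workflow-app name="spark-via-java-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="run-job"/>
    <action name="run-job">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.MyDriver</main-class>
            <!-- One element per jar; assumed to be the tag the text refers to -->
            <file>/user/alice/jars/spark-core.jar</file>
            <file>/user/alice/jars/other-dependency.jar</file>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Only the run-job action sees these two jars; any other action in the same workflow would need its own entries.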

Add jars to the ShareLib (e.g. /user/oozie/share/lib/lib_/pig)

While this will work, it is not recommended, for two reasons. First, the additional jars will be included with every workflow that uses that ShareLib, which may be unexpected for those workflows and their users. Second, when you upgrade the ShareLib, you will have to add the additional jars to the new ShareLib again.

Quote from Robert Kanter's blog here: How-to: Use ShareLib in Apache Oozie (CDH 5)
