Multiple Spark sessions versus a shared global session
Question
What is the motivation for creating multiple Spark apps / sessions instead of sharing a global session?
Description
I have a cluster run by the Spark Standalone cluster manager.
Cluster:
- 5 machines (workers)
- 2 cores (executors) each = 10 executors in total
- 16 GB of RAM on each machine
Jobs:
- Database dump: requires all 10 executors, but only 1 GB of RAM per executor.
- Processing the dump results: requires 5 executors with 8-16 GB of RAM each.
- Fast data-retrieval task: 5 executors with 1 GB of RAM each.
- etc. (a rough per-job configuration sketch follows below)
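For concreteness, this is a minimal sketch of what those per-job requirements could look like if each job were submitted as its own application against the Standalone master. The master URL, object names, app names and exact memory values are placeholders I made up for illustration, not part of the original question:

```scala
import org.apache.spark.sql.SparkSession

// Application for job 1: database dump — all 10 cores, only ~1 GB per executor.
object DumpJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("db-dump")
      .master("spark://master:7077")          // placeholder Standalone master URL
      .config("spark.executor.memory", "1g")  // small executors are enough here
      .config("spark.cores.max", "10")        // take every core in the cluster
      .getOrCreate()
    // ... read the database and write the dump ...
    spark.stop()
  }
}

// Application for job 2: processing the dump — 5 cores, much heavier executors.
object ProcessDumpJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("process-dump")
      .master("spark://master:7077")
      .config("spark.executor.memory", "8g")  // this job needs the big executors
      .config("spark.cores.max", "5")
      .getOrCreate()
    // ... process the dump results ...
    spark.stop()
  }
}
```

Each object above would run as a separate application (separate JVM), which is the only way to give the two jobs differently sized executors on the same Standalone cluster.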
What is the best-practice solution here? Why should I ever prefer the 1st solution over the 2nd, or the 2nd over the 1st, when the cluster resources stay the same?
Solutions:
- Run jobs 1, 2 and 3 as separate Spark applications (separate JVMs).
- Use one global Spark application/session that holds all cluster resources (10 executors, 8 GB RAM each), and create fair-scheduler pools for the 1st, 2nd and 3rd jobs (see the sketch after this list).
- Use some hack like this to run jobs with different configurations from the same JVM, but I'm afraid it is not very stable (and, as far as I can tell, not officially supported by the Spark team).
- Spark Job Server, but as far as I understand it is an implementation of the first solution.
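For comparison, here is a minimal sketch of the second solution, assuming a single shared session sized for the heaviest job and a fair-scheduler allocation file with pools I invented for this example (the pool names, file path and master URL are not from the original question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shared-global-session")
  .master("spark://master:7077")                              // placeholder master URL
  .config("spark.executor.memory", "8g")                      // one size for every executor
  .config("spark.scheduler.mode", "FAIR")
  .config("spark.scheduler.allocation.file", "/etc/spark/fairscheduler.xml")
  .getOrCreate()

// Each job runs on its own driver thread and binds that thread to a pool.
// A pool only influences task scheduling (weight / minShare in cores);
// it cannot change executor memory, which is fixed for the whole session.
val dumpThread = new Thread(() => {
  spark.sparkContext.setLocalProperty("spark.scheduler.pool", "dump")
  // ... job 1 actions ...
})
val processThread = new Thread(() => {
  spark.sparkContext.setLocalProperty("spark.scheduler.pool", "process")
  // ... job 2 actions ...
})
dumpThread.start(); processThread.start()
dumpThread.join(); processThread.join()
```

Note that in this layout every executor carries 8 GB even for the jobs that only need 1 GB, which is exactly the trade-off the question is asking about.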
Update
It looks like the second option (a global session holding all cluster resources plus fair-scheduler pools) is not possible, because in the pool.xml file you can only configure the number of cores (minShare), but you cannot set the memory per executor.
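For reference, a pool definition in Spark's fair-scheduler allocation file only exposes schedulingMode, weight and minShare (the latter counted in CPU cores); the pool name below is just an example. There is no per-pool memory setting, which is what the update above refers to:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="dump">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>10</minShare>  <!-- minimum share is expressed in cores, not memory -->
  </pool>
</allocations>
```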