Spark: multiple sessions versus a shared global session

Question

What is the motivation for creating multiple Spark apps / sessions instead of sharing a global session?

Description

You have a cluster managed by the Spark Standalone cluster manager.

Cluster:

  • 5 machines
  • 2 cores (executors) each = 10 executors in total
  • 16 GB of RAM per machine

Job:

  • The database dump job needs all 10 executors, but only 1 GB of RAM per executor (see the spark-submit sketch after this list).
  • Processing the dump results needs 5 executors with 8-16 GB of RAM each.
  • A fast data-retrieval task needs 5 executors with 1 GB of RAM each.
  • etc.
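
As a rough illustration of those sizings, here is how each job could be submitted as its own application (one JVM per job) against the standalone master. This is only a sketch: the master URL and jar names are placeholders.

    # Job 1 -- dump: 10 executors x 1 core, 1 GB each
    spark-submit --master spark://master:7077 \
      --executor-cores 1 --total-executor-cores 10 \
      --executor-memory 1g dump.jar

    # Job 2 -- process the dump: 5 executors x 2 cores, 8 GB each
    spark-submit --master spark://master:7077 \
      --executor-cores 2 --total-executor-cores 10 \
      --executor-memory 8g process.jar

    # Job 3 -- fast retrieval: 5 executors x 1 core, 1 GB each
    spark-submit --master spark://master:7077 \
      --executor-cores 1 --total-executor-cores 5 \
      --executor-memory 1g retrieval.jar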

What is the best-practice solution? And why would I ever prefer the first solution over the second, or the second over the first, when the cluster resources stay the same?

Solutions:

  • Run jobs 1, 2 and 3 from different Spark applications (separate JVMs).
  • Use one global Spark application/session that holds all the cluster resources (10 executors, 8 GB RAM each), and create a fair scheduler pool for the 1st, 2nd and 3rd jobs (a sketch of this approach follows the list).
  • Use hacks like this to run jobs with different configurations from the same JVM, but I'm afraid that is not very stable (and, as far as I understand, not officially supported by the Spark team).
  • Spark Job Server, but as far as I understand it is an implementation of the first solution.
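
To make the second option concrete, here is a minimal sketch of one shared session with FAIR scheduling. The pool names and the spark.range stand-in jobs are invented for illustration; note that spark.executor.memory can only be set once for the entire application.

    import org.apache.spark.sql.SparkSession

    object SharedSessionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("shared-session")
          .config("spark.scheduler.mode", "FAIR")  // enable fair scheduling
          .config("spark.executor.memory", "8g")   // one size for ALL jobs
          .getOrCreate()
        val sc = spark.sparkContext

        // Jobs started from different threads run concurrently and can be
        // routed to different pools (the pool names are hypothetical).
        val dump = new Thread(() => {
          sc.setLocalProperty("spark.scheduler.pool", "dump")
          spark.range(10000000L).count()  // stand-in for the dump job
        })
        val retrieval = new Thread(() => {
          sc.setLocalProperty("spark.scheduler.pool", "retrieval")
          spark.range(1000L).count()      // stand-in for the retrieval task
        })
        dump.start(); retrieval.start()
        dump.join(); retrieval.join()
        spark.stop()
      }
    }

The pools only divide CPU time between concurrent jobs; every executor still carries the same 8 GB heap, which is exactly the limitation the update below runs into.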

Update

It looks like the second option (a global session holding all the resources + a fair scheduler pool) is not possible, because in the pool.xml file you can only configure the number of cores (minShare), but you cannot set memory per executor.
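
For reference, a fair scheduler allocation file only understands schedulingMode, weight and a core-based minShare per pool; a sketch (pool names invented) looks like this:

    <?xml version="1.0"?>
    <allocations>
      <pool name="dump">
        <schedulingMode>FAIR</schedulingMode>
        <weight>1</weight>
        <minShare>10</minShare> <!-- minimum share, expressed in CPU cores -->
      </pool>
      <pool name="processing">
        <schedulingMode>FIFO</schedulingMode>
        <weight>2</weight>
        <minShare>5</minShare>
        <!-- there is no element for executor memory; that stays fixed
             application-wide via spark.executor.memory -->
      </pool>
    </allocations>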

java architecture configuration distributed-computing apache-spark


