What does "RDDs can be stored in memory" mean in Spark?

Spark's introduction says:

RDDs can be stored in memory between requests without the need for replication.

As far as I know, you have to cache the RDD manually using .cache() or .persist(). Suppose I do neither, as in the code below:

   val file = sc.textFile("hdfs://data/kv1.txt")
   file.flatMap(line => line.split(" "))
   file.count()

      

Here I am not storing the RDD "file" in the cache or on disk. Is Spark still faster than MapReduce in this case?



2 answers


What happens is that Spark computes every step of the computation, partition by partition. It temporarily stores some data in memory to do its work. It may have to spill data to disk and transfer it over the network to complete some steps. But none of this is (necessarily) persistent. If you call count() again, it will start from scratch.
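The recomputation described above can be observed with a small experiment. This is only a minimal sketch, assuming Spark 2.x or later running in local mode; the object name RecomputeDemo, the helper mapCallsForTwoCounts, and the sample data are made up for illustration. The accumulator is there only to count how often the map function runs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; the names here are illustrative, not part of Spark.
object RecomputeDemo {

  // Runs count() twice on the same RDD and returns how many times the
  // map function was invoked in total across both actions.
  def mapCallsForTwoCounts(sc: SparkContext, cached: Boolean): Long = {
    val calls = sc.longAccumulator("mapCalls")

    // Six words in total, spread over two partitions.
    val words = sc.parallelize(Seq("a b", "c d e", "f"), 2)
      .flatMap(_.split(" "))
      .map { w => calls.add(1L); w } // side effect used only for counting

    if (cached) words.cache() // only marks the RDD; nothing is computed yet

    words.count() // first action: the whole lineage runs
    words.count() // second action: runs again unless the RDD was cached

    // Note: accumulator updates made inside transformations can be applied
    // more than once if a task is retried, so treat this as an illustration.
    calls.value
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RecomputeDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"map calls without cache: ${mapCallsForTwoCounts(sc, cached = false)}")
      println(s"map calls with cache:    ${mapCallsForTwoCounts(sc, cached = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

In a local run without task retries, the uncached variant should invoke the map function once per word for each action, while the cached variant should invoke it only during the first count().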

This is not a case where Spark will be faster than MapReduce; it will probably be slower for a simple operation like this. In fact, nothing here benefits from being loaded into memory.



More complex examples, such as a non-trivial pipeline or repeated access to the same RDD, will benefit from the data being stored in memory or even on disk.



Yes tonyking, it will run faster than MapReduce, no doubt about it. Spark treats all RDDs as in-memory data, and by default each transformed RDD may be recomputed every time you run an action on it. However, you can also persist an RDD in memory using the persist (or cache) method, in which case Spark will keep the elements around on the cluster for faster access the next time you query it. There is also support for persisting RDDs on disk, or replicating them across multiple nodes.
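The persistence options mentioned above map onto storage levels. The following is a rough sketch rather than a recommendation, assuming Spark 2.x or later in local mode; the object name PersistLevelsDemo, the helper build, and the sample data are invented for illustration. Note that replication only has a real effect on a cluster with more than one executor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; the names here are illustrative, not part of Spark.
object PersistLevelsDemo {

  // Builds three RDDs, each marked with a different storage level.
  // persist()/cache() are lazy: data is stored only when an action runs.
  def build(sc: SparkContext): (RDD[String], RDD[String], RDD[Int]) = {
    val lines = sc.parallelize(Seq("a b", "c d e", "f"), 2)

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
    val inMemory = lines.flatMap(_.split(" ")).cache()

    // Keep partitions in memory, spilling to disk those that do not fit.
    val memAndDisk = lines.map(_.toUpperCase).persist(StorageLevel.MEMORY_AND_DISK)

    // Keep each partition in memory on two nodes (needs a multi-node cluster).
    val replicated = lines.map(_.length).persist(StorageLevel.MEMORY_ONLY_2)

    (inMemory, memAndDisk, replicated)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PersistLevelsDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (inMemory, memAndDisk, replicated) = build(sc)

      inMemory.count() // first action computes and stores the partitions
      inMemory.count() // later actions can read the stored partitions

      println(inMemory.getStorageLevel.description)
      println(memAndDisk.getStorageLevel.description)
      println(replicated.getStorageLevel.description)

      inMemory.unpersist() // release the cached blocks when no longer needed
    } finally {
      sc.stop()
    }
  }
}
```

The storage level of an RDD cannot be changed once it has been assigned, which is why the sketch marks three separate RDDs instead of calling persist repeatedly on one.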

http://spark.apache.org/docs/latest/programming-guide.html



"This is very useful when re-accessing data, such as when querying a small hot dataset or when running an iterative algorithm such as PageRank."

So, to answer your question, "What does 'RDDs can be stored in memory' mean in Spark?": we can store an RDD in RAM using .cache(), so that it does not have to be recomputed each time we apply an action to it.
