Does randomSplit return a copy of, or a reference to, the original RDD?

Suppose I have something like the code below:

for idx in range(10):
    train_test_split = training.randomSplit(weights=[0.75, 0.25])
    train_cv = train_test_split[0]
    test_cv = train_test_split[1]
    # scale train_cv and test_cv

      

By scaling train_cv and test_cv, will the original data be affected?



1 answer


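RDDs are immutable.

Hence, it is not really possible to "change" an RDD by transforming it: randomSplit, map and other transformations each return new RDDs and leave their input untouched. So no, scaling train_cv and test_cv will not affect the original data.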







