Does randomSplit return a copy of, or a reference to, the original RDD?

Suppose I have something like the code below:

for idx in range(10):
    train_test_split = training.randomSplit(weights=[0.75, 0.25])
    train_cv = train_test_split[0]
    test_cv = train_test_split[1]
    # scale train_cv and test_cv


By scaling train_cv and test_cv, will the original data be affected?



1 answer

RDDs are immutable.

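Hence, it is not possible to "change" an RDD by transforming it: randomSplit, and any scaling you apply afterwards (for example with map), each return new RDDs and leave their input as it was. So no, the original data will not be affected.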


