Does randomSplit return a copy of, or a reference to, the original RDD?
Suppose I have something like the code below:

    for idx in xrange(0, 10):
        train_test_split = training.randomSplit(weights=[0.75, 0.25])
        train_cv = train_test_split[0]
        test_cv = train_test_split[1]
        # scale train_cv and test_cv

Will the original data be affected?