Spark ImportError: No module named *

Why, when I use rdd = rdd.map(lambda i: Recommendation.TransfertRecurrentConfigKey(i)), do I get these errors:

    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/worker.py", line 161, in main
        func, profiler, deserializer, serializer = read_command(pickleSer, infile)
      File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/worker.py", line 54, in read_command
        command = serializer._read_with_length(file)
      File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
        return self.loads(obj)
      File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
        return pickle.loads(obj)
    ImportError: No module named Recommendation.Recommendation


But when I use rdd = rdd.map(Recommendation.TransfertRecurrentConfigKey) instead, the code works fine. Why? I'm asking because I want to be able to pass the list of keys ["Sexe", "Age", "Profession", "Revenus"] as an argument (see the sketch after the class below).


    class Recommendation:
        @staticmethod
        def TransfertRecurrentRecommendation(dataFrame):
            # Clean each row and drop the ones the cleaner rejects.
            rdd = dataFrame.rdd.map(Recommendation.TransfertRecurrentCleanData)
            rdd = rdd.filter(lambda x: x is not None)

            #rdd = rdd.map(lambda user: ((user['Sexe'], user['Age'], user['Profession'], user['Revenus']), user['TransfertRecurrent']))
            rdd = rdd.map(lambda i: Recommendation.TransfertRecurrentConfigKey(i))
            print rdd.collect()

        @staticmethod
        def TransfertRecurrentConfigKey(user):
            # Build the composite key from the fixed column list and pair
            # it with the TransfertRecurrent value.
            tmp = []
            for k in ["Sexe", "Age", "Profession", "Revenus"]:
                tmp.append(user[k])
            return tuple(tmp), user['TransfertRecurrent']

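For context, this is the direction I have in mind for passing the keys: a minimal sketch (not tested on the cluster) that replaces the class with a plain module-level function plus functools.partial. The name config_key is my own, and this assumes whatever module it lives in is actually available to the workers (e.g. shipped with --py-files):

    from functools import partial

    def config_key(keys, user):
        # Build the composite key from the requested columns and pair it
        # with the TransfertRecurrent value.
        return tuple(user[k] for k in keys), user['TransfertRecurrent']

    # Bind the key list up front; the result is a plain function plus its
    # arguments, with no reference to the Recommendation class.
    rdd = rdd.map(partial(config_key, ["Sexe", "Age", "Profession", "Revenus"]))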

EDIT: I've solved the errors, but I still don't understand why the code works in one case and not in the other. (See the second answer to ImportError: No module named numpy for spark workers.)
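My current guess, which I haven't verified, based on that answer: PySpark's cloudpickle ships the bare staticmethod by value (the function isn't found at the top level of its module, so its code object gets serialized directly), while the lambda drags the Recommendation class into its pickled globals, and classes are pickled by reference, i.e. as an import path the workers must be able to resolve. A sketch of what I mean, assuming the class lives in Recommendation/Recommendation.py and that package exists on the driver only:

    from pyspark import cloudpickle
    from Recommendation.Recommendation import Recommendation

    # Bare function: cloudpickle serializes its code object by value, so
    # the workers never need to import the Recommendation package.
    ok = cloudpickle.dumps(Recommendation.TransfertRecurrentConfigKey)

    # Lambda: its body names `Recommendation`, so the class lands in the
    # pickled globals and is stored by reference as
    # "Recommendation.Recommendation"; unpickling on a worker without the
    # package then fails with exactly the ImportError above.
    ko = cloudpickle.dumps(lambda i: Recommendation.TransfertRecurrentConfigKey(i))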

python apache-spark