How can I delete rows from a table created with the Spark framework?
Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:
pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\nDELETE a.* FROM adsquare a\n-----------------------^^^\n"
This is the script I am using:
from pyspark.sql import SparkSession

# f, adsquareSchemaDevice and shuffle_value are defined earlier (not shown)
sq = SparkSession.builder.config("spark.rpc.message.maxSize", "1536").config("spark.sql.shuffle.partitions", str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice, sep=";", header=True)
# adsqaureJoined is a joined DataFrame built earlier (not shown)
adsquare_grid = adsqaureJoined.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")
sql = """
DELETE a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 > 1 """
sq.sql(sql)
Note: the codepoint table is created at run time.
Is there any other way to delete rows matching the above conditions?
You cannot delete rows from a Spark DataFrame, because DataFrames are immutable. What you can do instead is create a new DataFrame that excludes the unwanted rows.
sql = """
Select a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 <= 1 """
sq.sql(sql)
This gives you a new DataFrame containing only the rows you want to keep. Note that the condition is inverted: dis2 <= 1 selects everything the DELETE would have left behind.
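If you prefer the DataFrame API over SQL, the same filtering can be expressed with a join and a filter. The sketch below is only illustrative: it assumes the codepoint data is also available as a DataFrame (called codepoint_df here), that the adsquare DataFrame has a grid_id column as in the SQL above, and that dis2 is a column available after the join; adjust the names to your schema.

from pyspark.sql import functions as F

# Hypothetical names: codepoint_df stands in for whatever DataFrame backs the
# "codepoint" temp view; dis2 is assumed to be a column available after the join.
filtered = (adsquare_grid
            .join(codepoint_df, adsquare_grid["grid_id"] == codepoint_df["grid_explode"], "inner")
            .filter(F.col("dis2") <= 1))

# Re-register the view under the same name so downstream SQL sees the reduced data
filtered.createOrReplaceTempView("adsquare")

Either way the original data is untouched; you simply continue the pipeline with the DataFrame (or temp view) that no longer contains the unwanted rows.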