How can I delete rows from a table created with the Spark framework?
Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:
pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\nDELETE a.* FROM adsquare a\n-----------------------^^^\n"
This is the script I am using:
from pyspark.sql import SparkSession

# f, adsquareSchemaDevice and shuffle_value are defined earlier (not shown)
sq = SparkSession.builder.config("spark.rpc.message.maxSize", "1536").config("spark.sql.shuffle.partitions", str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice, sep=";", header=True)
# adsqaureJoined is a joined DataFrame built earlier (not shown)
adsquare_grid = adsqaureJoined.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")
sql = """
DELETE a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 > 1 """
sq.sql(sql)
Note: the codepoint table is created at run time.
Is there any other way to delete rows matching the above conditions?
You cannot delete rows from a Spark DataFrame, because DataFrames are immutable. What you can do instead is create a new DataFrame that excludes the unwanted rows.
sql = """
Select a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 <= 1 """
sq.sql(sql)
This gives you a new DataFrame containing only the rows you want to keep. Note that the condition is inverted: dis2 <= 1 selects everything the DELETE would have left behind.
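If you prefer the DataFrame API over SQL, the same filtering can be expressed with a join and a filter. The sketch below is only illustrative: it assumes the codepoint data is also available as a DataFrame (called codepoint_df here), that the adsquare DataFrame has a grid_id column as in the SQL above, and that dis2 is a column available after the join; adjust the names to your schema.

from pyspark.sql import functions as F

# Hypothetical names: codepoint_df stands in for whatever DataFrame backs the
# "codepoint" temp view; dis2 is assumed to be a column available after the join.
filtered = (adsquare_grid
            .join(codepoint_df, adsquare_grid["grid_id"] == codepoint_df["grid_explode"], "inner")
            .filter(F.col("dis2") <= 1))

# Re-register the view under the same name so downstream SQL sees the reduced data
filtered.createOrReplaceTempView("adsquare")

Either way the original data is untouched; you simply continue the pipeline with the DataFrame (or temp view) that no longer contains the unwanted rows.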