org.apache.spark.sql.AnalysisException: Unable to resolve given input columns

exitTotalDF
  .filter($"accid" === "dc215673-ef22-4d59-0998-455b82000015")
  .groupBy("exiturl")
  .agg(first("accid"), first("segment"), $"exiturl", sum("session"), sum("sessionfirst"), first("date"))
  .orderBy(desc("session"))
  .take(500)

org.apache.spark.sql.AnalysisException: cannot resolve '`session`' given input columns: [first(accid, false), first(date, false), sum(session), exiturl, sum(sessionfirst), first(segment, false)]

It's as if the sum function cannot name the columns correctly.

Using Spark 2.1

+4




3 answers


Typically, in scenarios like this, I use the as method on the column. For example:

    .agg(first("accid"), first("segment"), $"exiturl", sum("session").as("session"), sum("sessionfirst"), first("date"))

This gives you more control over what to expect, and if the generated name for the sum were ever to change in future versions of Spark, you have less of a headache updating all the names in your dataset.



Also, I just ran a simple test: when you don't provide a name, in Spark 2.1 the column name becomes "sum(session)". One easy way to find this is to call printSchema on the dataset.
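
To illustrate, here is a minimal, self-contained sketch of the aliasing approach. The rows are made-up test data; only the column names come from the question:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical test data using the question's column names.
    val exitTotalDF = Seq(
      ("acc1", "segA", "http://a", 3L, 1L, "2017-01-01"),
      ("acc1", "segA", "http://a", 2L, 0L, "2017-01-02")
    ).toDF("accid", "segment", "exiturl", "session", "sessionfirst", "date")

    val result = exitTotalDF
      .groupBy("exiturl") // the grouping column is kept in the output automatically
      .agg(first("accid"), first("segment"),
        sum("session").as("session"), // explicit alias
        sum("sessionfirst"), first("date"))
      .orderBy(desc("session")) // now resolves against the alias

    result.printSchema() // shows "session" rather than "sum(session)"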

+6




I prefer to use withColumnRenamed() instead of as(), because:

With as() you need to spell out all the columns you want to keep:

    df.select(first("accid"),
          first("segment"),
          $"exiturl",
          col("sum(session)").as("session"), // the auto-generated column name
          sum("sessionfirst"),
          first("date"))



Versus withColumnRenamed(), which is a one-liner:

    val df1 = df.withColumnRenamed("sum(session)", "session")

The output df1 will contain all the columns that df has, except that the sum(session) column has now been renamed to "session".
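
For comparison, a sketch of this approach, reusing the hypothetical exitTotalDF from the first answer:

    // Aggregate without an alias; Spark 2.1 names the result column "sum(session)".
    val aggregated = exitTotalDF
      .groupBy("exiturl")
      .agg(first("accid"), first("segment"), sum("session"),
        sum("sessionfirst"), first("date"))

    // Rename just that one column; every other column passes through unchanged.
    val df1 = aggregated.withColumnRenamed("sum(session)", "session")
    df1.orderBy(desc("session")).show()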

+3




From Spark 2.0 on, the spark-shell starts with Hive support enabled by default. We can disable Hive support using the command below:

spark-shell --conf spark.sql.catalogImplementation=in-memory
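
The same setting can also be applied when building a SparkSession programmatically; a sketch, assuming it is set before the session is first created (spark.sql.catalogImplementation is a static configuration):

    import org.apache.spark.sql.SparkSession

    // Equivalent to the --conf flag above: use the in-memory catalog
    // instead of the Hive metastore.
    val spark = SparkSession.builder()
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()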


0

