How to keep the same order of schema columns with Spark Dataset map?

I am reading data from a Hive table and then trying to enrich it with an additional column derived from the other columns. But I'm having trouble with Spark changing my schema and reordering all columns alphabetically by name.

After calling withColumn() and encoding with my enriched class, the schema is still correct, but as soon as I call map(), the schema changes and the column order is wrong. How can I tell Spark to keep the original column order?
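I suspect (this is my guess, not something I found stated in the Spark docs) that Encoders.bean derives its schema through JavaBean introspection, and java.beans.Introspector reports properties sorted by name rather than in declaration order. A plain-JDK demo with a hypothetical bean (field names are made up for illustration):

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// Hypothetical bean: fields declared in the order zebra, apple, enrichedColumn.
public class BeanOrderDemo {
    public static class Enriched {
        private String zebra;
        private String apple;
        private String enrichedColumn;
        public String getZebra() { return zebra; }
        public void setZebra(String z) { this.zebra = z; }
        public String getApple() { return apple; }
        public void setApple(String a) { this.apple = a; }
        public String getEnrichedColumn() { return enrichedColumn; }
        public void setEnrichedColumn(String e) { this.enrichedColumn = e; }
    }

    public static void main(String[] args) throws IntrospectionException {
        // Object.class as the stop class excludes the inherited "class" property
        PropertyDescriptor[] props = Introspector
                .getBeanInfo(Enriched.class, Object.class)
                .getPropertyDescriptors();
        for (PropertyDescriptor p : props) {
            // Prints apple, enrichedColumn, zebra: sorted by name, not declaration order
            System.out.println(p.getName());
        }
    }
}
```

If that is what Spark relies on internally, the alphabetical schema after map() would be expected behavior rather than a bug.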

session.table("myTable")
    .as(Encoders.bean(Base.class))
    .withColumn("enrichedColumn", lit(""))
    .as(Encoders.bean(Enriched.class))
    .map((MapFunction<Enriched, Enriched>) enriched -> enriched.enrich(), Encoders.bean(Enriched.class))
    .printSchema();

      

+3
java dataset apache-spark


source to share


No one has answered this question yet

Check out similar questions:

3073
How to efficiently iterate over each entry in a Java map?
1070
How can I initialize a static map?
324
Java class that implements map and preserves insert order?
25
Create new column with function in Spark Dataframe
3
How to check the content of a Spark Dataframe
2
Factorize the spark column
1
Spark createDataframe from RDD objects, column order
0
Add derived column (as array of structure) based on values ​​and ordering of other columns in Spark Scala dataframe
0
Is there a good (immutable) way to pre-define a column for an RDD, or remove a column from an RDD?
0
How to move selected columns of DataFrame to the end (rearranging column positions)?



All Articles
Loading...
X
Show
Funny
Dev
Pics