Is it possible to read an ORC file into a Spark Data Frame in sparklyr?
I know sparklyr has the following file reading methods:
- spark_read_csv
- spark_read_parquet
- spark_read_json
How about reading ORC files? Is this supported by the library?
I know I can use read.orc in SparkR, or this solution, but I would like to keep the code in sparklyr.
1 answer
You can use the low-level Spark API in the same way as described in my answer to Transferring data from a database to Spark using sparklyr:
library(dplyr)
library(sparklyr)

sc <- spark_connect(...)

spark_session(sc) %>%
  invoke("read") %>%
  invoke("format", "orc") %>%
  invoke("load", path) %>%
  invoke("createOrReplaceTempView", name)

df <- tbl(sc, name)
where name is an arbitrary name used to identify the table.

In current versions of sparklyr, you can replace the above with spark_read_source:
spark_read_source(sc, name, source = "orc", options = list(path = path))
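For completeness, a minimal end-to-end sketch. The local master and the /tmp/flights.orc path are assumptions for illustration only; substitute your own connection and ORC location:

library(sparklyr)
library(dplyr)

# Hypothetical local connection, for illustration
sc <- spark_connect(master = "local")

# Register the ORC file under the table name "flights_orc"
flights <- spark_read_source(
  sc, "flights_orc",
  source  = "orc",
  options = list(path = "/tmp/flights.orc")
)

# The result is a regular tbl_spark, so dplyr verbs work as usual
flights %>% count()

Either approach returns a lazy Spark table, so subsequent dplyr operations are translated to Spark SQL rather than executed locally.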