Is it possible to read an ORC file into a Spark DataFrame in sparklyr?

I know sparklyr has the following file reading methods:

  • spark_read_csv

  • spark_read_parquet

  • spark_read_json

What about reading ORC files? Is that supported in this library?

I know I can use read.orc in SparkR or this solution, but I would like to keep the code in sparklyr.



1 answer


You can use the low-level Spark API in the same way as described in my answer to Transferring data from a database to Spark using sparklyr:

library(dplyr)
library(sparklyr)

sc <- spark_connect(...)

# Use the underlying SparkSession's DataFrameReader to load the ORC
# data and register it as a temporary view
spark_session(sc) %>%
  invoke("read") %>%
  invoke("format", "orc") %>%
  invoke("load", path) %>%
  invoke("createOrReplaceTempView", name)

# Reference the registered view as a dplyr tbl
df <- tbl(sc, name)

where name is an arbitrary name used to identify the table.
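Once the view is registered, df behaves like any other sparklyr table, so the usual dplyr verbs can be applied to it. A minimal usage sketch (some_column is only a placeholder column name, not something from the question):

# Build a lazy query against the ORC-backed view and pull the result into R;
# "some_column" is a hypothetical column name
df %>%
  filter(!is.na(some_column)) %>%
  count() %>%
  collect()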



In the current version of sparklyr, you can replace the above with spark_read_source:

# Equivalent higher-level call using the generic source reader
spark_read_source(sc, name, source = "orc", options = list(path = path))
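
Recent sparklyr releases also ship a dedicated ORC reader, spark_read_orc(); if your installed version provides it, the following sketch (the table name is arbitrary, as above) does the same thing:

# Read the ORC data and register it under the given name in one call
df <- spark_read_orc(sc, name, path = path)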


