Can you load Apache Spark DataSets from named pipes?
I am currently using Xubuntu 16.04, Apache Spark 2.1.1, IntelliJ, and Scala 2.11.8.
I am trying to load some simple CSV text data into an Apache Spark Dataset. Instead of a plain text file, though, the data is written to a named pipe, and I want to read it directly from that pipe into the Dataset. Everything works fine when the data comes from a regular file, but the exact same data fails when it comes from the named pipe. My Scala code is very simple and looks like this:
import org.apache.spark.sql.SparkSession

object PipeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("PipeTest")
      .master("local")
      .getOrCreate()

    // Read data from a regular text file into a Dataset
    val dataFromTxt = spark.read.csv("csvData.txt")
    dataFromTxt.show()

    // Read data from a named pipe into a Dataset
    val dataFromPipe = spark.read.csv("csvData.pipe")
    dataFromPipe.show()
  }
}
The first section of code loads the CSV data from a regular file and works fine. The second section fails with the following error:
Exception in thread "main" java.io.IOException: File access error: /home/andersonlab/test/csvData.pipe
Does anyone know how to use named pipes with Spark Datasets and get something like the above to work?
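In case it helps frame an answer: the only workaround I have come up with so far is to drain the pipe on the driver and build the DataFrame from memory, which I would rather avoid for larger data. A minimal sketch of that idea (the two-column split, the placeholder column names "c0"/"c1", and the naive comma split without quoting support are all just for illustration):

import scala.io.Source
import org.apache.spark.sql.SparkSession

object PipeWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("PipeWorkaround")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    // Drain the named pipe on the driver; this blocks until a writer opens
    // the pipe and only works if the whole contents fit in driver memory.
    val lines = Source.fromFile("csvData.pipe").getLines().toSeq

    // Naive comma split (no quoted fields); two columns assumed for illustration.
    val dataFromPipe = lines
      .map { line =>
        val fields = line.split(",", -1)
        (fields(0), fields(1))
      }
      .toDF("c0", "c1")

    dataFromPipe.show()
    spark.stop()
  }
}

I would prefer something that lets Spark stream from the pipe directly, the way it does from a regular file.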