Can you load Apache Spark Datasets from named pipes?

I am currently using Xubuntu 16.04, Apache Spark 2.1.1, IntelliJ, and Scala 2.11.8.

I am trying to load some simple CSV text data into an Apache Spark Dataset, but instead of reading from a plain text file, I am collecting the data into a named pipe and then want to read it directly into the Dataset. It works fine when the data comes from a regular file, but the exact same data fails when it comes from a named pipe. My Scala code is very simple and looks like this:

import org.apache.spark.sql.SparkSession

object PipeTest {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession
      .builder()
      .appName("PipeTest")
      .master("local")
      .getOrCreate()

    // Read data from a regular text file into a Dataset
    val dataFromTxt = spark.read.csv("csvData.txt")
    dataFromTxt.show()

    // Read data from a named pipe into a Dataset
    val dataFromPipe = spark.read.csv("csvData.pipe")
    dataFromPipe.show()
  }
}
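
For context, the named pipe is created and filled outside of Spark. A minimal sketch of the producer side (hypothetical file contents; it assumes the FIFO already exists, e.g. created with mkfifo csvData.pipe) would look like this:

import java.io.PrintWriter

// Hypothetical producer sketch: assumes the FIFO was created beforehand
// with mkfifo. Opening a FIFO for writing blocks until a reader opens
// the other end of the pipe.
object PipeProducer {
  def main(args: Array[String]): Unit = {
    val out = new PrintWriter("csvData.pipe")
    try {
      out.println("a,1")
      out.println("b,2")
    } finally {
      out.close()
    }
  }
}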

The first read loads the CSV data from the regular file and works fine. The second read, from the named pipe, fails with the following error:

Exception in thread "main" java.io.IOException: File access error: /home/andersonlab/test/csvData.pipe

Does anyone know how to use named pipes with Spark Datasets and get something like the above to work?
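
For what it's worth, the only fallback I can think of is to bypass spark.read entirely, drain the pipe with ordinary JVM IO on the driver, and build the Dataset from the collected lines. A minimal sketch (naive comma splitting with no quoting support, assumes two columns and that the data fits in driver memory):

import scala.io.Source

// Fallback sketch, not a real fix: read the FIFO on the driver, then
// hand the parsed rows to Spark. Reading a FIFO blocks until a writer
// opens the other end.
val lines = Source.fromFile("csvData.pipe").getLines().toSeq
import spark.implicits._
val dataFromPipe = lines
  .map(_.split(",", -1))
  .map(cols => (cols(0), cols(1)))  // assumes exactly two columns
  .toDS()
dataFromPipe.show()

But this gives up Spark's CSV parsing and distributed reading, so I would rather get spark.read.csv working against the pipe directly if that is possible.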
