Streaming data from HDFS to Storm (aka HDFS spout)

I would like to know if there is any spout implementation for streaming data from HDFS to Storm (something similar to Spark Streaming from HDFS). I know there is a bolt implementation for writing data to HDFS ( and /content/ch_storm-using-hdfs-connector.html ), but I couldn't find one for the other direction. I appreciate any suggestions and hints.



1 answer

One option is to use the Hadoop HDFS Java API. Assuming you are using Maven, you would include hadoop-common in your pom.xml:
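The dependency snippet did not survive here, so the following is only a sketch of what such an entry usually looks like. The groupId and artifactId are the standard Maven coordinates for hadoop-common; the `hadoop.version` property is a placeholder I am assuming you define yourself to match your cluster. Depending on your Hadoop version, you may also need hadoop-hdfs or hadoop-client on the classpath so that the hdfs:// scheme can be resolved.

```xml
<dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-common</artifactId>
   <!-- assumption: hadoop.version is a property you define to match your cluster -->
   <version>${hadoop.version}</version>
</dependency>
```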



Then, in your spout implementation, use the HDFS FileSystem object to open the file. For example, here is some pseudocode that emits each line of a file as a string:

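// Needs java.io.BufferedReader, java.io.InputStreamReader, org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path and Storm's Values class
// (its package depends on your Storm version).
public void nextTuple() {
   // Note: Storm calls nextTuple() repeatedly, so production code should remember
   // its read position instead of re-reading the file on every call.
   Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
   try {
      // resolve the file system from the path's URI so the hdfs:// scheme is honoured
      FileSystem fs = FileSystem.get(pt.toUri(), new Configuration());
      // try-with-resources closes the reader (and the underlying HDFS stream)
      try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)))) {
         String line = br.readLine();
         while (line != null) {
            // emit the line which was read from the HDFS file
            // _collector is a private member variable of type SpoutOutputCollector set in the open method
            _collector.emit(new Values(line));
            line = br.readLine();
         }
      }
   } catch (Exception e) {
      // pass the exception as the last argument so the stack trace is logged
      LOG.error("HDFS spout error", e);
   }
}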
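Because Storm calls nextTuple() over and over, a spout that reads the whole file inside that method would emit the same lines again on every call. One way around this, sketched below under some assumptions, is to open the reader once and emit at most one line per call. To keep the example runnable without Storm or Hadoop on the classpath, `LineEmitter` is a hypothetical stand-in for the spout, the `Consumer<String>` stands in for the collector's emit call, and the `Reader` stands in for a stream obtained from `fs.open(...)`. The boolean return value is only there for illustration; Storm's own nextTuple() returns void.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of Storm or Hadoop: it mirrors the spout life cycle
// (open once, then emit at most one line per nextTuple call) using only java.io.
class LineEmitter {
    private final BufferedReader reader;
    private final Consumer<String> collector; // stand-in for the collector's emit call
    private boolean exhausted = false;

    // In a real spout this setup would live in open(), with the Reader wrapping the HDFS stream.
    LineEmitter(Reader source, Consumer<String> collector) {
        this.reader = new BufferedReader(source);
        this.collector = collector;
    }

    // Emits the next unread line, if any. Returns false once the input is used up,
    // so repeated calls never re-emit earlier lines.
    boolean nextTuple() {
        if (exhausted) {
            return false;
        }
        try {
            String line = reader.readLine();
            if (line == null) {
                exhausted = true;
                reader.close();
                return false;
            }
            collector.accept(line);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Constructing it with a `java.io.StringReader` and a list's `add` method as the consumer is enough to try it out locally. In an actual spout the same state (the reader and the exhausted flag) would be fields initialised in open().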



