Recursively traverse a LARGE directory using Scala 2.8 continuations

Is it possible to recursively traverse a directory using Scala continuations (introduced in 2.8)?

My directory contains millions of files, so I cannot use a Stream because I will run out of memory. I am trying to write an Actor dispatcher so that worker actors process the files in parallel.

Does anyone have an example?

+3




3 answers


If you want to stick with Java 1.6 (as opposed to FileVisitor in 1.7), and your millions of files are spread over subdirectories rather than all sitting in one directory, you can write an iterator like this:

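import java.io.File

// Lazily walks a directory tree depth-first; only one directory listing per level is held in memory.
class DirectoryIterator(f: File) extends Iterator[File] {
  // listFiles returns null for non-directories and on I/O errors, so fall back to an empty array.
  private[this] val fs = Option(f.listFiles).getOrElse(Array[File]())
  private[this] var i = -1
  // Iterator over the most recently entered subdirectory, if any.
  private[this] var recurse: DirectoryIterator = null
  def hasNext: Boolean = {
    if (recurse != null && recurse.hasNext) true
    else (i+1 < fs.length)
  }
  def next(): File = {
    if (recurse != null && recurse.hasNext) recurse.next()
    else if (i+1 >= fs.length) {
      throw new java.util.NoSuchElementException("next on empty file iterator")
    }
    else {
      i += 1
      // Return the directory itself now; its contents follow on later calls.
      if (fs(i).isDirectory) recurse = new DirectoryIterator(fs(i))
      fs(i)
    }
  }
}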

      



This requires that there are no loops in your filesystem. If it has loops, you need to keep track of the directories you have visited in a set and avoid descending into them again. (If you don't even want to hit a file twice when it is linked from two different locations, you need to put everything in a set, and then there isn't much point in using an iterator rather than just reading all the file information into memory.)
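The directory-tracking idea could be sketched roughly as follows. This is only an illustration, not part of the answer above: the class name SafeDirectoryIterator is made up, and it assumes that comparing canonical paths (File.getCanonicalPath) is good enough to recognise a directory that has already been visited, for example through a symbolic link. Note that the set grows with the number of directories, not with the number of files.

```scala
import java.io.File
import scala.collection.mutable

// Hypothetical loop-safe variant: a directory is descended into only the
// first time its canonical path is seen.
class SafeDirectoryIterator(root: File) extends Iterator[File] {
  // Canonical paths of directories already entered (assumption: this
  // identifies a directory uniquely on the filesystem in use).
  private[this] val seen = mutable.Set[String](root.getCanonicalPath)
  // Stack of per-directory listings; only the current path is kept open.
  private[this] var stack: List[Iterator[File]] = List(children(root))

  // listFiles returns null on errors, so fall back to an empty listing.
  private[this] def children(dir: File): Iterator[File] =
    Option(dir.listFiles).getOrElse(Array.empty[File]).iterator

  def hasNext: Boolean = {
    // Drop listings that are exhausted before answering.
    while (stack.nonEmpty && !stack.head.hasNext) stack = stack.tail
    stack.nonEmpty
  }

  def next(): File = {
    if (!hasNext) throw new java.util.NoSuchElementException("next on empty file iterator")
    val f = stack.head.next()
    // Set.add returns true only for a path that was not in the set yet.
    if (f.isDirectory && seen.add(f.getCanonicalPath)) stack = children(f) :: stack
    f
  }
}
```

A directory reached a second time is still returned as an entry; it is just not descended into again.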

+3




This is more of a question than an answer.



If your process is I/O-bound, parallel processing may not improve your throughput. In many cases it will make things worse by causing the disk head to thrash. Before doing much along these lines, look at how busy the disk is. If it is already busy most of the time with one thread, at most one more thread will be useful - and even that can be counterproductive.

+1




How about using an Iterator?
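As a rough sketch of how an Iterator[File] could feed parallel workers without the whole file list ever being in memory: the example below deliberately uses a plain java.util.concurrent thread pool instead of actors, and the names ParallelWalk and processAll are made up for illustration. A semaphore limits how many files are submitted but not yet finished, so the iterator is only advanced as fast as the workers keep up.

```scala
import java.io.File
import java.util.concurrent.{Executors, Semaphore, TimeUnit}

object ParallelWalk {
  // Hypothetical helper: pulls files lazily from `files` and runs `handle`
  // on a fixed-size pool. Blocks until every submitted file is processed.
  def processAll(files: Iterator[File], workers: Int)(handle: File => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(workers)
    // Cap on in-flight tasks; the factor 2 is an arbitrary assumption.
    val permits = new Semaphore(workers * 2)
    try {
      for (f <- files) {
        permits.acquire() // wait here when the workers are saturated
        pool.execute(new Runnable {
          def run(): Unit = try handle(f) finally permits.release()
        })
      }
    } finally {
      pool.shutdown() // accept no new tasks, let queued ones finish
      pool.awaitTermination(Long.MaxValue, TimeUnit.NANOSECONDS)
    }
  }
}
```

Any lazy iterator over files can be passed in as the first argument. Whether more than one worker actually helps depends on how busy the disk already is.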

0

