Is it possible to efficiently skip a string of unknown size in Java?

When reading very large files (GB scale) in Java, I know exactly which lines I need to process. However, I don't know the length of each line, and the lengths may differ.

My question is the following:

Do you have an efficient approach to skip useless lines? My (naive) approach is to read each line and simply not process it, but that sounds like a waste of time and memory.

The code I'm looking for might look like this:

SortedMap goodLineNumbers = ......

int currentLineNumber = 1;
String line;

try (BufferedReader br = new BufferedReader(new FileReader(tracefile))) {
    do {
        if (goodLineNumbers.containsKey(currentLineNumber)) {
            line = br.readLine();
            // process line
        } else {
            // EfficientSkip is the hypothetical method I'm asking about
            line = EfficientSkip(br); // don't know the size of the line
        }
        currentLineNumber++;
    } while (line != null);
} catch (IOException e) {
    e.printStackTrace();
}

      



4 answers


If you control the file format, you can write the length of each line before the line itself, as a kind of header. This lets you jump from line to line all the way to the end of the file without reading the contents you don't need. For this task you can use RandomAccessFile instead of BufferedReader.

readLong() - read the length of the next line

readLine() - if the line is needed

skipBytes(int n) - otherwise
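As a rough sketch of the idea above, assuming you control the writer: the class name LengthPrefixedLines and both method names are made up for illustration. It deviates from the calls listed above in two ways that are my own choices: it uses a 4-byte int header (writeInt/readInt) instead of a long, and it skips with seek rather than skipBytes, because skipBytes is allowed to skip fewer bytes than requested. Lines are stored as UTF-8 bytes and read back with readFully.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any library.
class LengthPrefixedLines {

    // Writes each line as a 4-byte length header followed by its UTF-8 bytes.
    static void write(Path file, List<String> lines) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.setLength(0); // start from an empty file
            for (String line : lines) {
                byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        }
    }

    // Returns the lines whose 1-based number is in 'wanted'; all others are skipped by seeking.
    static List<String> readSelected(Path file, Set<Integer> wanted) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long fileLength = in.length();
            int lineNumber = 1;
            while (in.getFilePointer() < fileLength) {
                int length = in.readInt(); // header: size of the record that follows
                if (wanted.contains(lineNumber)) {
                    byte[] bytes = new byte[length];
                    in.readFully(bytes);
                    result.add(new String(bytes, StandardCharsets.UTF_8));
                } else {
                    in.seek(in.getFilePointer() + length); // jump over the unwanted record
                }
                lineNumber++;
            }
        }
        return result;
    }
}
```

Keep in mind that RandomAccessFile is unbuffered, so whether this actually beats a buffered sequential read depends on how long the skipped lines are; measure before committing to a custom format.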



If you don't want BufferedReader to create String objects for lines you don't need, read the input character by character, count the lines by their EOL markers, and call BufferedReader.readLine() when you are at the beginning of a line you want. I'm not sure whether this will improve overall performance.
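One way this could look, as a sketch only: LineSkipper and skipLine are names invented here, and the method treats "\n", "\r" and "\r\n" as line terminators, which is how BufferedReader.readLine() is documented to behave. It consumes one line without allocating a String.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper, not part of the JDK.
class LineSkipper {

    // Consumes characters up to and including the next line terminator without building a String.
    // Returns false only if the end of input was reached before any character was read.
    static boolean skipLine(BufferedReader reader) throws IOException {
        int c = reader.read();
        if (c == -1) {
            return false;
        }
        while (c != -1 && c != '\n' && c != '\r') {
            c = reader.read();
        }
        if (c == '\r') {
            // Treat "\r\n" as one terminator: peek at the next char and un-read it if it is not '\n'.
            reader.mark(1);
            if (reader.read() != '\n') {
                reader.reset();
            }
        }
        return true;
    }
}
```

Calls to skipLine and readLine() can be mixed on the same reader. As the answer says, there is no guarantee this is faster: it saves the String allocation but pays for one read() call per character.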





(Struck out:) Try using LineNumberReader instead. You can get and set the current line number, so you can access and read just the lines you want.

Thanks to Dima for pointing out that LineNumberReader also cannot access by line number.

Thinking more about the problem, it is theoretically impossible to determine where in the file a particular line starts unless one either A) has prior knowledge of the (combined) length of the preceding lines, or B) reads the entire file up to that point (with or without processing the content).
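If the same file is queried more than once, the two cases can be combined: pay for B once and keep the result as the prior knowledge needed for A. The sketch below is only an illustration of that idea. LineOffsetIndex, build and readLine are hypothetical names, and it assumes lines end in "\n" or "\r\n" and that the text is UTF-8.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class LineOffsetIndex {

    // Case B, done once: scan the whole file and record the byte offset where each line starts.
    static List<Long> build(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long position = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(position);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                position++;
            }
        }
        return offsets;
    }

    // Case A, afterwards: seek straight to the recorded offset of a 1-based line number.
    static String readLine(Path file, List<Long> offsets, int lineNumber) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long start = offsets.get(lineNumber - 1);
            long end = lineNumber < offsets.size() ? offsets.get(lineNumber) : raf.length();
            byte[] bytes = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(bytes);
            // Drop the trailing line terminator, if any.
            int length = bytes.length;
            while (length > 0 && (bytes[length - 1] == '\n' || bytes[length - 1] == '\r')) {
                length--;
            }
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }
}
```

For a single pass over the file this buys nothing, since building the index already reads everything. A List<Long> with one entry per line can also get large for GB-scale files, so a real version might store offsets only for the lines of interest.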



There is no magic. To find out how many lines you've read, you have to read them one at a time and count them. You don't need to store useless lines (while (count++ < nextGoodNumber && reader.readLine() != null); will do), but you do need to read them one at a time.
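A slightly expanded variation on that loop, as a sketch: SelectedLineReader and read are hypothetical names, and it assumes the wanted line numbers are 1-based and available as a SortedSet so they can be walked in ascending order. Unwanted lines are read and discarded immediately, and reading stops after the last wanted line instead of running to the end of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper, not part of the JDK.
class SelectedLineReader {

    // Returns the lines whose 1-based numbers appear in goodLineNumbers, in file order.
    static List<String> read(BufferedReader reader, SortedSet<Integer> goodLineNumbers)
            throws IOException {
        List<String> selected = new ArrayList<>();
        int count = 1; // number of the line the next readLine() call will return
        for (int nextGoodNumber : goodLineNumbers) {
            // Read and discard lines until the reader is positioned at the wanted line.
            while (count < nextGoodNumber && reader.readLine() != null) {
                count++;
            }
            String line = reader.readLine();
            if (line == null) {
                break; // the file has fewer lines than requested
            }
            count++;
            selected.add(line);
        }
        return selected;
    }
}
```

In real code you would process each wanted line in place rather than collect it into a list; the list just keeps the sketch easy to check.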
