Is it possible to efficiently skip a string of unknown size in Java?
When reading very large files (GB scale) in Java, I know exactly which lines I need to process. However, I don't know the length of each line, and the lengths may differ.
My question is the following:
Is there an efficient way to skip the lines I don't need? My (naive) approach is to read each line and simply not process it, but that sounds like a waste of time and memory.
The code I'm looking for might look like this:
SortedMap goodLineNumbers = ......
int currentLineNumber = 1;
String line;
try (BufferedReader br = new BufferedReader(new FileReader(tracefile))) {
    do {
        if (goodLineNumbers.containsKey(currentLineNumber)) {
            line = br.readLine();
            // process line
        } else {
            // EfficientSkip is the hypothetical method I'm looking for
            line = EfficientSkip(br); // don't know the size of the line
        }
        currentLineNumber++;
    } while (line != null);
} catch (IOException e) {
    e.printStackTrace();
}
If you control the file format, you can write the length of each line in front of the line itself, as a kind of header. That lets you jump from one line to the next without scanning each line to its end. For this task you can use RandomAccessFile instead of BufferedReader:
readLong() - read the length of the next line
readLine() - if the line is required
skipBytes(int n) - otherwise
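The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (a 4-byte length followed by the UTF-8 bytes of the line, with no newline) and the names LengthPrefixedLines, write and readSelected are made up for this example. It uses readInt() rather than readLong() because skipBytes() takes an int, and readFully() rather than readLine() because the length of the payload is already known.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Sketch of a length-prefixed line file; the layout is an assumption, not a standard format. */
class LengthPrefixedLines {

    /** Writes each line as a 4-byte length followed by its UTF-8 bytes. */
    static void write(String path, List<String> lines) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(path, "rw")) {
            out.setLength(0); // truncate any previous content
            for (String line : lines) {
                byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        }
    }

    /** Returns the lines whose 1-based numbers are in wanted; all other payloads are skipped unread. */
    static List<String> readSelected(String path, SortedSet<Integer> wanted) throws IOException {
        List<String> result = new ArrayList<>();
        if (wanted.isEmpty()) {
            return result;
        }
        int last = wanted.last(); // nothing after this line needs to be touched
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            long fileLength = in.length();
            for (int lineNumber = 1;
                 lineNumber <= last && in.getFilePointer() < fileLength;
                 lineNumber++) {
                int length = in.readInt(); // the header written in front of the line
                if (wanted.contains(lineNumber)) {
                    byte[] bytes = new byte[length];
                    in.readFully(bytes);
                    result.add(new String(bytes, StandardCharsets.UTF_8));
                } else {
                    in.skipBytes(length); // jump over the payload without reading it
                }
            }
        }
        return result;
    }
}
```

Note that RandomAccessFile is unbuffered, so each readInt() goes to the operating system; whether this beats a plain buffered sequential read depends on how long the skipped lines are and is worth measuring on real data.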
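Original suggestion (retracted): try using LineNumberReader instead. You can get and set the current line number, so you could just access and read the lines you want.
Thanks to Dima for pointing out that LineNumberReader cannot access a line by its number either: setLineNumber() only changes the counter returned by getLineNumber(), not the position in the stream.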
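A small check makes the limitation visible. The class and method names here are hypothetical, chosen only for this illustration; the LineNumberReader calls themselves are the standard ones.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

class LineNumberReaderDemo {

    /** Calls setLineNumber(target) and returns the next line that is actually read. */
    static String readAfterSettingLineNumber(String text, int target) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            reader.setLineNumber(target); // changes only the counter, not the stream position
            return reader.readLine();     // still returns the first line of the text
        }
    }
}
```

For example, readAfterSettingLineNumber("first\nsecond\nthird\n", 2) returns "first", not "third".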
Thinking more about the problem, it is theoretically impossible to determine where in the file a particular line starts unless one either A) has prior knowledge of the (combined) length of the preceding lines, or B) reads the file up to that point (with or without processing the content).
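For option B, the skipped lines still have to be read, but they do not have to be turned into String objects. The following is a minimal sketch of that idea, not a benchmarked solution; LineSkipper, skipLine and readSelected are hypothetical names introduced for this example. It also stops as soon as the last wanted line has been read, so the rest of the file is never touched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

class LineSkipper {

    /**
     * Consumes one line without building a String.
     * Returns false if the end of the stream was reached before any character was read.
     * Treats "\n", "\r" and "\r\n" as line terminators, like readLine().
     */
    static boolean skipLine(BufferedReader reader) throws IOException {
        int c = reader.read();
        if (c == -1) {
            return false;
        }
        while (c != -1 && c != '\n' && c != '\r') {
            c = reader.read();
        }
        if (c == '\r') {
            // Swallow the '\n' of a "\r\n" pair; otherwise put the character back.
            reader.mark(1);
            if (reader.read() != '\n') {
                reader.reset();
            }
        }
        return true;
    }

    /** Returns the lines whose 1-based numbers are in wanted, skipping all others. */
    static List<String> readSelected(Reader source, SortedSet<Integer> wanted) throws IOException {
        List<String> result = new ArrayList<>();
        if (wanted.isEmpty()) {
            return result;
        }
        int last = wanted.last(); // no need to read past the last wanted line
        try (BufferedReader reader = new BufferedReader(source)) {
            for (int lineNumber = 1; lineNumber <= last; lineNumber++) {
                if (wanted.contains(lineNumber)) {
                    String line = reader.readLine();
                    if (line == null) {
                        break; // file is shorter than expected
                    }
                    result.add(line);
                } else if (!skipLine(reader)) {
                    break; // end of file
                }
            }
        }
        return result;
    }
}
```

Whether skipLine() is measurably faster than calling readLine() and discarding the result depends on the JVM and the data, so it should be benchmarked before being relied on; the saving it aims for is avoiding the allocation of the skipped strings.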