What is the fastest way to parse files?
I am writing a graph library that needs to read the most common graph formats. One format contains lines like the following:
e 4 3
e 2 2
e 6 2
e 3 2
e 1 2
....
and I want to parse these lines. I looked on Stack Overflow and found a neat solution for this, and I am currently using an approach like the following (file is an fstream):
string line;
while (getline(file, line)) {
    if (!line.length()) continue; // skip empty lines
    stringstream parseline(line);
    char identifier;
    parseline >> identifier; // read the first character
    if (identifier == 'e') {
        int n, m;
        parseline >> n;
        parseline >> m;
        foo(n, m); // here I handle the input
    }
}
It works well for its intended purpose, but today, when I tested it with huge graph files (50 MB+), I was shocked to find that this parsing was the worst bottleneck in the entire program:
the stringstream used to parse each line takes almost 70% of the total execution time, and getline takes another 25%. The rest of the program accounts for only 5%.
Is there a fast way to read these large files, perhaps one that avoids the slow stringstream and getline calls?
You can skip double-buffering the string in a stringstream, skip stream-extracting the single identifier character, and use strtoll to parse the integers, like this:
string line;
while (getline(file, line)) {
    if (!line.length()) continue; // skip empty lines
    if (line[0] == 'e') {
        char *ptr;
        int n = strtoll(line.c_str() + 2, &ptr, 10);
        int m = strtoll(ptr + 1, &ptr, 10);
        foo(n, m); // here I handle the input
    }
}
In C++, strtoll is declared in the <cstdlib> header.