Fastest way to find a line containing a specific word in a large text file

I am trying to find lines containing specific text inside a large text file (18 MB). Currently I use a StreamReader to open the file, read it line by line, and check whether each line contains the search string:

while ((line = reader.ReadLine()) != null)
{
    if (line.Contains("search string"))
    {
        //Do something with line
    }
}

      

But unfortunately, since the file I'm using has over 1 million records, this method is slow. What's the fastest way to achieve this?



2 answers


In general, this kind of disk I/O will just be slow. There probably isn't much you can do to improve the performance of your current version, at least not without drastically changing the storage format of your data or your hardware.

However, you can shorten the code and simplify it in terms of maintenance and readability:

var lines = File.ReadLines(filename).Where(l => l.Contains("search string"));
foreach(var line in lines)
{
    // Do something here with line
}
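If only the first matching line is needed, the lazy enumeration of File.ReadLines lets the search stop as soon as a match is found, so the rest of the file is never read. The following is a minimal sketch of that idea; the class name FirstMatch and the method Find are hypothetical names chosen for illustration, not part of the original code.

```csharp
using System.IO;
using System.Linq;

public static class FirstMatch
{
    // Hypothetical helper: returns the first line containing searchText,
    // or null if no line matches. File.ReadLines streams the file lazily,
    // so enumeration stops at the first match.
    public static string Find(string path, string searchText)
    {
        return File.ReadLines(path)
                   .FirstOrDefault(l => l.Contains(searchText));
    }
}
```

This only helps when a match exists and appears early in the file; a search with no match still reads every line.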

      




Reading the whole file into memory makes the application freeze and is very slow. Do you think there are other alternatives?

If the main goal here is to prevent the application from freezing, then you can do this work in the background rather than on the UI thread. If you make your method async, the loop can become:

while ((line = await reader.ReadLineAsync()) != null)
{
    if (line.Contains("search string"))
    {
        //Do something with line
    }
}

      

This will most likely make the overall operation take longer, but it will not block your UI thread while the file is being read.
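For context, the loop above could sit inside an async method along these lines. This is only a sketch under assumptions: the class name LineSearch, the method FindLinesAsync, and the choice to collect matches into a list are all hypothetical and not from the original answer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class LineSearch
{
    // Hypothetical helper: asynchronously collects every line of the file
    // that contains searchText.
    public static async Task<List<string>> FindLinesAsync(string path, string searchText)
    {
        var matches = new List<string>();
        using (var reader = new StreamReader(path))
        {
            string line;
            // ReadLineAsync returns null once the end of the file is reached.
            while ((line = await reader.ReadLineAsync()) != null)
            {
                if (line.Contains(searchText))
                {
                    matches.Add(line);
                }
            }
        }
        return matches;
    }
}
```

From an async UI event handler this could be called as `var matches = await LineSearch.FindLinesAsync(path, "search string");`, which keeps the handler responsive while the file is read.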



  • Get a drive with a faster read speed (switch to an SSD if you haven't already).

  • Store the data across multiple files on different physical disks, and search those disks in parallel.

  • Use a RAID0 hard drive configuration. (This is kind of a special case of the previous approach.)

  • Create an index of the lines in the file that you can use to search for specific words. (Creating an index will be much more expensive than a single search and will require a lot of disk space, but this will allow subsequent searches to be performed at much faster speeds.)
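The index idea in the last point can be sketched as a simple in-memory inverted index that maps each word to the numbers of the lines containing it. This is only an illustration under several assumptions: the class name WordIndex, the method Build, and the separator list are hypothetical; the index is kept in memory rather than on disk; and it matches whole words only, unlike the substring match that Contains performs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordIndex
{
    // Assumed word separators; adjust to the actual format of the file.
    private static readonly char[] Separators = { ' ', '\t', ',', '.', ';', ':', '!', '?' };

    // Hypothetical helper: builds a map from each word to the zero-based
    // numbers of the lines in which it appears (case-insensitive).
    public static Dictionary<string, List<int>> Build(string path)
    {
        var index = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
        int lineNumber = 0;
        foreach (var line in File.ReadLines(path))
        {
            foreach (var word in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!index.TryGetValue(word, out var lineNumbers))
                {
                    lineNumbers = new List<int>();
                    index[word] = lineNumbers;
                }
                // Record each line only once per word, even if the word repeats.
                if (lineNumbers.Count == 0 || lineNumbers[lineNumbers.Count - 1] != lineNumber)
                {
                    lineNumbers.Add(lineNumber);
                }
            }
            lineNumber++;
        }
        return index;
    }
}
```

Building the index costs one full pass over the file, but each later lookup such as `index.TryGetValue("word", out var hits)` is a dictionary access instead of another scan.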


