Efficient buffer search algorithm for any string from the list

I'm looking for an efficient search algorithm that, for a given set of rows, looks for a large buffer for any match from a set of rows. I am currently aware of several efficient one-line algorithms (I used to use Knuth before), but I don't know if they really help.

Here's what I actually do:

  • I have about 6-10 predefined lines, each about 200-300 characters (actually bytes, since I am processing binary data)
  • The input is a large, sometimes a little megabyte buffer
  • I would like to process the buffer and when I have a match, I would like to stop the search

I've searched for multi-line search algorithms using a finite set of predefined patterns, but they all seem to revolve around matching all the predefined strings in the buffer.

See this post: A quick substring search algorithm in a string , suggested using Aho-Corasick or Rabin-Karp alogirthm.

I thought that since I only need one match, I could find other methods similar to the algorithms mentioned, but the constraints posed by this problem might improve performance.

+3


source to share


1 answer


Aho-Korasik is a good choice. After the automaton is created, the input string moves from left to right, so you can find it immediately after the first match. The time complexity is O (the sum of the lengths of all patterns + the position of the first occurrence). This is optimal because it is impossible to find the first match without reading all patterns and all bytes from the buffer before the first match.



+1


source







All Articles