Search a String or StringBuilder with a pattern range /start/,/end/

I want to create a function (with a bunch of helper functions if needed) in C# that does the same thing as awk '/start/,/end/' file, except that it should include all the trailing end matches rather than stopping at the first one.

Let's say we have:

# cat text
"13:08:30:5276604 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:5736962 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:6227343 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:6757752 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:7208103 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:7668739 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:8129079 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"

Expected:

"13:08:30:6227343 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:6757752 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:7208103 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:7668739 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"

AWK output:

# awk '/13:08:30:62/,/13:08:30:7/' text
"13:08:30:6227343 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:6757752 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"
"13:08:30:7208103 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M"

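For reference, awk's range pattern behaves roughly like the following C# sketch. AwkRangeSketch and AwkRange are hypothetical names, and the semantics are my reading of awk's documented behaviour: a range opens on a line matching start, the end pattern is tested on that same line too, the range closes on the first end match (inclusive), and it can reopen on a later start match. That is why awk stops at the 7208103 line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class AwkRangeSketch
{
    // Hypothetical helper mimicking awk '/start/,/end/':
    // emit lines from a start match through the FIRST following end match, inclusive.
    public static IEnumerable<string> AwkRange(IEnumerable<string> lines, Regex start, Regex end)
    {
        bool inRange = false;
        foreach (var line in lines)
        {
            // a closed range may reopen on any later start match
            if (!inRange && start.IsMatch(line)) inRange = true;
            if (!inRange) continue;
            yield return line;
            // end is tested on the opening line too, as awk does
            if (end.IsMatch(line)) inRange = false;
        }
    }

    static void Main()
    {
        var lines = new[] { "5276604", "5736962", "6227343", "6757752", "7208103", "7668739", "8129079" }
            .Select(s => "13:08:30:" + s + " Main: 41044");
        // prints the 6227343, 6757752 and 7208103 lines, like the awk output
        foreach (var line in AwkRange(lines, new Regex("13:08:30:62"), new Regex("13:08:30:7")))
            Console.WriteLine(line);
    }
}
```

The behaviour asked for differs only after the first end match: instead of closing the range there, keep emitting lines while they still match end.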

Initially I thought I could just match both conditions with a single regex, pattern_1|pattern_2, but that won't work if there are lines between the matching lines, because those lines match neither pattern.
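
A small sketch of why the alternation falls short (AlternationSketch and FilterEither are hypothetical names): filtering with one pattern_1|pattern_2 regex keeps only lines that match one of the two markers, so the 6757752 line, which matches neither, is dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class AlternationSketch
{
    // Keep every line that matches EITHER marker; lines between them are lost.
    public static string[] FilterEither(string[] lines, string pattern1, string pattern2)
    {
        var either = new Regex(pattern1 + "|" + pattern2);
        return lines.Where(line => either.IsMatch(line)).ToArray();
    }

    static void Main()
    {
        var lines = new[]
        {
            "13:08:30:6227343 Main", "13:08:30:6757752 Main",
            "13:08:30:7208103 Main", "13:08:30:7668739 Main"
        };
        // prints the 6227343, 7208103 and 7668739 lines; 6757752 is missing
        foreach (var line in FilterEither(lines, "13:08:30:62", "13:08:30:7"))
            Console.WriteLine(line);
    }
}
```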

I also found that the C# StringBuilder class has no .indexOf() or .lastIndexOf() methods (I have a bit more experience in Java, so I planned to use them until I saw that StringBuilder in C# does not have them). Since I don't have these methods and would need to implement them myself, I would like to ask whether this approach is appropriate. This section even suggests using String if an extensive search is needed: MSDN - and I can use that too. I chose StringBuilder because the text is built by constant concatenation. Should I use StringBuilder while constructing the string (a lot of concatenation) and then convert it to a string for searching?
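
On the StringBuilder question: System.String does have IndexOf and LastIndexOf, so one option is to do all the appending on a StringBuilder, call ToString() once, and search the resulting string. A minimal sketch, assuming the markers are plain substrings rather than regexes, and reading "include all the last matches" as "extend to the end of the line holding the last end marker in the text" (which differs from a run of consecutive matches if the end marker reappears much later). StringSearchSketch and SliceBetween are hypothetical names.

```csharp
using System;
using System.Text;

static class StringSearchSketch
{
    // From the line holding the first `start` to the end of the line holding
    // the LAST `end`; empty string if either marker is missing.
    public static string SliceBetween(string text, string start, string end)
    {
        int from = text.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return string.Empty;
        // back up to the beginning of the line that holds the start marker
        int lineStart = text.LastIndexOf('\n', from) + 1;

        int last = text.LastIndexOf(end, StringComparison.Ordinal);
        if (last < from) return string.Empty;
        // extend to the end of the line that holds the last end marker
        int lineEnd = text.IndexOf('\n', last);
        if (lineEnd < 0) lineEnd = text.Length;

        return text.Substring(lineStart, lineEnd - lineStart);
    }

    static void Main()
    {
        var sb = new StringBuilder();
        foreach (var suffix in new[] { "5276604", "5736962", "6227343", "6757752", "7208103", "7668739", "8129079" })
            sb.Append("13:08:30:").Append(suffix).Append(" Main: 41044\n");
        // one ToString() call after all the concatenation is done
        string text = sb.ToString();
        // prints the 6227343 through 7668739 lines
        Console.WriteLine(SliceBetween(text, "13:08:30:62", "13:08:30:7"));
    }
}
```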

I would like to see this implemented, and would be glad to hear suggestions on how to approach it. General guidelines and implementation details are appreciated.

1 answer


If you need to process a potentially large file, it is better to use a StreamReader and process it line by line with ReadLine. This avoids holding the whole file in memory, as you probably do when using a StringBuilder. By accepting the abstract TextReader base class, the same code works for both a string (StringReader) and a stream such as a file (StreamReader).
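
The line-by-line pattern can be sketched as a small iterator over any TextReader (LineReaderSketch and Lines are hypothetical names). For files, File.ReadLines(path) in System.IO already returns a lazily read IEnumerable<string>.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineReaderSketch
{
    // Yield one line at a time, so the whole input is never loaded at once.
    public static IEnumerable<string> Lines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    static void Main()
    {
        // StringReader and StreamReader both derive from TextReader.
        using (var reader = new StringReader("first\nsecond\nthird"))
        {
            foreach (var line in Lines(reader))
                Console.WriteLine(line);
        }
        // For a file: foreach (var line in File.ReadLines("sample.txt")) { ... }
    }
}
```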

To check for start and end matches, you can use the Regex class. Its Match method returns a Match instance whose Success property is true if a match was found.
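
A short sketch of both ways to test a line. RegexCheckSketch and Literal are hypothetical names; Regex.Escape is only needed if the start/end strings should be treated as literal text, which is an assumption here (a timestamp such as 48.7617 contains a '.' that a regex would treat as "any character").

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexCheckSketch
{
    // Treat the marker as literal text: Regex.Escape neutralises characters
    // such as '.', '+' or '(' that would otherwise be regex syntax.
    public static Regex Literal(string marker) => new Regex(Regex.Escape(marker));

    static void Main()
    {
        var start = Literal("13:08:30:62");
        string line = "13:08:30:6227343 Main: 41044 - 48.7617 M";

        // Match(...) returns a Match object; Success says whether it matched.
        Match m = start.Match(line);
        Console.WriteLine(m.Success); // True
        Console.WriteLine(m.Index);   // 0, where the match begins

        // IsMatch returns only the bool, which is all the range logic needs.
        Console.WriteLine(start.IsMatch("13:08:30:6757752 Main")); // False
    }
}
```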

To implement the logic you describe, there are three states: before we have found the start, before we have found the end, and while we keep finding the end. I decided to implement this as an iterator using the yield keyword, as that gives me the state machine almost for free.
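
For comparison with the iterator version, the same three states can be written out by hand as an explicit state machine, which is the bookkeeping yield saves you. A sketch only; StateMachineSketch, State and Collect are hypothetical names, and it returns a List instead of streaming.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class StateMachineSketch
{
    // The three states from the answer, plus a terminal one.
    enum State { BeforeStart, BeforeEnd, InEndRun, Done }

    public static List<string> Collect(IEnumerable<string> lines, Regex start, Regex end)
    {
        var result = new List<string>();
        var state = State.BeforeStart;
        foreach (var line in lines)
        {
            if (state == State.BeforeStart)
            {
                if (!start.IsMatch(line)) continue; // skip lines before the start match
                state = State.BeforeEnd;            // fall through: handle this line below
            }
            if (state == State.BeforeEnd)
            {
                if (!end.IsMatch(line)) { result.Add(line); continue; }
                state = State.InEndRun;             // first end match: fall through
            }
            if (state == State.InEndRun)
            {
                if (end.IsMatch(line)) { result.Add(line); continue; }
                state = State.Done;                 // the run of end matches is over
            }
            break;                                  // Done: stop reading
        }
        return result;
    }

    static void Main()
    {
        var lines = new[] { "13:08:30:5276604", "13:08:30:6227343", "13:08:30:6757752",
                            "13:08:30:7208103", "13:08:30:7668739", "13:08:30:8129079" };
        // prints the 6227343 through 7668739 lines
        foreach (var line in Collect(lines, new Regex("13:08:30:62"), new Regex("13:08:30:7")))
            Console.WriteLine(line);
    }
}
```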



Here's the implementation:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // use a StreamReader to read the file line by line;
        // the constructor also accepts an Encoding as a second parameter
        if (File.Exists(@"sample.txt")) // skip the file demo if sample.txt is absent
        {
            using (var sr = new StreamReader(@"sample.txt"))
            {
                ReadFromBeginToEnd("13:08:30:62", "13:08:30:7", sr);
            }
        }

        var text = @"
13:08:30:6227343 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M
13:08:30:6757752 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M
13:08:30:7208103 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M
13:08:30:7668739 Main: 41044 - 48.7617 M-- Other PIDS 2 - 79.1016 M";
        // a StringReader is also a TextReader, so the same code handles in-memory text
        using (var sr = new StringReader(text))
        {
            ReadFromBeginToEnd("13:08:30:62", "13:08:30:7", sr);
        }
    }

    // enumerate over the lines from the reader,
    // accepting two regexes, start and end
    static IEnumerable<string> FromBeginToEnd(TextReader rdr, Regex start, Regex end)
    {
        // 1st state
        var line = rdr.ReadLine(); // initial read, null means we're done
        // read the lines until we hit our start match
        while (line != null && !start.Match(line).Success)
        {
            // don't return these lines
            line = rdr.ReadLine();
        }
        // 2nd state
        // read the lines while we didn't hit our end match
        while (line != null && !end.Match(line).Success)
        {
            // return this line to the caller
            yield return line;
            line = rdr.ReadLine();
        }
        // 3rd state
        // read the lines while we keep finding our end match
        while (line != null && end.Match(line).Success)
        {
            // return this line to the caller
            yield return line;
            line = rdr.ReadLine();
        }
        // iterator is done (falling off the end would finish it as well)
        yield break;
    }

    // take a start and an end string that can be compiled to a regex,
    // and a TextReader (file or in-memory string) to read the lines from
    static void ReadFromBeginToEnd(string start, string end, TextReader reader)
    {
        // loop over the lines that match the criteria;
        // FromBeginToEnd is our custom iterator
        foreach (var line in FromBeginToEnd(reader, new Regex(start), new Regex(end)))
        {
            // write to standard out,
            // but this could be a StreamWriter.WriteLine as well
            Console.WriteLine(line);
        }
    }
}
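
The iterator above stops after the first range. If, as with awk, a range should be able to open again further down the file, one possible extension (a sketch; AllRangesSketch and AllRanges are hypothetical names) keeps two flags and reopens on the next start match once the run of end matches is over. It takes an IEnumerable<string>, so it can be fed from File.ReadLines or from any line iterator over a TextReader.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AllRangesSketch
{
    // Hypothetical extension: every range runs from a start match through the
    // whole run of consecutive end matches; afterwards a new range may open.
    public static IEnumerable<string> AllRanges(IEnumerable<string> lines, Regex start, Regex end)
    {
        bool inRange = false;  // after a start match, before the first end match
        bool inEndRun = false; // inside the run of consecutive end matches
        foreach (var line in lines)
        {
            if (inEndRun)
            {
                if (end.IsMatch(line)) { yield return line; continue; }
                inEndRun = false; // run is over; this line may open a new range below
            }
            if (!inRange && start.IsMatch(line)) inRange = true;
            if (!inRange) continue;
            yield return line;
            if (end.IsMatch(line)) { inRange = false; inEndRun = true; }
        }
    }

    static void Main()
    {
        var lines = new[] { "S1", "x", "E1", "E2", "y", "S2", "E3", "z" };
        // prints S1, x, E1, E2, S2, E3 (one per line)
        foreach (var line in AllRanges(lines, new Regex("^S"), new Regex("^E")))
            Console.WriteLine(line);
    }
}
```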
