Regex gets stuck on some inputs

My Regex occasionally gets stuck on certain values, although it returns a result for most documents.

Here is the scenario in which it gets stuck:

    collection = Regex.Matches(document, pattern, RegexOptions.Compiled); // line 1
    if (collection.Count > 0) // line 2: execution hangs here
    {

      

I debugged the solution and tried to inspect the collection values in the watch window. For most properties I saw the following message:

Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation.

      

After that, execution got stuck on the second line.

I can see that the problem lies in the regex, which appears to be caught in a loop.

Question: no exception is thrown when this happens. Is there a way to get an exception after a timeout so that my tool can continue working?

 Regex:      @"""price"">(.|\r|\n)*?pound;(?<data>.*?)</span>"

 Part of Document : </span><span>1</span></a></li>\n\t\t\t\t<li>\n\t\t\t\t\t<span class=\"icon icon_floorplan touchsearch-icon touchsearch-icon-floorplan none\">Floorplans: </span><span>0</span></li>\n\t\t\t\t</ul>\n\t\t</div>\n    </div>\n\t</div>\n<div class=\"details clearfix\">\n\t\t<div class=\"price-new touchsearch-summary-list-item-price\">\r\n\t<a href=\"/commercial-property-for-sale/property-47109002.html\">POA</a></div>\r\n<p class=\"price\">\r\n\t\t\t<span>POA</span>\r\n\t\t\t\t</p>\r\n\t<h2 class=\"address bedrooms\">\r\n\t<a id=\"standardPropertySummary47109002\"

      


1 answer


How do I get an exception if the search in the Regex takes an unreasonably long time?

Please read below about setting the timeout for regular expression searches.

MSDN: Regex.MatchTimeout Property

The MatchTimeout property specifies the approximate maximum time interval for a Regex instance to complete one match before it times out. The regular expression engine throws a RegexMatchTimeoutException during the next time check after the timeout interval has expired. This prevents the regular expression engine from processing input strings that require excessive backtracking. For more information, see Backtracking in Regular Expressions and Best Practices for Regular Expressions in the .NET Framework.

    using System;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            AppDomain domain = AppDomain.CurrentDomain;
            // Set a default timeout interval of 2 seconds for all Regex objects
            // created afterwards in this application domain.
            domain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(2));
            Object timeout = domain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT");
            Console.WriteLine("Default regex match timeout: {0}",
                                timeout == null ? "<null>" : timeout);

            // No explicit timeout is passed, so the default set above applies.
            Regex rgx = new Regex("[aeiouy]");
            Console.WriteLine("Regular expression pattern: {0}", rgx.ToString());
            Console.WriteLine("Timeout interval for this regex: {0} seconds",
                                rgx.MatchTimeout.TotalSeconds);
        }
    }

    // The example displays the following output: 
    //       Default regex match timeout: 00:00:02 
    //       Regular expression pattern: [aeiouy] 
    //       Timeout interval for this regex: 2 seconds
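
Applied to the code in the question, a minimal sketch might look like the following. Regex.Matches is evaluated lazily, so the timeout surfaces when the collection is enumerated or when Count is read (the second line in the question), not at the Regex.Matches call itself. The names SafeRegex and TryGetMatches and the choice of timeout are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Hypothetical helper: returns true with all matched values, or false if the
    // regex engine gave up after 'timeout'. On timeout, 'values' holds only the
    // matches found before the engine stopped.
    public static bool TryGetMatches(string document, string pattern,
                                     TimeSpan timeout, out List<string> values)
    {
        values = new List<string>();
        try
        {
            MatchCollection collection =
                Regex.Matches(document, pattern, RegexOptions.Compiled, timeout);

            // Matches are computed lazily, so the enumeration must sit inside the try.
            foreach (Match m in collection)
            {
                values.Add(m.Value);
            }
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // Skip (or log) this document and let the tool carry on.
            return false;
        }
    }
}
```

A caller could then write something like `if (!SafeRegex.TryGetMatches(document, pattern, TimeSpan.FromSeconds(2), out var values)) continue;` inside its document loop.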

      

Why is my Regex getting stuck?

First of all, try to optimize your regex by minimizing backtracking where you can. stribizhev commented with an improvement, so kudos to him.



Another thing: your regex is actually equivalent to "price">[\s\S]*?pound;(?<data>.*?)</span> (C# declaration: @"""price"">[\s\S]*?pound;(?<data>.*?)</span>"). It is much better since there is much less backtracking. - stribizhev Jun 4 at 9:23
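
As a rough illustration of the character-class form suggested in that comment, the sketch below uses it to extract the "data" group. The class name PricePattern, the method TryExtractPrice, and the two-second timeout are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PricePattern
{
    // The pattern from the question with (.|\r|\n)*? replaced by [\s\S]*?.
    public const string Pattern = @"""price"">[\s\S]*?pound;(?<data>.*?)</span>";

    // Hypothetical helper: returns true and the captured "data" group on a match,
    // or false with an empty string when the pattern does not match.
    public static bool TryExtractPrice(string html, out string price)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.None,
                              TimeSpan.FromSeconds(2));
        price = m.Success ? m.Groups["data"].Value : string.Empty;
        return m.Success;
    }
}
```

A fragment such as the "POA" listing in the question contains no "pound;" after "price", so the helper simply reports no match for it.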

Second, if specific values are causing problems, one way to track them down is to process the matches one at a time (per iteration) instead of collecting all of them in a single one-liner.

MSDN: Match.NextMatch Method

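   using System;
   using System.Text.RegularExpressions;

   public class Example
   {
      public static void Main()
      {
         string pattern = "a*";
         string input = "abaabb";

         // Fetch the first match, then advance one match at a time.
         Match m = Regex.Match(input, pattern);
         while (m.Success) {
            Console.WriteLine("'{0}' found at index {1}.", 
                              m.Value, m.Index);
            m = m.NextMatch();
         }
      }
   }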
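
Combining this per-match loop with a timeout makes it possible to see how far the engine got before it gave up. The following is only a sketch; MatchWalker and Walk are hypothetical names, and the Regex passed in is assumed to have been constructed with a match timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWalker
{
    // Hypothetical helper: walks the matches one by one and appends them to 'found'.
    // Returns -1 if the whole input was scanned, otherwise the index just past the
    // last successful match before the timeout fired.
    public static int Walk(Regex regex, string input, List<string> found)
    {
        int lastEnd = 0;
        try
        {
            Match m = regex.Match(input);
            while (m.Success)
            {
                found.Add(m.Value);
                lastEnd = m.Index + m.Length;
                m = m.NextMatch(); // may also throw RegexMatchTimeoutException
            }
            return -1;
        }
        catch (RegexMatchTimeoutException)
        {
            return lastEnd;
        }
    }
}
```

Logging the returned index alongside the document identifier should point to the part of the input that triggers the excessive backtracking.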

      

To improve performance without touching the pattern, it is common to keep Regex objects in a static class, instantiate them only once, and add RegexOptions.Compiled when you create them (which you did). (Source)

P.S. It can be useful to be able to deliberately trigger a timeout that reproduces every time, much like an infinite loop. I'll share a pattern and input for that below.
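
A minimal sketch of that arrangement, reusing the vowel pattern from the MSDN example above; the class name Patterns, the field name Vowels, and the two-second timeout are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Patterns
{
    // Constructed once per process and shared by all callers. RegexOptions.Compiled
    // costs more at construction time in exchange for faster matching, and the
    // third argument gives this instance its own match timeout.
    public static readonly Regex Vowels =
        new Regex("[aeiouy]", RegexOptions.Compiled, TimeSpan.FromSeconds(2));
}
```

Callers would then use `Patterns.Vowels.IsMatch(text)` rather than constructing a new Regex on every call.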

string pattern = @"/[a-zA-Z0-9]+(\[([^]]*(]"")?)+])?$";
string input = "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z'] \"]";
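
One way to use this pair for testing a timeout handler is sketched below. It assumes, as the answer states, that the pattern backtracks excessively on this input; TimeoutDemo and TimesOut are hypothetical names, and the caller chooses the timeout.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutDemo
{
    // Pattern and input copied from the answer above.
    public const string Pattern = @"/[a-zA-Z0-9]+(\[([^]]*(]"")?)+])?$";
    public const string Input = "/aaa/bbb/ccc[@x='1' and @y=\"/aaa[name='z'] \"]";

    // Returns true if matching was abandoned because it exceeded 'timeout',
    // false if the engine finished (with or without a match) in time.
    public static bool TimesOut(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
            return false;
        }
        catch (RegexMatchTimeoutException)
        {
            return true;
        }
    }
}
```

Calling `TimeoutDemo.TimesOut(TimeoutDemo.Pattern, TimeoutDemo.Input, TimeSpan.FromMilliseconds(250))` exercises the catch branch without waiting for the match to finish on its own.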

      
