How to determine if a duplicate pattern exists

My question is not language specific ... I would probably implement this in C # or Python unless there is a specific language feature that helps me get what I'm looking for.

Is there some algorithm that anyone knows about this can help me determine if a list of numbers contains a repeating pattern?

Let's say I have multiple lists of numbers ...

[12, 4, 5, 7, 1, 2]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 1, 1, 1, 1, 1]
[ 1, 2, 4, 12, 13, 1, 2, 4, 12, 13]

      

I need to determine if there is a duplicate pattern in each list ... For example, list 1 returns false, but also lists 2, 3 and 4 return true.

I thought maybe taking the count of each value that appears in the list and if val 1 == val 2 == val n ... then this will do it. Any better ideas?

+3


source to share


6 answers


You want to look at the autocorrelation of the signal. Autocorrelation basically does the convolution of the signal with itself. When you iteratively shift one signal over the other and there is a repeating pattern, the output will resonate strongly.



+3


source


One option is to look at compression algorithms, some of which rely on finding repeating patterns and replacing them with a different symbol. In your case, you just need the part that identifies the template. You may find that it is similar to the method you described, although



+2


source


The second and fourth lines are periodic; I am assuming that you are looking for an algorithm for detecting periodic strings. The fastest string matching algorithms need to find string periods in order to compute their switching rules.

Knuth-Morris-Pratt ", for example, calculates for each prefix P [0..k] of the pattern P, the length SP [k] of the longest native suffix P [s..k] from P [0..k], which is exactly matches the prefix P [0 .. (ks)]. If SP [k] k / 2, then P [0..k] is aperiodic, otherwise it is the prefix of the string with period k - SP [k].

+2


source


Assuming that your "repeating pattern" always repeats in full, as your sample data suggests, you can simply think of your array as a collection of repeating arrays of equal length. Value:

[1, 2, 3, 1, 2, 3, 1, 2, 3]

repeated three times [1, 2, 3]

.

This means that you can simply check if each x value in the array matches each other. So: array[0] == array[3] == array[6]


array[1] == array[4] == array[7]


array[2] == array[5] == array[8]


Since you don't know the length of the repeating pattern, you just have to try all possible lengths until you find the pattern or finish the possible shorter arrays. I'm sure there are optimizations that could be added to the next one, but it works (if I understand the question correctly, of course).

static void Main(string[] args)
{
    int[] array1 = {12, 4, 5, 7, 1, 2};
    int[] array2 = {1, 2, 3, 1, 2, 3, 1, 2, 3};
    int[] array3 = {1, 1, 1, 1, 1, 1 };
    int[] array4 = {1, 2, 4, 12, 13, 1, 2, 4, 12, 13 };

    Console.WriteLine(splitMethod(array1));
    Console.WriteLine(splitMethod(array2));
    Console.WriteLine(splitMethod(array3));
    Console.WriteLine(splitMethod(array4));

    Console.ReadLine();
}

static bool splitMethod(int[] array)
{
    for(int patternLength = 1; patternLength <= array.Length/2; patternLength++)
    {
        // if the pattern length doesn't divide the length of the array evenly,
        // then we can't have a pattern of that length.
        if(array.Length % patternLength != 0)
        {
            continue;
        }

        // To check if every x value is equal, we need to give a start index
        // To begin our comparisons at.
        // We'll start at index 0 and check it against 0+x, 0+x+x, 0+x+x+x, etc.
        // Then we'll use index 1 and check it against 1+x, 1+x+x, 1+x+x+x, etc.
        // Then... etc.
        // If we find that every x value starting at a given start index aren't
        // equal, then we'll continue to the next pattern length.
        // We'll assume our patternLength will produce a pattern and let
        // our test determines if we don't have a pattern.
        bool foundPattern = true;
        for (int startIndex = 0; startIndex < patternLength; startIndex++)
        {
            if (!everyXValueEqual(array, patternLength, startIndex))
            {
                foundPattern = false;
                break;
            }
        }

        if (foundPattern)
        {
            return true;
        }
    }
    return false;
}

static bool everyXValueEqual(int[] array, int x, int startIndex)
{
    // if the next index we want to compare against is outside the bounds of the array
    // we've done all the matching we can for a pattern of length x.
    if (startIndex+x > array.Length-1)
        return true;

    // if the value at starIndex equals the value at startIndex + x
    // we can go on to test values at startIndex + x and startIndex + x + x
    if (array[startIndex] == array[startIndex + x])
        return everyXValueEqual(array, x, startIndex + x);

    return false;
}

      

+1


source


Since you are looking for duplicate patterns, you can force your array into a string and run a regex against it. This is my second answer, I'm just playing here.

    static Regex regex = new Regex(@"^(?<main>(?<v>;\d+)+?)(\k<main>)+$", RegexOptions.Compiled);
    static bool regexMethod(int[] array)
    {
        string a = ";" + string.Join(";", array);
        return regex.IsMatch(a);
    }

      

Regular expression

  • (?<v>;\d+)

    - A group named "v" which matches a semicolon (in this case a separator) and 1 or more digits

  • (?<main>(?<v>;\d+)+?)

    - a group named "main" that matches group "v" 1 or more times, but the least number of times it can match a regular expression.

  • (\k<main>)+

    - matches text that matches the "main" group 1 or more times

  • ^ ... $

    - they snap the ends of the pattern to the ends of the string.

+1


source


Simple pattern recognition is the problem of compression algorithms. Depending on the type of input and the type of templates you are looking for, the selection algorithm might be very different - just think that any file is an array of bytes, and there are many types of compression for different types of data. Lossless compression finds exact patterns that repeat and compress compression — approximate models where the approximation is limited to some "real" consideration.

In your case, you can apply pseudo-clamping compression where you will start to populate the list of occurring sequences

here's a pseudo-suggestion:

//C#-based pseudo code

int[] input = GetInputData();
var encounters = new Dictionary<ItemCount<int[],int>>();// the string and the number of times it found

int from = 0;
for(int to=0; to<input.Length; i++){ 
  for (int j = from; j<=i; j++){ // for each substring between 'from' and 'i'
    if (encounters.ContainsKey(input.SubArray(j,i)){
      if (j==from) from++;                  // if the entire substring already exists - move the starting point
      encounters[input.SubArray(j,i)] += 1; // increase the count where the substring already exists
    } else {
     // consider: if (MeetsSomeMinimumRequirements(input.SubArray(j,i))
      encounters.Add(input.SubArray(j,i),1); //add a new pattern
    }
  }
}
Output(encounters.Where(itemValue => itemValue.Value>1); // show the patterns found more than once

      

I haven't debugged the sample above, so use that as a starting point. The basic idea is that you will have a list encounters

where various substrings will be collected and counted, the most frequent will have the highest Value

at the end.

You can change the algorithm above by keeping some substring function instead of whole substring or adding minimum requirements like minimum length etc. Too many options full discussion is impossible in the post.

0


source







All Articles