Substring Algorithm

Can someone explain to me how to solve the substring problem iteratively?

Problem: given two strings S = S 1 S 2 S 3... S n and T = T 1 T 2 T 3... T m, where m is less than or equal to n, determine if T is a substring of S.

+2


source to share


11 replies


Here's a list of string search algorithms



Depending on your needs, a different algorithm may be better suited, but Boyer-Moore is a popular choice.

+4


source


Not sure what language you are working in, but here is an example in C #. This is roughly an n 2 algorithm, but it will do the job.



bool IsSubstring (string s, string t)
{
   for (int i = 0; i <= (s.Length - t.Length); i++)
   {
      bool found = true;

      for (int j = 0; found && j < t.Length; j++)
      {
         if (s[i + j] != t[j])
             found = false;
      }

      if (found)
         return true;
   }

   return false;
}
      

+3


source


A naive algorithm should be tested at every position 0 <i โ‰ค nm S if S i + 1 S i + 2... S i + m= T 1T <sub> 2sub> ... T <sub> msub>. For n = 7 and m = 5:

i = 0:   S1S2S3S4SfiveS6S7
      | | | | |
      T1T2T3T4Tfive

i = 1:   S1S2S3S4SfiveS6S7
        | | | | |
        T1T2T3T4Tfive

i = 2:   S1S2S3S4SfiveS6S7
          | | | | |
          T1T2T3T4Tfive

Algorithm in pseudocode:

// we just need to test if n โ‰ค m 
IF n > m:
    // for each offset on that T can start to be substring of S
    FOR i FROM 0 TO n-m:
        // compare every character of T with the corresponding character in S plus the offset
        FOR j FROM 1 TO m:
            // if characters are equal
            IF S[i+j] == T[j]:
                // if weโ€™re at the end of T, T is a substring of S
                IF j == m:
                    RETURN true;
                ENDIF;
            ELSE:
                BREAK;
            ENDIF;
        ENDFOR;
    ENDFOR;
ENDIF;
RETURN false;

      

+3


source


if (T == string.Empty) return true;
for (int i = 0; i <= S.Length - T.Length; i++) {
    for (int j = 0; j < T.Length; j++) {
        if (S[i + j] == T[j]) {
            if (j == (T.Length - 1)) return true;
        }
        else break;
    }
}
return false;

      

+2


source


It will look something like this:

m==0? return true
cs=0
ct=0
loop
    cs>n-m? break
    char at cs+ct in S==char at ct in T?
    yes:
        ct=ct+1
        ct==m? return true
    no:
        ct=0
        cs=cs+1

end loop
return false

      

+1


source


It might be overkill with the above list of subscript algorithms, but I've always been interested in KMP ( http://en.wikipedia.org/wiki/Knuth -Morris-Pratt_algorithm)

+1


source


// runs in best case O(n) where no match, worst case O(n2) where strings match

var s = "hippopotumus"
var t = "tum"

for(var i=0;i<s.length;i++)
    if(s[i]==t[0])
        for(var ii=i,iii=0; iii<t.length && i<s.length; ii++, iii++){
            if(s[ii]!=t[iii]) break
            else if (iii==t.length-1) console.log("yay found it at index: "+i)
        }

      

+1


source


Here is my PHP variant, which includes checking to make sure the Needle does not exceed the Haystacks length while searching.

<?php

function substring($haystack,$needle) {
        if("" == $needle) { return true; }
        echo "Haystack:\n$haystack\n";
    echo "Needle:\n$needle\n";

        for($i=0,$len=strlen($haystack);$i<$len;$i++){
                if($needle[0] == $haystack[$i]) {
                        $found = true;
                        for($j=0,$slen=strlen($needle);$j<$slen;$j++) {
                                if($j >= $len) { return false; }
                                if($needle[$j] != $haystack[$i+$j]) {
                                        $found = false;
                                        continue;
                                }
                        }
                        if($found) {
                                echo " . . . . . . SUCCESS!!!! startPos: $i\n";
                                return true;
                        }
                }
        }
        echo " . . . . . . FAILURE!\n" ;
        return false;
}

assert(substring("haystack","hay"));
assert(!substring("ack","hoy"));
assert(substring("hayhayhay","hayhay"));
assert(substring("mucho22","22"));
assert(!substring("str","string"));
?>

      

Left in some echoes. Remove if they offend you!

0


source


It is an algorithm O(n*m)

where n and m are the size of each line. In C # it would be something similar to:

   public static bool IsSubtring(char[] strBigger, char[] strSmall)
        {
            int startBigger = 0;
            while (startBigger <= strBigger.Length - strSmall.Length)
            {
                int i = startBigger, j = 0;

                while (j < strSmall.Length && strSmall[j] == strBigger[i])
                {
                    i++;
                    j++;
                }

                if (j == strSmall.Length)
                    return true;
                startBigger++;
            }

            return false;
        }

      

0


source


Although this is a rather old post, I am trying to answer it. Please correct me if something is wrong

package com.amaze.substring;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class CheckSubstring {

/**
 * @param args
 * @throws IOException 
 */
public static void main(String[] args) throws IOException {
    // TODO Auto-generated method stub

    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

    System.out.println("Please enter the main string");
    String mainStr = br.readLine();

    System.out.println("Enter the substring that has to be searched");
    String subStr = br.readLine();

    char[] mainArr = new char[mainStr.length()];
    mainArr = mainStr.toCharArray();
    char[] subArr = new char[subStr.length()];
    subArr = subStr.toCharArray();
    boolean tracing = false;
    //System.out.println("Length of substring is "+subArr.length);
    int j = 0;

    for(int i=0; i<mainStr.length();i++){

        if(!tracing){
            if(mainArr[i] == subArr[j]){
                tracing = true;
                j++;
            }
        } else {
            if (mainArr[i] == subArr[j]){
                //System.out.println(mainArr[i]);
                //System.out.println(subArr[j]);
                j++;
                System.out.println("Value of j is "+j);
                if((j == subArr.length)){
                    System.out.println("SubString found");
                    return;
                }
            } else {
                j=0;
                tracing = false;
            }
        }
    }

    System.out.println("Substring not found");

}

}

      

0


source


I know I am late for the game, but here is my version (in C #):

    bool isSubString(string subString, string supraString)
    {
        for (int x = 0; x <= supraString.Length; x++)
        {
            int counter = 0;
            if (subString[0] == supraString[x]) //find initial match
            {
                for (int y = 0; y <= subString.Length; y++)
                {
                    if (subString[y] == supraString[y+x])
                    {
                        counter++;
                        if (counter == subString.Length)
                        {
                            return true;
                        }
                    } 
                }
            }
        }
        return false;
    }

      

0


source







All Articles