What's the best way to follow a function to understand what it does?

This question is general, but I'll give you a specific example.

The specific problem that needs to be solved is explained here here . The description is long, so I won't cut and paste, but the basic idea is the input lines S and T (as they call in the code below), find the minimum number of changes you need to make for S to create T. One change could be :

  • Insert one letter at either end of the line.
  • Remove one letter from either end of the line.
  • Change one letter to another.

Below is the solution I am trying to track down. What I am looking for is advice on how best to solve the problem. What are some techniques I can use to read and understand the code (let's throw it back through the debugger).

#include<iostream>
#include<cstring>
#include<stdio.h>
using namespace std;
char S[2010];
char T[2010];
int lens,lent;
int main()
{
    int i,j,ma,p;

    while(scanf("%s%s",S,T)!=EOF)
    {
        lens=strlen(S);
        lent=strlen(T);
        ma=0;p=0;
        for(i=0;i<lens;i++)
        {
            p=0;
            for(j=0;j<lent;j++)
            {
                if(i+j>=lens)
                    break;
                if(S[i+j]==T[j]){p++;}
            }
            if(ma<p)
                ma=p;
            if(ma==lent)
                break;
        }
        for(i=0;i<lent;i++)
        {
            p=0;
            for(j=0;j<lens;j++)
            {
                if(i+j>=lent)
                    break;
                if(T[i+j]==S[j]){p++;}
            }
            if(ma<p)
                ma=p;
            if(ma==lent)
                break;
        }
        printf("%d\n",lent-ma);
    }
    return 0;
}

      

+3


source to share


5 answers


Step 1: Explain to yourself what the variables represent:

S

: the string from which we want to extract the substring

T

: the string we want to reach at the end, after you've changed the extracted substring with as many operations as possible

lens

: length of string S

lent

: length of string T

i

: index in S from which the extracted substring starts

j

: the index in string T of the character we want to match with the corresponding character in the substring

p

: the number of matching characters found for the current string being examined

ma

: maximum number of matching characters for any of the substrings




Step 2: Having set these values, it is quite easy to translate the first loop into words:

for loop 1 :    selects a start position of the substring

    set the match counter to 0, since we start investigation of a new substring


    for loop 2 :    loops through the substring

        if 1 :      if there is no char left to read string S, stop looping

        if 2 :      if the current character in the extracted substring matches
                a character in the "goal" string, increment the match counter (p)


    if 3 :      now, we finished looping through a substring,
            if the count of matching characters in the substring and the goal
            string was higher than for any of the previous counts,
            then store this value as the max count

    if 4 :      if the max count of matching characters is equal to the 
            length of the "goal string", dr Moriatry can receive the goal string
            with 0 substring changes, and hence, we can stop looping

      


The next cycle is similar. The roles of S and T are reversed. Note, however, that the S and T roles have not been completely reversed (as some people have said). The end condition for the outer for loop uses the length T in both cases, which makes sense.

Here we are extracting substrings from string T (string "target") and trying to match them to string S. Why are we doing this?

I would expect the person who wrote the code would like to account for cases such as, for example:

S = "b"     T = "abc"

      

If we were only to extract substrings from S and match them to the whole string T starting at the first index (as in the first loop), we would only compare " b

(in S) match a

(first char in T), and then we continue and say, "Since no substring matches, we need three line changes to get the string T" (which is obviously not true, since we can achieve it by choosing "b" as the substring to extract and doing 2 change operations in the end with T)

+3


source


The best solution would be to rewrite the code when you understand it. There are several things:



  • Watch out for duplicate code. He does the same with S and T, and the roles are reversed. You can create a function foo()

    with both strings as a parameter and use foo(S,T)

    andfoo(T,S)

  • Try to break too much depth. When you see a lot of nested loops most of the time, some of the inner loops can be thought of as a function that does something specific.
  • rename variables gradually as you understand more of what's going on.
  • Last but not least, don't drop the debugger
+2


source


The key to understanding the code is understanding the meaning of the variables that change the least, in this case ma and p, and recognizing the coding idioms.

Two code snippets

if(ma<p)
    ma=p;
if(ma==lent)
    break;

      

are the "high water mark" idiom for the variable p (with a break condition for the maximum possible value). p is incremented for each character at position j of the LHS substring, starting at i, with the character at the j-position of the RHS. Therefore, for loops indexed by j (inner loops), p is the number of the same characters that are in the same j-positions. So I would rename p (in my head at least!) To "number_of_identical_chars_in_same_pos".

The i-indexed loops find the highest p (stored in ma) for each substring in the LHS that ends with the last LHS character (or earlier if the substring is long). Therefore, ma should be renamed "max_common_chars_in_a_span". Together, the two loops find the maximum number of characters in common at the same position in any substring ending with the last character on either side.

The final step (left to the serious student) is to understand how this solves the problem and what is math, not programming.

+2


source


First I would start doing some string manipulation in the program. The first step would be to try and give better names to the variables.

I would replace 'i' with sPos for the position in the string S. Well, at least this is for the first loop, for the second loop, the 'i' changes the value as the position in T, so I would actually then decide to get rid of i and instead have two variables, so each variable has one target that matches its name.

Then try the same for j, ma and p.

Once this is done, I would consider trying to figure out what the two for loops are doing. They seem to be nearly identical. Perhaps they can be split into a single function that gets called twice. Again, try to figure out what this function does and name it very carefully so that the name explains its purpose.

Repeat steps like this until you have some code that evokes some human feelings.

+1


source


  • Extract the two outer contours into your own functions.
  • Find commonality (hint: they only change ma

    ).
  • Extract this commonality.
  • Extract the inner loops into your own function (hint: the code is identical).
  • Write unit tests for each function to test your understanding.
+1


source







All Articles