What's the best way to follow a function to understand what it does?
This question is general, but I'll give you a specific example.
The specific problem that needs to be solved is explained here here . The description is long, so I won't cut and paste, but the basic idea is the input lines S and T (as they call in the code below), find the minimum number of changes you need to make for S to create T. One change could be :
- Insert one letter at either end of the line.
- Remove one letter from either end of the line.
- Change one letter to another.
Below is the solution I am trying to track down. What I am looking for is advice on how best to solve the problem. What are some techniques I can use to read and understand the code (let's throw it back through the debugger).
#include<iostream>
#include<cstring>
#include<stdio.h>
using namespace std;
char S[2010];
char T[2010];
int lens,lent;
int main()
{
int i,j,ma,p;
while(scanf("%s%s",S,T)!=EOF)
{
lens=strlen(S);
lent=strlen(T);
ma=0;p=0;
for(i=0;i<lens;i++)
{
p=0;
for(j=0;j<lent;j++)
{
if(i+j>=lens)
break;
if(S[i+j]==T[j]){p++;}
}
if(ma<p)
ma=p;
if(ma==lent)
break;
}
for(i=0;i<lent;i++)
{
p=0;
for(j=0;j<lens;j++)
{
if(i+j>=lent)
break;
if(T[i+j]==S[j]){p++;}
}
if(ma<p)
ma=p;
if(ma==lent)
break;
}
printf("%d\n",lent-ma);
}
return 0;
}
source to share
Step 1: Explain to yourself what the variables represent:
S
: the string from which we want to extract the substring
T
: the string we want to reach at the end, after you've changed the extracted substring with as many operations as possible
lens
: length of string S
lent
: length of string T
i
: index in S from which the extracted substring starts
j
: the index in string T of the character we want to match with the corresponding character in the substring
p
: the number of matching characters found for the current string being examined
ma
: maximum number of matching characters for any of the substrings
Step 2: Having set these values, it is quite easy to translate the first loop into words:
for loop 1 : selects a start position of the substring
set the match counter to 0, since we start investigation of a new substring
for loop 2 : loops through the substring
if 1 : if there is no char left to read string S, stop looping
if 2 : if the current character in the extracted substring matches
a character in the "goal" string, increment the match counter (p)
if 3 : now, we finished looping through a substring,
if the count of matching characters in the substring and the goal
string was higher than for any of the previous counts,
then store this value as the max count
if 4 : if the max count of matching characters is equal to the
length of the "goal string", dr Moriatry can receive the goal string
with 0 substring changes, and hence, we can stop looping
The next cycle is similar. The roles of S and T are reversed. Note, however, that the S and T roles have not been completely reversed (as some people have said). The end condition for the outer for loop uses the length T in both cases, which makes sense.
Here we are extracting substrings from string T (string "target") and trying to match them to string S. Why are we doing this?
I would expect the person who wrote the code would like to account for cases such as, for example:
S = "b" T = "abc"
If we were only to extract substrings from S and match them to the whole string T starting at the first index (as in the first loop), we would only compare " b
(in S) match a
(first char in T), and then we continue and say, "Since no substring matches, we need three line changes to get the string T" (which is obviously not true, since we can achieve it by choosing "b" as the substring to extract and doing 2 change operations in the end with T)
source to share
The best solution would be to rewrite the code when you understand it. There are several things:
- Watch out for duplicate code. He does the same with S and T, and the roles are reversed. You can create a function
foo()
with both strings as a parameter and usefoo(S,T)
andfoo(T,S)
- Try to break too much depth. When you see a lot of nested loops most of the time, some of the inner loops can be thought of as a function that does something specific.
- rename variables gradually as you understand more of what's going on.
- Last but not least, don't drop the debugger
source to share
The key to understanding the code is understanding the meaning of the variables that change the least, in this case ma and p, and recognizing the coding idioms.
Two code snippets
if(ma<p)
ma=p;
if(ma==lent)
break;
are the "high water mark" idiom for the variable p (with a break condition for the maximum possible value). p is incremented for each character at position j of the LHS substring, starting at i, with the character at the j-position of the RHS. Therefore, for loops indexed by j (inner loops), p is the number of the same characters that are in the same j-positions. So I would rename p (in my head at least!) To "number_of_identical_chars_in_same_pos".
The i-indexed loops find the highest p (stored in ma) for each substring in the LHS that ends with the last LHS character (or earlier if the substring is long). Therefore, ma should be renamed "max_common_chars_in_a_span". Together, the two loops find the maximum number of characters in common at the same position in any substring ending with the last character on either side.
The final step (left to the serious student) is to understand how this solves the problem and what is math, not programming.
source to share
First I would start doing some string manipulation in the program. The first step would be to try and give better names to the variables.
I would replace 'i' with sPos for the position in the string S. Well, at least this is for the first loop, for the second loop, the 'i' changes the value as the position in T, so I would actually then decide to get rid of i and instead have two variables, so each variable has one target that matches its name.
Then try the same for j, ma and p.
Once this is done, I would consider trying to figure out what the two for loops are doing. They seem to be nearly identical. Perhaps they can be split into a single function that gets called twice. Again, try to figure out what this function does and name it very carefully so that the name explains its purpose.
Repeat steps like this until you have some code that evokes some human feelings.
source to share