How to implement a complex algorithm with TDD

Question

How to implement a complex algorithm with TDD

I have a rather complex algorithm that I would like to implement (in Java) with TDD. The algorithm for translating natural languages is called stack decoding .

When I tried to do this, I was able to write and fix some simple test cases (empty translation, one word, etc.), but I cannot get to the direction of the algorithm I want. I mean, I cannot figure out how to write a massive amount of algorithm as described below in child steps.

This is pseudocode of the algorithm:

1: place empty hypothesis into stack 0
2: for all stacks 0...n − 1 do
3: for all hypotheses in stack do
4: for all translation options do
5: if applicable then
6: create new hypothesis
7: place in stack
8: recombine with existing hypothesis if possible
9: prune stack if too big
10: end if
11: end for
12: end for
13: end for

Am I missing any way that I can follow the baby steps, or should I just get the lighting and make a big implementation?

+3

java algorithm tdd

oshai Sep 16 14 at 12:01

source to share

3 answers

TL; DR

Repeat your task as checking the API implementation.
Since the algorithm is expressed in terms of concepts (hypothesis, translation variant), which are quite complex in themselves, program it from the bottom up by first developing (using TDD) classes that encapsulate those concepts.
Since the algorithm is not language specific, but an abstract of language specific operations (all translations as applicable), encapsulate the language specific parts into a separate object, use dependency injection for that object, and test with simple fake implementations of that object.
Use your knowledge of the algorithm to suggest good test cases.

Check the API

By focusing on the implementation (algorithm), you are making a mistake. Instead, first imagine that you have a magic class that was doing the work that the algorithm does. What will its API be? What would be its initial data, what would be its results? And what will be the required connections between inputs and outputs. You want to encapsulate your algorithm in this class and remake your problem as generating this class.

In this case, it seems that the input is a sentence that has been denoted (split into words) and the output is a tokenized sentence that has been translated into another language. So I assume the API looks something like this:

 interface Translator {
   /**
    * Translate a tokenized sentence from one language to another.
    *
    * @param original
    *    The sentence to translate, split into words,
    *    in the language of the {@linkplain #getTranslatesFrom() locale this
    *    translates from}.
    * @return The translated sentence, split into words,
    *    in the language of the {@linkplain #getTranslatesTo() locale this
    *    translates to}. Not null; containing no null or empty elements.
    *    An empty list indicates that the translator was unable to translate the
    *    given sentence.
    *
    * @throws NullPointerException
    *    If {@code original} is null, or contains a null element.
    * @throws IllegalArgumentException
    *    If {@code original} is empty or has any empty elements.
    */
   public List<String> translate(List<String> original);

   public Locale getTranslatesFrom();

   public Locale getTranslatesTo();
 }

This is an example of a strategy design pattern. So your challenge becomes, not "how to use TDD to implement this algorithm," but rather "how to use TDD to implement a specific case of the strategy design pattern".

Next, you need to come up with a sequence of test cases, from the simplest to the most complex, that use this API. That is, the set of initial sentence values is passed to the method translate

. For each of these inputs, you must give a set of output limits. The translation must comply with these restrictions. Note that you already have some exit restrictions:

Not equal to zero; not empty; contains no null or empty elements.

You will need some sample sentences for which you know exactly what the algorithm should output. I suspect you will find very few such suggestions. Arrange these tests from the easiest to pass to the most difficult to pass. This will become your TODO list when you implement the class Translator

.

Implement from the bottom up

You will find that your code is missing more than two such cases. So how can you thoroughly test your code?

Take another look at the algorithm. It is difficult enough that the method translate

does not do all the work directly. It will delegate most of the work to other classes.

push an empty hypothesis onto stack 0

You need a class Hypothesis

. A HypothesisStack

class?

for all translation options do

Do you need a class TranslationOption

?

if applicable then

Is there a method TranslationOption.isApplicable(...)

?

recombine with existing hypothesis if possible

Is there a method Hypothesis.combine(Hypothesis)

? A Hypothesis.canCombineWith(Hypothesis)

method?

prunes if too large

Is there a method HypothesisStack.prune()

?

Your implementation will probably need additional classes. You can implement each of these separately using TDD. Your multiple class tests Translator

will eventually become integration tests. Other classes will be easier to test than Translator

because they will have well-defined narrow definitions of what they should do.

So postpone the implementation Translator

until you have injected the classes it delegates to. That is, I recommend that you write your code from the bottom up, not top to bottom. The final step is to write code that implements the algorithm that you specified. At this point, you have classes that you can use to write an implementation using Java code that is very similar to the pseudocode of your algorithm. That is, the body of your method translate

will be only about 13 lines long.

Elimination of complications in real age in the associated object

Your translation algorithm is general purpose; it can be used to translate between any pair of languages. I think that's what makes it applicable for a particular language pair translation is for all translation options

and if applicable then

parts. I guess the latter can be implemented by the method TranslationOption.isApplicable(Hypothesis)

. So, what does the language-specific algorithm do to generate translation options. Imagine to a factory object that the class belongs to. Something like that:

 interface TranslationOptionGenerator {

    Collection<TranslationOption> getOptionsFor(Hypothesis h, List<String> original);

 }

Now you are probably thinking about translating between real languages with all their nasty complexities. But you don't need this complexity to test your algorithm. You can test it using a fake language pair which is much simpler than real languages. Or (equivalently) using TranslationOptionGenerator

, which is not as rich as practical. Use dependency injection to bind Translator

to TranslationOptionGenerator

.

Use simple fake implementations instead of Realistic Associates

Now let's look at some of the most extreme simple cases TranslationOptionGenerator

that an algorithm must deal with:

For example, when there are no options for an empty hypothesis
There is only one option
There are only two options.
English translator Harappan . Nobody knows how to translate to Harappan, so this translator points out that he cannot translate every time.
Identity translator: A translator that translates English to English.
Limited English & Newborn Translator: He can only translate the sentences "I'm hungry", "I need to change" and "I'm tired", which he translates to "Waaaa!"
Simple English to English translator: most words are the same, but some of them have different spellings.

You can use this to generate test cases that don't require some of the loops, or don't require tags isApplicable

.

Add these test cases to the TODO list. You will need to write fake simple TeranslatorOptionGenerator

objects to use in these test cases.

+2

Raedwald Sep 16 14 at 12:46

source to share

TL; DR: Start with nothing, and don't write any code that the test doesn't call for . Then you can't help but TDD solution

When you build something through TDD you have to start from somewhere and then run test cases until it does what you want. That's why they call it red green refactoring.

Your first test would be to see if the inner object has hypothesis 0 (empty implementation would be empty. [Red]). Then you initialize the hypothesis list [green].

Then you have to write a hypothesis test (just created [red]). Implement "if applicable" logic and apply it to one hypothesis [green].

You write a test that, when a hypothesis is applicable, then create a new hypothesis (check if hypothesis 1 of hypothesis exists on [red]). Implement the logic of the hypothesis and insert it into the body of the if. [Green]

on the contrary, if the hypothesis is not applicable then do nothing ([green])

Just follow this logic to test the algorithm individually over time. It is much easier to do this with an incomplete class than with a complete one.

+1

C Bauer Sep 16 14 at 12:31

source to share

Rogério · Accepted Answer · 2014-09-18T21:05:32+0000

The key point here is understanding that "baby steps" don't necessarily mean writing only a small amount of production code at a time. You can also write a lot if you do so to pass a relatively small and simple test.

Some people believe that TDD can only be applied in one way, and that is by writing unit tests. It is not true. TDD does not dictate which tests you should write. Perfectly true for TDD with integration tests that execute a lot of code. However, each test should focus on one well-defined scenario. This scenario is the "kid step" that really matters, not the number of classes that can take the test.

Personally, I develop a complex Java library purely with tests at the integration level, strictly following the TDD process. Quite often, I create a very small and simple integration analysis that ends up requiring a lot of programming effort to make a pass, requiring changes to several existing classes and / or creating new ones. This has worked well for me over the past 5 years, to the point where I now have over 1300 of these tests.

How to implement a complex algorithm with TDD

TL; DR

Check the API

Implement from the bottom up

Elimination of complications in real age in the associated object

Use simple fake implementations instead of Realistic Associates

More articles: