If there is a string in the list (given at compile time): Is HashSet the fastest solution?

Given a fixed list of strings at compile time, for example:

"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"

      

By using HashSet

, we have a very fast (O (1)) way to indicate whether String

at runtime enters a list of strings.

For example:

Set<String> SET = new HashSet<>(Arrays.asList( "zero", "one", "two", "three",
        "four", "five", "six", "seven", "eight", "nine"));

boolean listed = SET.contains("some-text");

      

Is there an even faster way to find out if String

the original list of strings (given at compile time) is at runtime or HashSet

the fastest solution?

Let him formalize the question

Given the interface below:

interface Checker {
    String[] VALUES = { "zero", "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine" };

    boolean contains(String s);
}

      

Provide as much implementation as possible if the values โ€‹โ€‹listed in Checker.VALUES

will not change (for example, you can use these literals in your code if you like).

Demo: HashSetChecker

Implementation

The implementation that uses HashSet

would look like this:

class HashSetChecker implements Checker {
    private final Set<String> set = new HashSet<>(Arrays.asList(VALUES));
    @Override
    public boolean contains(String s) {
        return set.contains(s);
    }
}

      

Code to test implementation

When testing, we want to test the method Checker.contains()

with initial interned strings, as well as other interned strings ( String

literals) that will not be found, as well as strings that have the same value (equal) but not interned strings. The following array will be used for this purpose CheckerTester.TESTS

.

public class CheckerTester {
    private static final String[] TESTS = { "zero", "one", "two", "three",
            "four", "five", "six", "seven", "eight", "nine", "ten", "eleven",
            "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen",

            new String("zero"), new String("one"), new String("two"),
            new String("three"), new String("four"), new String("five"),
            new String("six"), new String("seven"), new String("eight"),
            new String("nine"), new String("ten"), new String("eleven"),
            new String("twelve"), new String("thirteen"),
            new String("fourteen"), new String("fifteen"),
            new String("sixteen"), new String("seventeen"),
            new String("eighteen"), new String("nineteen") };

    public static void test(Checker checker) {
        final int N = 1_000_000;

        long start = System.nanoTime();

        for (int i = 0; i < N; i++)
            for (String test : TESTS)
                checker.contains(test);

        long end = System.nanoTime();

        System.out.printf("%s: %d ms\n", checker.getClass().getName(),
                (end - start) / 1_000_000);
    }
}

      

+3


source to share


3 answers


Let's look at some implementations:

First idea: HashSet

with a large capacity

Some might argue that multiple lines may end up in the same bucket HashSet

, so use a larger initial capacity:

class HashSetChecker2 implements Checker {
    private final Set<String> set = new HashSet<>(1000);
    { set.addAll(Arrays.asList(VALUES)); }

    @Override
    public boolean contains(String s) {
        return set.contains(s);
    }
}

      

Using HashMap

instead ofHashSet

HashSet

uses HashMap

in its implementation, just do the same: get rid of the "shell" HashSet

:

class HashMapChecker implements Checker {
    private final Map<String, Object> map = new HashMap<>(1000);
    {
        for (String s : VALUES)
            map.put(s, s);
    }
    @Override
    public boolean contains(String s) {
        return map.containsKey(s);
    }
}

      

TreeSet

Some might say give it a TreeSet

try (she's sorted so she gets a chance). I know this is O (log (n)), but n

not great (10 in this case):

class TreeSetChecker implements Checker {
    private final Set<String> set = new TreeSet<>(Arrays.asList(VALUES));
    @Override
    public boolean contains(String s) {
        return set.contains(s);
    }
}

      

Checking for short circuit OR (if-else logic):

class OrChecker implements Checker {
    @Override
    public boolean contains(String s) {
        return "zero".equals(s) || "one".equals(s) || "two".equals(s)
                || "three".equals(s) || "four".equals(s) || "five".equals(s)
                || "six".equals(s) || "seven".equals(s) || "eight".equals(s)
                || "nine".equals(s);
    }
}

      

The reference sequence will then revert back to short OR check

Some may say that first we need to check if we have a link String

, and if not, then go back to checking for short-circuiting OR:

class RefOrChecker extends OrChecker {
    @Override
    public boolean contains(String s) {
        return "zero" == s || "one" == s || "two" == s || "three" == s
                || "four" == s || "five" == s || "six" == s || "seven" == s
                || "eight" == s || "nine" == s || super.contains(s);
    }
}

      

Usage switch

: The fastest I found

Since we have a fixed list String

known at compile time, we can take advantage of the ability to be used String

in operations switch

.



We can add case

for each String

of the fixed list and return true

, and add a default

return case false

.

class SwitchChecker implements Checker {
    @Override
    public boolean contains(String s) {
        switch (s) {
            case "zero":
            case "one":
            case "two":
            case "three":
            case "four":
            case "five":
            case "six":
            case "seven":
            case "eight":
            case "nine":
                return true;
            default:
                return false;
        }
    }
}

      

New conclusion: built-in switches (2 switch

blocks)

Maaartinus's answer about perfect hashing got me thinking. Even if we have a perfect hash, it should still work across all String

runtime-provided content that we want to test. So instead, we have to use something that's available right in String

: its length . Based on the length String

, we use switch

, and internally switch

we use the inner switch

one containing only strings with the specified length. At the same time, we reduce the number of operators case

inside switch

:

class EmbeddedSwitchChecker implements Checker {
    @Override
    public boolean contains(String s) {
        switch (s.length()) {
        case 3:
            switch (s) {
                case "one":
                case "two":
                case "six":
                    return true;
                default:
                    return false;
            }
        case 4:
            switch (s) {
                case "zero":
                case "four":
                case "five":
                case "nine":
                    return true;
                default:
                    return false;
            }
        case 5:
            switch (s) {
                case "three":
                case "seven":
                case "eight":
                    return true;
                default:
                    return false;
            }
        default:
            return false;
        }
    }
}

      

New Discovery: CharSwitchChecker: Champion

This is basically a combination of the improved EmbeddedSwitchChecker

and OldCurmudgeon 's final machine idea : here we use switch

for the first character String

(but check its length first) and based on that we either narrowed it down to one possible String

, or if not, we also check the 2 nd character in this case there String

is only one possible (and we can solve it by calling String.equals()

):

class CharSwitchChecker implements Checker {
    @Override
    public boolean contains(String s) {
        final int length = s.length();
        if (length < 3 || length > 5)
            return false;

        switch (s.charAt(0)) {
        case 'z':
            return "zero".equals(s);
        case 'o':
            return "one".equals(s);
        case 't':
            return s.charAt(1) == 'w' ? "two".equals(s) : "three".equals(s);
        case 'f':
            return s.charAt(1) == 'o' ? "four".equals(s) : "five".equals(s);
        case 's':
            return s.charAt(1) == 'i' ? "six".equals(s) : "seven".equals(s);
        case 'e':
            return "eight".equals(s);
        case 'n':
            return "nine".equals(s);
        }
        return false;
    }
}

      

Test results

Here are the test results:

                         TIME        HOW FAST (compared to HashSetChecker)
-----------------------------------------------------------------------------
HashSetChecker:          929 ms       1.00x
HashSetChecker2:         892 ms       1.04x
HashMapChecker:          873 ms       1.06x
TreeSetChecker:         2265 ms       0.41x
OrChecker:              1815 ms       0.51x
RefOrChecker:           1708 ms       0.54x
SwitchChecker:           538 ms       1.73x
EmbeddedSwitchChecker:   467 ms       1.99x
CharSwitchChecker:       436 ms       2.13x

      

The solution is SwitchChecker

about 1.7x faster , EmbeddedSwitchChecker

2x faster , and the champion is CharSwitchChecker

about 2.13x faster than the implementation HashSetChecker

. As expected, HashSet

with a lot of initial power and solutions are HashMap

slightly faster and all other solutions are lagging behind.

Complete guided test program (+ all solutions listed)

The complete Runnalbe plus benchmark All the solutions listed are here in one box for those who want to try or experiment with new implementations.

Edit: Following Luiggi Mendoza's recommendation for rules for executing a micro-criterion test , I changed the method main()

for testing. I run the whole test twice and I only analyze the second result. Also, since tests don't create new objects in a loop, I see no reason to call System.gc()

at all.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

interface Checker {
    String[] VALUES = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

    boolean contains(String s);
}

class HashSetChecker implements Checker {
    private final Set<String> set = new HashSet<>(Arrays.asList(VALUES));

    @Override
    public boolean contains(String s) {
        return set.contains(s);
    }
}

class HashSetChecker2 implements Checker {
    private final Set<String> set = new HashSet<>(1000);
    {
        set.addAll(Arrays.asList(VALUES));
    }

    @Override
    public boolean contains(String s) {
        return set.contains(s);
    }
}

class HashMapChecker implements Checker {
    private final Map<String, Object> map = new HashMap<>(1000);
    {
        for (String s : VALUES)
            map.put(s, s);
    }

    @Override
    public boolean contains(String s) {
        return map.containsKey(s);
    }
}

class TreeSetChecker implements Checker {
    private final Set<String> set = new TreeSet<>(Arrays.asList(VALUES));

    @Override
    public boolean contains(String s) {
        return set.contains(s);
    }
}

class OrChecker implements Checker {
    @Override
    public boolean contains(String s) {
        return "zero".equals(s) || "one".equals(s) || "two".equals(s) || "three".equals(s)
                || "four".equals(s) || "five".equals(s) || "six".equals(s) || "seven".equals(s)
                || "eight".equals(s) || "nine".equals(s);
    }
}

class RefOrChecker extends OrChecker {
    @Override
    public boolean contains(String s) {
        return "zero" == s || "one" == s || "two" == s || "three" == s || "four" == s || "five" == s
                || "six" == s || "seven" == s || "eight" == s || "nine" == s || super.contains(s);
    }
}

class SwitchChecker implements Checker {
    @Override
    public boolean contains(String s) {
        switch (s) {
        case "zero":
        case "one":
        case "two":
        case "three":
        case "four":
        case "five":
        case "six":
        case "seven":
        case "eight":
        case "nine":
            return true;
        default:
            return false;
        }
    }
}

class EmbeddedSwitchChecker implements Checker {
    @Override
    public boolean contains(String s) {
        switch (s.length()) {
        case 3:
            switch (s) {
            case "one":
            case "two":
            case "six":
                return true;
            default:
                return false;
            }
        case 4:
            switch (s) {
            case "zero":
            case "four":
            case "five":
            case "nine":
                return true;
            default:
                return false;
            }
        case 5:
            switch (s) {
            case "three":
            case "seven":
            case "eight":
                return true;
            default:
                return false;
            }
        default:
            return false;
        }
    }
}

class CharSwitchChecker implements Checker {
    @Override
    public boolean contains(String s) {
        final int length = s.length();
        if (length < 3 || length > 5)
            return false;

        switch (s.charAt(0)) {
        case 'z':
            return "zero".equals(s);
        case 'o':
            return "one".equals(s);
        case 't':
            return s.charAt(1) == 'w' ? "two".equals(s) : "three".equals(s);
        case 'f':
            return s.charAt(1) == 'o' ? "four".equals(s) : "five".equals(s);
        case 's':
            return s.charAt(1) == 'i' ? "six".equals(s) : "seven".equals(s);
        case 'e':
            return "eight".equals(s);
        case 'n':
            return "nine".equals(s);
        }
        return false;
    }
}

public class CheckerTester {
    private static final String[] TESTS = { "zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen",

            new String("zero"), new String("one"), new String("two"), new String("three"),
            new String("four"), new String("five"), new String("six"), new String("seven"),
            new String("eight"), new String("nine"), new String("ten"), new String("eleven"),
            new String("twelve"), new String("thirteen"), new String("fourteen"), new String("fifteen"),
            new String("sixteen"), new String("seventeen"), new String("eighteen"), new String("nineteen") };

    public static void test(Checker checker) {
        final int N = 1_000_000;

        long start = System.nanoTime();

        for (int i = 0; i < N; i++)
            for (String test : TESTS)
                checker.contains(test);

        long end = System.nanoTime();

        System.out.printf("%s: %d ms\n", checker.getClass().getName(), (end - start) / 1_000_000);
    }

    public static void main(String args[]) {
        for (int i = 1; i <= 2; i++) {
            System.out.println("---- Check #" + i);
            test(new HashSetChecker());
            test(new HashSetChecker2());
            test(new HashMapChecker());
            test(new TreeSetChecker());
            test(new OrChecker());
            test(new RefOrChecker());
            test(new SwitchChecker());
            test(new EmbeddedSwitchChecker());
            test(new CharSwitchChecker());
        }
    }

}

      

+4


source


The fastest solution can be based on perfect hashing . Find a quick hash function that maps all of your strings to individual integers in a small range and create a hash table. The hash table is fast as there are no collisions by definition. Finding the perfect hash function can be time-consuming, and finding a quick one can be even more difficult.

One example where it works really well is Guava CharMatcher.WHITESPACE

, where all whitespace is mapped to the set {0, ..., 31}

using multiplication and shift (see here for some explanation). the previous implementation also used perfect hashing, but was slower due to division.



Finding a fast perfect hash for your 10 rows and size 16 table should be pretty easy.

+2


source


One improvement on @icza's great suggestions is the state machine. Here's an implementation that looks like this can be quite efficient. Maybe @icza will include it in his timing tests and we'll see how it works.

Essentially, it creates a static tree structure that can go through one step per test line character. If at any moment the required step is not in the tree, we can signal an incorrect match. If we get to the end of the line, then we check to see if the ending is legal.

It should be an algorithm O(k)

(if it were), since its execution time is linear with respect to the length of the input string, but there is clearly some tuning time.

public class Test {

    interface Checker {

        Set<String> VALUES = new HashSet<>(Arrays.asList("zero", "one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine"));

        boolean contains(String s);
    }

    public static class HatChecker implements Checker {

        // Can't think of a name.
        static class Hats {

            // All possible children.
            Hats[] hats = new Hats[256];
            // Are we at the end of a word.
            boolean end = false;
        }

        // Root hats - contains one entry fr each possible fisrt characetr.
        static Hats root = new Hats();

        /**
         * Where should it go?
         */
        private static Hats find(String s, boolean grow) {
            Hats hats = root;
            for (int i = 0; i < s.length(); i++) {
                int ch = s.charAt(i);
                Hats newHats = hats.hats[ch];
                // Not seen this sequence yet?
                if (newHats == null) {
                    if (grow) {
                        // Allowed to grow.
                        newHats = hats.hats[ch] = new Hats();
                    } else {
                        // No growing - stop here.
                        return null;
                    }
                }
                hats = newHats;
            }
            return hats;
        }

        /**
         * Add to the structures.
         */
        private static void add(String s) {
            // Grow it and margk it good.
            find(s, true).end = true;
        }

        static {
            // Grow my structure.
            for (String s : VALUES) {
                add(s);
            }
        }

        @Override
        public boolean contains(String s) {
            // Find where it should be but don't grow.
            Hats found = find(s, false);
            // It a match if it wa sthere and was an end.
            return found != null && found.end;
        }

    }

    private static class Check {

        private final String s;
        private final boolean matches;

        public Check(String s) {
            this.s = s;
            this.matches = Checker.VALUES.contains(s);
        }

        public String toString() {
            return "(" + s + ")=" + matches;
        }
    }
    private static final Check[] TESTS = {
        new Check("zero"),
        new Check("one"),
        new Check("two"),
        new Check("three"),
        new Check("four"),
        new Check("five"),
        new Check("six"),
        new Check("seven"),
        new Check("eight"),
        new Check("nine"),
        new Check("ten"),
        new Check("eleven"),
        new Check("twelve"),
        new Check("thirteen"),
        new Check("fourteen"),
        new Check("fifteen"),
        new Check("sixteen"),
        new Check("seventeen"),
        new Check("eighteen"),
        new Check("nineteen"),
        new Check(new String("zero")),
        new Check(new String("one")),
        new Check(new String("two")),
        new Check(new String("three")),
        new Check(new String("four")),
        new Check(new String("five")),
        new Check(new String("six")),
        new Check(new String("seven")),
        new Check(new String("eight")),
        new Check(new String("nine")),
        new Check(new String("ten")),
        new Check(new String("eleven")),
        new Check(new String("twelve")),
        new Check(new String("thirteen")),
        new Check(new String("fourteen")),
        new Check(new String("fifteen")),
        new Check(new String("sixteen")),
        new Check(new String("seventeen")),
        new Check(new String("eighteen")),
        new Check(new String("nineteen"))};

    public void timeTest(Checker checker) {
        System.out.println("Time");
        final int N = 1_000_000;

        long start = System.nanoTime();

        for (int i = 0; i < N; i++) {
            for (Check check : TESTS) {
                checker.contains(check.s);
            }
        }

        long end = System.nanoTime();

        System.out.printf("%s: %d ms\n", checker.getClass().getName(),
                (end - start) / 1_000_000);
    }

    public void checkerTest(Checker checker) {
        System.out.println("Checker");
        for (Check check : TESTS) {
            if (checker.contains(check.s) != check.matches) {
                System.err.println("Check(" + check + ") failed");
            }
        }
    }

    public static void main(String args[]) {
        try {
            Checker checker = new HatChecker();
            Test test = new Test();
            test.checkerTest(checker);
            test.timeTest(checker);
        } catch (Throwable ex) {
            ex.printStackTrace(System.err);
        }
    }
}

      

I suspect this will be on par with the operator case

. It should be interesting.

Sorry for the name Hat

- I just couldn't name anything good.

+1


source







All Articles