About the implementation of CharMatcher.WHITESPACE

When I looked at the implementation CharMatcher

and noticed a box WHITESPACE_MULTIPLIER=1682554634

, then I set that value to 1582554634

by running the test file CharMatcherTest#testWhitespaceBreakingWhitespaceSubset

, of course it failed.

After that I changed testWhitespaceBreakingWhitespaceSubset only to call WHITESPACE.apply((char)c)

without assert, print index in methodWHITESPACE.matches

int index=(WHITESPACE_MULTIPLIER * c) >>> WHITESPACE_SHIFT)

      

finally found that the index collided after changing WHITESPACE_MULTIPLIER

from 1682554634

to1582554634

No doubt 1682554634 is well designed, my question is how can I conclude this "magic number"?

Following Martin Graykar's suggestion , I am trying to write a "magic number generator" like this and it worked:

char[] charsReq = WHITESPACE_TABLE.toCharArray();
Arrays.sort(charsReq);
OUTER:
for (int WHITESPACE_MULTIPLIER_WANTTED = 1682553701; WHITESPACE_MULTIPLIER_WANTTED <= 1682554834; WHITESPACE_MULTIPLIER_WANTTED++) {
    int matchCnt = 0;
    for (int c = 0; c <= Character.MAX_VALUE; c++) {
        int position = Arrays.binarySearch(charsReq, (char) c);
        char index = WHITESPACE_TABLE.charAt((WHITESPACE_MULTIPLIER_WANTTED * c) >>> WHITESPACE_SHIFT);
        if (position >= 0 && index == c) {
                matchCnt++;
        } else if (position < 0 && index != c) {
                matchCnt++;
        } else {
            continue OUTER;
        }
    }
    // all valid
    if ((matchCnt - 1) == (int) (Character.MAX_VALUE)) {
        System.out.println(WHITESPACE_MULTIPLIER_WANTTED);
    }
}

      

if you change the sequence of characters (swap \ u2001 \ u2002 position) in WHITESPACE_TABLE, the algorithms have no solution (changed the loop end condition to Integer.MAX_VALUE).


since the IntMath.gcd implementation refers to http://en.wikipedia.org/wiki/Binary_GCD_algorithm
my question is where can I find implementation stuff CharMatcher.WHITESPACE.match

?

0


source to share


1 answer


I'm not sure if the generator still exists somewhere, but it can be easily recreated. The class Result

contains data used in the implementationCharMatcher.WHITESPACE

:

static class Result {
    private int shift;
    private int multiplier;
    private String table;
}

// No duplicates allowed.
private final String allMatchingString = "\u2002\r\u0085\u200A\u2005\u2000"
        + "\u2029\u000B\u2008\u2003\u205F\u1680"
        + "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
        + "\u2004\u2028\n\u2007\u3000";

public Result generate(String allMatchingString) {
    final char[] allMatching = allMatchingString.toCharArray();
    final char filler = allMatching[allMatching.length - 1];
    final int shift = Integer.numberOfLeadingZeros(allMatching.length);
    final char[] table = new char[1 << (32 - shift)];
    OUTER: for (int i=0; i>=0; ++i) {
        final int multiplier = 123456789 * i; // Jumping a bit makes the search faster.
        Arrays.fill(table, filler);
        for (final char c : allMatching) {
            final int index = (multiplier * c) >>> shift;
            if (table[index] != filler) continue OUTER; // Conflict found.
            table[index] = c;
        }
        return new Result(shift, multiplier, new String(table));
    }
    return null; // No solution exists.
}

      



It generates a different multiplier, but it doesn't matter.

If there is no solution for this allMatchingString

, you can decrease the offset and try again.

+2


source







All Articles