Very fast search for specific characters in Java
this probably seems like a bit of a silly question. And maybe that's the way it is. But I have a function that I use very often and wanted an opinion on whether this is the fastest way to get the job done. The function is used so many times that any increase in speed would be really noticeable. All it does is check if a character is a nucleotide (for example: if a char is "A", "T", "C" or "G".
private static boolean isValidNucleotide(char nucleotide) {
nucleotide = Character.toUpperCase(nucleotide);
if(nucleotide == 'A') return true;
if(nucleotide == 'T') return true;
if(nucleotide == 'C') return true;
if(nucleotide == 'G') return true;
return false;
}
Is this the fastest way to get the job done? Or do you think it is worth implementing some sort of index / map / something else (perhaps doing the comparison outside of the function and just copying that text at a few points in the code)? I am really not an expert in this field in Java.
source to share
The fastest (but the smallest memory is efficient another 255 bytes is not bad!) Would look like this:
/* this is static member of class */
static boolean map[] = new boolean[256];
static {
for(int j = 0; j < map.length; j++)
map[j] = false;
/* map your required values true here */
map['A'] = true;
map['T'] = true;
map['C'] = true;
map['G'] = true;
/* make small letter here too */
map['a'] = true;
map['t'] = true;
map['c'] = true;
map['g'] = true;
}
Then create a function like this:
private static boolean isValidNucleotide(char nucleotide) {
/* complexity is just one access to array */
return map[nucleotide];
}
As @paxdiablo said, there are 2 bytes in java char, not 1 byte, but you are in that range. Just changing return map[nucleotide];
to return map[0x00ff & nucleotide];
should work.
You can also resize the map to 65536
to be safe and avoid any mistakes.boolean map = new boolean[65536]
source to share
Get rid of Character.toUpperCase
and check both capital and small cases, this will speed up your function significantly.
private static boolean isValidNucleotide(char nucleotide) {
if(nucleotide == 'A' || nucleotide == 'a') return true;
// Rest of your conditions
return false;
}
I did a little test with your original function and 10000000 times
it took on average to run 80 ms
, but when I removed Character.toUpperCase()
and explicitly pointed both cases out, it only took 40 ms
what is a significant improvement.
Edit:
Using the solution Map
suggested by @Shivam Kalra only took on average 11 ms
!
source to share