How to efficiently select a random element from a Scala immutable HashSet
I have scala.collection.immutable.HashSet
one that I want to randomly select from an item.
I could solve the problem with an extension method like this:
implicit class HashSetExtensions[T](h: HashSet[T]) {
def nextRandomElement (): Option[T] = {
val list = h.toList
list match {
case null | Nil => None
case _ => Some (list (Random.nextInt (list.length)))
}
}
}
... but converting to a list will be slow. What would be the most effective solution?
source to share
WARNING This answer is for experimental use only. For a real project, you should probably use your own collection types.
So, I've done some research on the source of the HashSet , and I think there is little opportunity to somehow extract the inner structure of the most valuable class HashTrieSet
without breaking the package.
I came up with this code which expanded on Ben Reich's solution :
package scala.collection
import scala.collection.immutable.HashSet
import scala.util.Random
package object random {
implicit class HashSetRandom[T](set: HashSet[T]) {
def randomElem: Option[T] = set match {
case trie: HashSet.HashTrieSet[T] => {
trie.elems(Random.nextInt(trie.elems.length)).randomElem
}
case _ => Some(set.size) collect {
case size if size > 0 => set.iterator.drop(Random.nextInt(size)).next
}
}
}
}
the file must be created somewhere in the folder src/scala/collection/random
Pay attention to the package scala.collection
- this thing makes the visible part elems
HashTrieSet
visible. This is just a solution that I would think that might work better than O(n)
. The current version should be O(ln(n))
as complex as any of the immutable.HashSet
s operations.
Another warning is that the private structure is HashSet
not part of the standard scala library API, so it can change any version, making this code in error (although it hasn't changed since 2.8)
source to share
Since size
O(1)
on HashSet
, and is iterator
as lazy as possible, I think this solution will be relatively efficient:
implicit class RichHashSet[T](val h: HashSet[T]) extends AnyVal {
def nextRandom: Option[T] = Some(h.size) collect {
case size if size > 0 => h.iterator.drop(Random.nextInt(size)).next
}
}
And if you're trying to get every ounce of efficiency you can use here match
instead of the more succinct Some/collect
idiom used here.
You can look mutable HashSet
to see the method size
. The method defined there iterator
simply calls iterator
on FlatHashTable
. The same basic effectiveness of these techniques applies to immutable HashSet
, if that's what you're working with. For comparison, you can see that the implementation toList
is not HashSet
completely type-hierarchical TraversableOnce
and uses much more primitive elements, which are probably less efficient and (of course) the entire collection must be iterated over to be generated List
. If you are going to convert an entire set to a collection Traversable
, you should use Array
or Vector
, which have a persistent search.
You may also notice that there is nothing special about the methods above HashSet
, and you could enrich Set[T]
instead if you chose that way (although there would be no guarantee that it would be as effective on others Set
, of course).
As a side note, when implementing rich classes for extension methods, you should always consider the implicit user-defined value class when extending AnyVal
. You can read some of the benefits and limitations in the docs and this answer .
source to share