How can Array # sort in Ruby so fast?

C # has BitArray, C has bit fields. I couldn't find an equivalent in Ruby core. Google showed me a class BitField

that Peter Cooper wrote for him.

I was reading Jon Bentley Programming Pearls and while trying one of the first examples that deals with sorting a BitMap, I need a type that is an array of bits. I used Peter class

class BitMapSort
  def self.sort(list, value_range_max)
    bitmap = BitField.new(value_range_max)

    list.each{|x| bitmap[x] = 1 }

    sorted_list = []
    0.upto(value_range_max-1){ |number|
      sorted_list << number if (bitmap[number] == 1)
    }
    sorted_list
  end
end

      

Running this set of 1M unique numbers in the range [0, 10,000,000] gave interesting results,

                         user     system      total        real
bitmap               11.078000   0.015000  11.093000 ( 11.156250)
ruby-system-sort      0.531000   0.000000   0.531000 (  0.531250)
quick-sort           21.562000   0.000000  21.562000 ( 21.625000)

Benchmark.bm(20){|x|
    x.report("bitmap"){ ret = BitMapSort.sort(series, 10_000_000);}
    x.report("ruby-system-sort"){ ret = series.sort; }
    x.report("quick-sort"){ ret = QuickSort.sort( series, 0, series.length-1); }
    }

      

How is the default sort for ruby ​​by default 22x faster than 1M bitsField.set + 1 for a vector of 10M bits? Is there a more efficient bit / array field in Ruby? How does Ruby's default sort achieve this level of performance .. does it jump to C to do this?

+2


source to share


5 answers


Array # sort is implemented in C, see. rb_ary_sort

In array.c



It also has some checks to compare Fixnums, so sorting an array of integers doesn't even need to look up methods.

+10


source


How does Ruby's default sort achieve this performance level .. does it jump to C to do it?



All the main classes and methods in the default ruby ​​implementation are implemented in C.

+7


source


The reason this is much faster is probably because it is implemented in the C ruby ​​implementation.

+2


source


I think the real problem is that you are doing 10M comparisons, 10M fetching arrays, 10M from a lot of things, whereas a properly optimized sort routine does a lot less operations since it works on a fixed set of 1M elements.

Basic operations such as sorting are highly optimized in the Ruby VM and are difficult to work around with a pure-Ruby alternative.

+2


source


Jumps to C are definitely correct. Arrays and hashes have C implementations of many methods to improve performance. Integer and float literals also have some tricky code optimizations. When you convert them to bitmap, you lose this optimization as well.

With a compiled language like C or Java, it really makes sense to look for complex optimization patterns. with interpreted languages, the cost of interpreting each command makes this counter productive.

+2


source







All Articles