TokuMX vs. MongoDB read performance

I ran some read-performance benchmarks comparing TokuMX with stock MongoDB.

Both TokuMX and MongoDB ran on the same machine.

Hardware overview:

Model Name: Mac mini
Model Identifier: Macmini6,1
Processor Name: Intel Core i5
Processor Speed: 2.5 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 10 GB

      

There is only one collection per instance. Each collection has 100,000 records.

For TokuMX, it was created as a partitioned collection; for MongoDB, it was created as a normal collection:

db.createCollection("sample", {partitioned: true, primaryKey:  {field1:1, _id: 1}});

      

Both instances have the same indexes:

db.sample.ensureIndex({field1:1});
db.sample.ensureIndex({field2:1});
db.sample.ensureIndex({field3:1});
db.sample.ensureIndex({field4:1});
db.sample.ensureIndex({geo:"2d"});
db.sample.ensureIndex({"created_at":1});

      

I used Tsung for the stress testing. The test itself is a simple query: filter on field2 and geo, and sort the results by created_at descending.

<clients>
<client host="localhost" use_controller_vm="false" maxusers="8000"/>
</clients>
<servers>
<server host="jchimac.thenetcircle.lab" port="8080" type="tcp"/>
</servers>
<load duration="5" unit="minute">
<arrivalphase phase="1" duration="5" unit="minute">
<users interarrival="0.03" unit="second"/>
</arrivalphase>
</load>
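
As a rough check on what this load definition asks for, an interarrival of 0.03 seconds works out to about 33 new users per second, or roughly 10,000 users over the 5-minute phase. The sketch below does that arithmetic. `arrivalStats` is a hypothetical helper name, and it treats the interarrival value as a plain average rather than modelling how Tsung actually spaces the arrivals.

```javascript
// Hypothetical helper: expected arrival rate and total users for one
// Tsung arrival phase, treating interarrival as an average in seconds.
function arrivalStats(interarrivalSeconds, durationMinutes) {
  const usersPerSecond = 1 / interarrivalSeconds;
  const totalUsers = Math.round(usersPerSecond * durationMinutes * 60);
  return { usersPerSecond, totalUsers };
}

// Values taken from the <arrivalphase> element above.
const phase = arrivalStats(0.03, 5);
console.log(phase.usersPerSecond.toFixed(1), phase.totalUsers);
```

The same helper applies to any other phase by substituting its interarrival and duration.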

      

According to the official documentation (TokuMX™ Benchmark vs. MongoDB - HDD), the transaction throughput should look like this:

[image: official benchmark chart]

But in my testing:

TOKUMX:

[two images: TokuMX test results]

MongoDB:

[two images: MongoDB test results]

Can anyone give me a hint about this? Am I missing something in my testing?


Update:

I did another round of testing on Linux (CentOS):

CentOS release 6.5 (Final)
2.6.32-504.1.3.el6.x86_64 GNU/Linux
MemTotal:       24589896 kB
CPU: 12* (Intel(R) Xeon(R) CPU E5645  @ 2.40GHz)

      

Sample data looks like this:

{
  "_id": ObjectId("54867dc8ffbc15aa2bc3ee0e"),
  "_iid": 15,
  "_pid": 15,
  "uid": 102296,
  "nickname": "nickname_102296",
  "gender": 3,
  "image_id": 15,
  "created_at": 1418100168,
  "tag": 1,
  "geo": {
    "lat": 51.590449999999997033,
    "lon": 6.9671900000000004383
  }
}

      

Each collection contains 1,000,000 records.

Indexes on each collection (both were created as regular, non-partitioned collections):

db.createCollection("coll", {primaryKey:  {_pid:1, _id: 1}});
db.tokumx_coll.ensureIndex({gender:1}); 
db.tokumx_coll.ensureIndex({uid:1}); 
db.tokumx_coll.ensureIndex({geo:"2d"}); 
db.tokumx_coll.ensureIndex({_pid:1}); 
db.tokumx_coll.ensureIndex({_iid:1}); 
db.tokumx_coll.ensureIndex({"created_at":1}); 

      

The test plan is also pretty simple:

{'$query', {gender,3,geo, {'$geoWithin', {'$center', [[48.72761, 9.24596], 0.005]}}}, '$orderby',{'_pid',-1}} 
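
For readers more used to the mongo shell, the term above would read roughly as follows. This is a sketch that assumes the term maps one-to-one onto a `find` filter plus a `sort`; the collection name in the comment is an assumption.

```javascript
// Filter: gender 3, and geo within a circle of radius 0.005
// (in coordinate units) around the given point.
const filter = {
  gender: 3,
  geo: { $geoWithin: { $center: [[48.72761, 9.24596], 0.005] } }
};

// Sort: _pid descending.
const sort = { _pid: -1 };

// In the mongo shell (collection name assumed):
//   db.coll.find(filter).sort(sort)
console.log(JSON.stringify({ filter, sort }));
```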

      

The Tsung stress test ran for 1 hour per database, at a rate of 1 request per second.

  <load>
    <arrivalphase phase="1" duration="60" unit="minute">
      <users interarrival="1" unit="second"/>
    </arrivalphase>
  </load>

      

Here are screenshots of the reports:

TOKUMX:

[images: tokumx summary, tokumx reports]

MongoDB:

[images: mongodb summary, mongodb reports]


Update (2014-12-12): found https://github.com/Tokutek/mongo/issues/1014



3 answers


TokuMX 2.0.0 Community Edition is still based on MongoDB 2.4 and, at the time of this post, has no 2dsphere geo index. So if you need compound indexes that include a geo index, you may have to wait for a TokuMX release based on MongoDB 2.6, which supports the 2dsphere index.

Basically:



  • "2d indexes": support compound indexes with only one additional field, as a suffix of the 2d index field
  • "2dsphere indexes": support compound indexes with scalar index fields (i.e. ascending or descending) as a prefix or a suffix of the 2dsphere index field
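
As a sketch of what these two shapes look like as index key specifications, here they are written with the `geo`, `gender`, and `created_at` fields from the question. The choice of fields is only illustrative, and whether a given server build accepts the 2dsphere form is exactly the limitation described above.

```javascript
// 2d compound index: the 2d field comes first, with one extra
// field following it as a suffix.
const compound2d = { geo: "2d", gender: 1 };

// 2dsphere compound index: scalar fields may appear before and/or
// after the geo field.
const compound2dsphere = { gender: 1, geo: "2dsphere", created_at: -1 };

// In the mongo shell these objects would be passed to ensureIndex, e.g.
//   db.coll.ensureIndex(compound2d)
console.log(Object.keys(compound2d), Object.keys(compound2dsphere));
```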

If you are interested in more detail about my stress testing, you can find it in this post.



The Sysbench transaction includes insert, update, and delete operations, but the test you described is read-only. The main reason TokuMX achieves much higher Sysbench scores than MongoDB is write concurrency.





I'm glad to see you are interested in TokuMX. However, there are a few questions about your benchmarking setup that you must answer before trying to draw conclusions from the results:

  • You are running on a Mac mini. TokuMX is supported on OS X for development only, not for production. There are several known performance issues on OS X that we have resolved on Linux. If you want to evaluate TokuMX performance, you really need to test on Linux on dedicated hardware.

  • The graph you cited from our marketing materials shows how the throughput of one particular test (sysbench) changes as we vary the number of concurrent threads. Tsung does not appear to measure throughput against concurrency, so why would you expect its results to resemble the graph on our site?

  • Is Tsung's workload similar to your application's? How did you choose the schema that you tested? Does it represent your application's data model? Your queries do not match the indexes you created; if you want to test queries on field2, geo, and created_at, then you need an index that orders the data by those keys. I expect your application is more than a read-only workload that uses none of the indexes you define on a small dataset. Think more about how to design a test that represents your application. Or better yet, just run the application, or a trace of it, and track the metrics that interest you.

  • Your test time is only 5 minutes and most of the output shows significant variance during the run. If you are interested in this workload, you probably want to run it much longer (and possibly on a larger dataset), collect a lot of data, and compare both throughput and latency histograms between TokuMX and MongoDB.

  • Why did you create a partitioned collection? Have you created any partitions? Does this paradigm fit the requirements of your application?
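
To make the point about latency histograms concrete, one way to summarize latencies collected from a run is with percentiles rather than a single mean, since a few slow requests can hide behind an average. The sketch below uses the nearest-rank method. The function name and the sample values are made up for illustration and are not measurements from either database.

```javascript
// Nearest-rank percentile of a list of latencies (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latencies for illustration only, not real measurements.
const latenciesMs = [12, 15, 11, 240, 14, 13, 16, 12, 18, 95];
console.log({
  p50: percentile(latenciesMs, 50),
  p90: percentile(latenciesMs, 90),
  p99: percentile(latenciesMs, 99)
});
```

Computing the same percentiles for both databases over a long run gives a fairer comparison than eyeballing two short, noisy graphs.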

I think that if you start addressing these issues, you will work your way toward resolving the inconsistencies you see, and you will hopefully arrive at a benchmark that gives you reliable and actionable results.
