TokuMX vs MongoDB read performance

I ran a read-performance stress test to compare TokuMX with stock MongoDB.

Both TokuMX and MongoDB were running on the same machine.

Hardware Overview:

Model Name: Mac mini
Model Identifier: Macmini6,1
Processor Name: Intel Core i5
Processor Speed: 2.5 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 10 GB

There is only one collection in each instance, and each collection holds 100,000 entries.

For TokuMX, it was created as a partitioned collection; for MongoDB, it was created as a normal collection:

db.createCollection("sample", {partitioned: true, primaryKey:  {field1:1, _id: 1}});

In both instances the indexes look like this:

db.sample.ensureIndex({field1:1});
db.sample.ensureIndex({field2:1});
db.sample.ensureIndex({field3:1});
db.sample.ensureIndex({field4:1});
db.sample.ensureIndex({geo:"2d"});
db.sample.ensureIndex({"created_at":1});

I used Tsung for the stress testing. The test plan runs a simple search on the field2 and geo fields, ordered by created_at descending.

<clients>
<client host="localhost" use_controller_vm="false" maxusers="8000"/>
</clients>
<servers>
<server host="jchimac.thenetcircle.lab" port="8080" type="tcp"/>
</servers>
<load duration="5" unit="minute">
<arrivalphase phase="1" duration="5" unit="minute">
<users interarrival="0.03" unit="second"/>
</arrivalphase>
</load>

According to the official documentation, the transaction throughput should look like the chart "TOKUMX™ BENCHMARK VS. MONGODB – HDD":

[image: TokuMX vs. MongoDB benchmark chart]

But in my testing:

TOKUMX:

[image]

[image]

MongoDB:

[image]

[image]

Can anyone give me a hint about this? Did I miss something in the testing?


Updates:

I ran another round of tests on a Linux (CentOS) machine:

CentOS release 6.5 (Final)
2.6.32-504.1.3.el6.x86_64 GNU/Linux
MemTotal:       24589896 kB
CPU: 12* (Intel(R) Xeon(R) CPU E5645  @ 2.40GHz)

Sample data looks like:

{
  "_id": ObjectId("54867dc8ffbc15aa2bc3ee0e"),
  "_iid": 15,
  "_pid": 15,
  "uid": 102296,
  "nickname": "nickname_102296",
  "gender": 3,
  "image_id": 15,
  "created_at": 1418100168,
  "tag": 1,
  "geo": {
    "lat": 51.590449999999997033,
    "lon": 6.9671900000000004383
  }
}

Each collection has 1,000,000 entries.

Indexes on each collection (normal collections were created this time):

db.createCollection("coll", {primaryKey:  {_pid:1, _id: 1}});
db.tokumx_coll.ensureIndex({gender:1}); 
db.tokumx_coll.ensureIndex({uid:1}); 
db.tokumx_coll.ensureIndex({geo:"2d"}); 
db.tokumx_coll.ensureIndex({_pid:1}); 
db.tokumx_coll.ensureIndex({_iid:1}); 
db.tokumx_coll.ensureIndex({"created_at":1}); 

Test plan is also quite simple:

{'$query', {gender,3,geo, {'$geoWithin', {'$center', [[48.72761, 9.24596], 0.005]}}}, '$orderby',{'_pid',-1}} 
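For readers more used to the mongo shell, the Tsung query above appears to correspond to the sketch below. This is only my reading of the tuple syntax; the collection name `coll` is taken from the createCollection call above, and the `typeof db` guard is an assumption added so the snippet also runs outside the shell.

```javascript
// Filter: gender == 3 and geo inside a circle around [48.72761, 9.24596].
var filter = {
  gender: 3,
  geo: { $geoWithin: { $center: [[48.72761, 9.24596], 0.005] } }
};

// Sort: _pid descending, matching '$orderby',{'_pid',-1}.
var order = { _pid: -1 };

// Only runs inside a mongo/TokuMX shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.coll.find(filter).sort(order).forEach(printjson);
}
```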

Each Tsung stress test ran for 1 hour, with a load of one new request per second.

  <load>
    <arrivalphase phase="1" duration="60" unit="minute">
      <users interarrival="1" unit="second"/>
    </arrivalphase>
  </load>
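To put the two load configurations side by side, here is a small sketch of the arithmetic. It assumes that Tsung's `interarrival` is the mean time between new users, so the numbers are averages; `arrivalStats` is a hypothetical helper name, not part of Tsung.

```javascript
// Convert a Tsung arrival phase into an average arrival rate and a total user count.
// interarrivalSec: mean seconds between new users; durationMin: phase length in minutes.
function arrivalStats(interarrivalSec, durationMin) {
  var usersPerSecond = 1 / interarrivalSec;
  var totalUsers = Math.round((durationMin * 60) / interarrivalSec);
  return { usersPerSecond: usersPerSecond, totalUsers: totalUsers };
}

var firstRun = arrivalStats(0.03, 5);  // Mac mini run: interarrival 0.03 s for 5 minutes
var secondRun = arrivalStats(1, 60);   // CentOS run: interarrival 1 s for 60 minutes

console.log("first run:", firstRun.usersPerSecond.toFixed(1), "users/s,", firstRun.totalUsers, "users");
console.log("second run:", secondRun.usersPerSecond.toFixed(1), "users/s,", secondRun.totalUsers, "users");
```

So the first run started roughly 33 new users per second, while the second run started only one per second.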

Here are the reports as screenshots:

TOKUMX:

[image: tokumx summary]
[image: tokumx reports]

MONGODB:

[image: mongodb summary]
[image: mongodb reports]


Update @2014.12.12: Found this: https://github.com/Tokutek/mongo/issues/1014

There are 3 best solutions below

0
On BEST ANSWER

TokuMX 2.0.0 Community Edition for MongoDB is still built on MongoDB 2.4, which did not yet have the geo 2dsphere index when I wrote this post. So if you are building a compound index that includes a geo index, you have to wait for a version based on MongoDB 2.6, which supports the geo 2dsphere index.

Basically:

  • "2d indexes": Compound indexes with only one additional field, as a suffix of the 2d index field
  • "2dsphere indexes": Compound indexes with scalar index fields (i.e. ascending or descending) as a prefix or suffix of the 2dsphere index field
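As an illustration of the two rules, here is a sketch of what such index specs could look like for the `coll` collection from the question. The field choices are assumptions, `fits2dRule` is a hypothetical helper that only encodes the "2d" rule as quoted above, and the `typeof db` guard lets the snippet run outside the shell.

```javascript
// "2d" compound index: geo field first, at most one extra field after it.
var twoDSpec = { geo: "2d", gender: 1 };

// "2dsphere" compound index: scalar fields may come before or after the geo field.
var sphereSpec = { gender: 1, geo: "2dsphere", created_at: -1 };

// Hypothetical helper: checks a spec against the "2d" rule quoted above.
function fits2dRule(spec) {
  var keys = Object.keys(spec);
  return keys.length <= 2 && spec[keys[0]] === "2d";
}

// Only runs inside a mongo/TokuMX shell, where `db` is defined.
if (typeof db !== "undefined" && fits2dRule(twoDSpec)) {
  db.coll.ensureIndex(twoDSpec);
}
```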

If you are interested in more of my stress testing, you can find it in this post.

2
On

A Sysbench transaction includes insert/update/delete operations, but the test you are describing is read-only. A large reason that TokuMX achieves much higher Sysbench results than MongoDB is write concurrency.

2
On

I'm glad to see you're interested in TokuMX. However, there are a number of questions about your benchmarking setup that you should answer before trying to draw conclusions from the results:

  1. You're running on a Mac mini. TokuMX is supported for development only on OSX, not for production. There are several explicit performance problems on OSX that we have resolved on Linux. If you are interested in evaluating TokuMX's performance, you really should be testing on Linux on dedicated hardware.

  2. The graph you showed from our marketing materials describes how the throughput of a specific benchmark (sysbench) changes as we vary the number of concurrent threads. Tsung doesn't appear to be measuring throughput vs. concurrency, so why are you expecting it to have similar characteristics to the graph on our site?

  3. Is Tsung's workload similar to your application? How did you choose the schema you tested? Does it represent your application's data model? Your queries don't match up with the indexes you chose; if you want to test queries on field2, geo, created_at, then you should have an index that orders data according to that key. I expect your application isn't just a read-only workload that doesn't use any of the indexes you've defined on a small data set. Think more about how to design a benchmark that will represent your application. Or better yet, just run your application or a trace of it, and monitor the metrics you care about.

  4. Your benchmark's running time is only 5 minutes, and most of the output demonstrates significant variance through the run. If this workload is interesting to you, you probably want to run it for a lot longer (and maybe on a larger data set), collect lots of data, and compare both the throughput and the latency histograms between TokuMX and MongoDB.

  5. Why did you create a partitioned collection? Did you create any partitions? Does this paradigm match the requirements of your application?
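One way to check point 3 is to look at `explain()` output for the benchmark query. The sketch below assumes MongoDB 2.4-style explain fields (`cursor`, `n`, `nscanned`, `scanAndOrder`); `summarizeExplain` is a hypothetical helper, the example plan is made up for illustration and is not a measured result, and the `field2` value in the shell call is a placeholder.

```javascript
// Hypothetical helper: summarizes a MongoDB 2.4-style explain() document.
function summarizeExplain(plan) {
  return {
    cursor: plan.cursor,                          // which index (if any) was used
    sortedInMemory: plan.scanAndOrder === true,   // true if the index did not provide the sort order
    scannedPerResult: plan.n > 0 ? plan.nscanned / plan.n : plan.nscanned
  };
}

// Made-up explain output, for illustration only.
var examplePlan = { cursor: "BtreeCursor field2_1", n: 50, nscanned: 5000, scanAndOrder: true };
var exampleSummary = summarizeExplain(examplePlan);

// Only runs inside a mongo/TokuMX shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(summarizeExplain(db.sample.find({ field2: 1 }).sort({ created_at: -1 }).explain()));
}
```

A high `scannedPerResult` or `sortedInMemory: true` suggests the query is not well served by the existing indexes.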

I think if you start to address these questions you'll lead yourself toward the discrepancies you're seeing, and you will hopefully approach a benchmark that will give you reliable and actionable results.
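For the latency comparison mentioned in point 4, a minimal sketch of computing percentiles from raw response times might look like this. It uses the nearest-rank method; `percentile` is a hypothetical helper and the sample values are made up for illustration.

```javascript
// Nearest-rank percentile of an array of latencies (milliseconds).
function percentile(latenciesMs, p) {
  var sorted = latenciesMs.slice().sort(function (a, b) { return a - b; });
  var rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up samples, for illustration only.
var samples = [12, 15, 11, 240, 14, 13, 16, 18, 12, 500];

console.log("p50:", percentile(samples, 50));
console.log("p90:", percentile(samples, 90));
console.log("p99:", percentile(samples, 99));
```

Comparing p50, p90, and p99 for both databases over a long run shows tail latency that an average would hide.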