Key-value db that serves only reads very fast?


We are working on a project in Scala and I need a key-value database (essentially a map) that mainly serves read operations and serves them very fast:

  • No exotic query support or complex retrieval logic of any kind: give it the key and get the value back, just like a map. No conditions, no joins, nothing. Key -> Value.
  • The value, by the way, is itself a map of lists of strings or something similar, so it is somewhat large (if that matters at all).
  • We use it only for reading. No writes except for the initial population of the database and some very rare updates, which could perhaps be handled outside the database.
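
Stated as types, the requirements above amount to a read-only lookup from a key to a `Map[String, List[String]]`. A minimal sketch, assuming the whole dataset fits in the JVM heap: an immutable Scala `Map` behind a small interface, which also serves as a baseline to benchmark any database against. `ReadOnlyStore`, `InMemoryStore` and `StoreTypes` are hypothetical names, not part of any library, and the value type is an assumed shape.

```scala
// Hypothetical interface for the access pattern in the question:
// give it a key, get a value back, nothing else.
trait ReadOnlyStore[K, V] {
  def get(key: K): Option[V]
}

// Baseline implementation, assuming the data fits in memory.
// An immutable Map can be shared across threads for reads without locking.
final class InMemoryStore[K, V](data: Map[K, V]) extends ReadOnlyStore[K, V] {
  def get(key: K): Option[V] = data.get(key)
}

object StoreTypes {
  // Assumed value shape from the question: names mapped to lists of strings.
  type Value = Map[String, List[String]]
}
```

Any of the stores suggested in the answers could sit behind the same `get` signature, which keeps swapping and benchmarking them simple.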

I've been directed towards MongoDB and MemcacheDB, but MongoDB is good at queries (which adds no value for me) and MemcacheDB is all about distribution (not a concern in my project). So far I'm thinking of using an RDBMS (e.g. MySQL), but perhaps there are better options in the land of NoSQL?

There are 7 answers below.

0

I would suggest SQLite or Berkeley DB (which has a SQLite-compatible SQL API). Both are simple, embedded database libraries: they link into your application, so no separate server is required. Both are very fast at running queries. Berkeley DB scales better for very large databases. If you're interested in a key-value pair API (NoSQL), Berkeley DB has that API as well.

Good luck in your search.

0

MongoDB would probably be an easy solution for this.

http://www.mongodb.org/display/DOCS/Benchmarks

1

An alternative would be to just use a flat file: the data sounds relatively simple and you don't have to write to the file often. There also seems to be an open-source Scala implementation of memcached, and access through it would be very fast: https://github.com/victori/smemcached
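
A minimal sketch of the flat-file approach, assuming the data is small enough to load into memory at startup: the file is rewritten on the rare updates and read once into an immutable `Map`. `FlatFileStore` and its tab/comma line format are hypothetical choices for illustration, not an established format.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Sketch of a flat-file store: written rarely, loaded once into an immutable Map.
// Hypothetical line format, one line per (key, field) pair:
//   key<TAB>field<TAB>v1,v2,v3
// Assumes keys, fields and values are non-empty and never contain tabs, commas or newlines.
object FlatFileStore {
  type Value = Map[String, List[String]]

  def write(path: Path, data: Map[String, Value]): Unit = {
    val lines = for {
      (key, fields)   <- data.toSeq
      (field, values) <- fields.toSeq
    } yield s"$key\t$field\t${values.mkString(",")}"
    Files.write(path, lines.mkString("\n").getBytes(UTF_8))
  }

  def load(path: Path): Map[String, Value] = {
    val rows = new String(Files.readAllBytes(path), UTF_8)
      .split("\n")
      .filter(_.nonEmpty)
      .toList
      .map { line =>
        line.split("\t", 3) match {
          case Array(key, field, values) =>
            (key, field, if (values.isEmpty) Nil else values.split(",").toList)
          case _ => throw new IllegalArgumentException(s"malformed line: $line")
        }
      }
    // Group the rows back into key -> (field -> values).
    rows.groupBy(_._1).map { case (key, rs) =>
      key -> rs.map { case (_, field, values) => field -> values }.toMap
    }
  }
}
```

Since lookups then hit an in-memory `Map`, the file format only affects startup time and could be swapped for another encoding without changing lookup speed.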

0

MemcacheDB sounds like the right tool for the job, even if you do not need the distributed networking part (you don't have to do anything special to avoid using it).

Even better, Redis is supposed to be very fast and also has native support for storing data structures such as lists and sets.

0

I would suggest you take a look at Kyoto Cabinet. I'm in the process of writing some Scala wrappers around it, allowing you to access it as a plain old vanilla Scala Map. I haven't run a benchmark myself yet, but according to the benchmarks out there, it's faster than Berkeley DB. (However, it may be too early to tell, since there is no documentation on the overhead of the Java integration.)

Check the JavaDoc APIs here. I have been toying with it in the REPL, and it worked fine.

Here's some proof from the REPL that it works:

$ scala -Djava.library.path=/usr/local/lib
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_15).
Type in expressions to have them evaluated.
Type :help for more information.

scala> :cp /Users/wilfred/.m2/repository/com/fallabs/kyotocabinet/1.15/kyotocabinet-1.15.jar
Added '/Users/wilfred/.m2/repository/com/fallabs/kyotocabinet/1.15/kyotocabinet-1.15.jar'.  Your new classpath is:
.:/Users/wilfred/.m2/repository/com/fallabs/kyotocabinet/1.15/kyotocabinet-1.15.jar

scala> import kyotocabinet._
import kyotocabinet._

scala> val db = new DB()
db: kyotocabinet.DB = (null): -1: -1

scala> db.open("casket.kch", DB.OWRITER | DB.OCREATE)
res0: Boolean = true

scala> db.set("foo", "bar")
res1: Boolean = true

scala> db.get("foo")
res2: java.lang.String = bar
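
The wrapper mentioned above isn't shown, so here is a minimal, read-only sketch of the idea of presenting a string store as something Map-like. To stay self-contained it does not depend on the Kyoto Cabinet jar: `StringLookup` is a hypothetical trait standing in for the `db.get(...)` call from the session above, and it assumes `get` returns `null` for a missing key. `MapFacade` (also hypothetical) only offers `get`, `apply`, `contains` and `getOrElse`; it is not a full `scala.collection.Map` implementation.

```scala
// Hypothetical stand-in for a string-to-string store such as Kyoto Cabinet's DB.
// Assumption: get returns null when the key is absent.
trait StringLookup {
  def get(key: String): String
}

// Read-only, Map-like view over a StringLookup. Extending String => String
// lets it be passed anywhere a lookup function is expected.
final class MapFacade(raw: StringLookup) extends (String => String) {
  // Wrap the possibly-null result in an Option, as Map.get does.
  def get(key: String): Option[String] = Option(raw.get(key))

  // Like Map.apply: throws when the key is missing.
  def apply(key: String): String =
    get(key).getOrElse(throw new NoSuchElementException(s"key not found: $key"))

  def contains(key: String): Boolean = get(key).isDefined

  def getOrElse(key: String, default: => String): String = get(key).getOrElse(default)
}
```

With the real library, something like `new MapFacade(new StringLookup { def get(key: String) = db.get(key) })` should work, assuming `db` is an opened `kyotocabinet.DB` as in the session above.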

0

I'd recommend CDB (Constant Data Base). It has a few advantages:

  • Fast lookups: A successful lookup in a large database normally takes just two disk accesses. An unsuccessful lookup takes only one.
  • Low overhead: A database uses 2048 bytes, plus 24 bytes per record, plus the space for keys and data.
  • No random limits: cdb can handle any database up to 4 gigabytes. There are no other restrictions; records don't even have to fit into memory. Databases are stored in a machine-independent format.
  • Fast atomic database replacement: cdbmake can rewrite an entire database two orders of magnitude faster than other hashing packages.
  • Fast database dumps: cdbdump prints the contents of a database in cdbmake-compatible format.
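
To make the lookup and overhead claims above concrete, here is a minimal in-memory sketch of the cdb layout: a 2048-byte header of 256 (position, length) pointers, records stored as key length, data length, key, data, then 256 linearly probed hash tables with two slots per record, all as 32-bit little-endian values. The hash starts at 5381 and updates as h = ((h << 5) + h) ^ c. `TinyCdb` is a hypothetical name; it builds the image in a byte array rather than a file, uses `Int` offsets (so it only covers images under 2 GB), and is not a replacement for cdbmake or an existing cdb library.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Arrays

object TinyCdb {
  // cdb hash: start at 5381, then h = ((h << 5) + h) ^ c, kept to 32 unsigned bits.
  def hash(key: Array[Byte]): Long =
    key.foldLeft(5381L)((h, b) => (((h << 5) + h) ^ (b & 0xff)) & 0xffffffffL)

  def build(pairs: Seq[(String, String)]): Array[Byte] = {
    val recs = pairs.map { case (k, v) => (k.getBytes(UTF_8), v.getBytes(UTF_8)) }
    val recordBytes = recs.map { case (k, v) => 8 + k.length + v.length }.sum
    // 2048-byte header + records + two 8-byte slots per record.
    val buf = ByteBuffer.allocate(2048 + recordBytes + 16 * recs.size).order(ByteOrder.LITTLE_ENDIAN)
    buf.position(2048)
    // Each record: key length, data length, key bytes, data bytes.
    val entries = recs.map { case (k, v) =>
      val pos = buf.position()
      buf.putInt(k.length).putInt(v.length).put(k).put(v)
      (hash(k), pos)
    }
    // Table number = hash % 256; each table gets twice as many slots as entries.
    val byTable = entries.groupBy { case (h, _) => (h % 256).toInt }
    for (t <- 0 until 256) {
      val es = byTable.getOrElse(t, Seq.empty)
      val len = es.size * 2
      buf.putInt(t * 8, buf.position()).putInt(t * 8 + 4, len) // header pointer
      val slots = Array.fill(len)((0L, 0))                      // (hash, position); 0 = empty
      for ((h, pos) <- es) {
        var s = ((h >>> 8) % len).toInt
        while (slots(s)._2 != 0) s = (s + 1) % len              // linear probing
        slots(s) = (h, pos)
      }
      for ((h, pos) <- slots) buf.putInt(h.toInt).putInt(pos)
    }
    buf.array()
  }

  def get(db: Array[Byte], key: String): Option[String] = {
    val buf = ByteBuffer.wrap(db).order(ByteOrder.LITTLE_ENDIAN)
    val k = key.getBytes(UTF_8)
    val h = hash(k)
    val t = (h % 256).toInt
    val tablePos = buf.getInt(t * 8)
    val len = buf.getInt(t * 8 + 4)

    // Read the record only after the stored hash matches.
    def keyAt(pos: Int): Boolean = {
      val klen = buf.getInt(pos)
      klen == k.length && Arrays.equals(Arrays.copyOfRange(db, pos + 8, pos + 8 + klen), k)
    }
    def valueAt(pos: Int): String =
      new String(db, pos + 8 + buf.getInt(pos), buf.getInt(pos + 4), UTF_8)

    // Probe from slot (h / 256) % len until the key or an empty slot is found.
    @annotation.tailrec
    def probe(slot: Int, remaining: Int): Option[String] =
      if (remaining == 0) None
      else {
        val slotHash = buf.getInt(tablePos + slot * 8) & 0xffffffffL
        val recPos = buf.getInt(tablePos + slot * 8 + 4)
        if (recPos == 0) None // empty slot: key is absent
        else if (slotHash == h && keyAt(recPos)) Some(valueAt(recPos))
        else probe((slot + 1) % len, remaining - 1)
      }

    if (len == 0) None else probe(((h >>> 8) % len).toInt, len)
  }
}
```

In this layout a successful lookup normally reads one hash slot and then the record, with the small header easily kept in memory, which is consistent with the "two disk accesses" claim. The "24 bytes per record" figure also falls out of it: 8 bytes of lengths per record plus two 8-byte hash slots.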

The main limitation is the 4 GB database size. If you need more data, there are 64-bit variants (cdb64 in Go, or python-pure-cdb in Python) that can read database files of up to 16 exabytes.

0

Chronicle Map is a pure-Java, embeddable, persistent key-value store.

PalDB is a write-once, embeddable, persistent key-value store for Java.