I'm working on an analysis assignment; we got a partial dataset from the university library containing almost 300,000,000 rows.
Each row contains:
- ID
- Date
- Owner
- Deadline
- Checkout_date
- Checkin_date
I put all of this into a MySQL table and then started querying it for my analysis assignment; however, even a simple query (SELECT * FROM table WHERE ID = something) was taking 9-10 minutes to complete. So I created an index on all of the columns, which made it noticeably faster, down to about 30 seconds.
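To illustrate the kind of index and lookup I mean, here is a rough sketch (the table name `loans`, the connection details, and the index name are placeholders, not my real schema):

```python
import mysql.connector

# Placeholder connection details; the table name "loans" is illustrative only.
conn = mysql.connector.connect(
    host="localhost", user="analysis", password="secret", database="library"
)
cur = conn.cursor()

# An index on the lookup column replaces the full table scan
# with an index lookup, which is what brought the query time down.
cur.execute("CREATE INDEX idx_loans_id ON loans (ID)")

# The same point query, now served via the index.
cur.execute("SELECT * FROM loans WHERE ID = %s", (123456,))
print(cur.fetchall())

cur.close()
conn.close()
```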
So I started reading about similar issues, and people recommended switching from a "Relational" database to a "Wide column store" or a "Search engine".
So my question is, what would be the best database engine to use for this data?
Using a search engine is IMO the best option here.
Elasticsearch of course!
It's built to hold just about any type of data: text, numerics, geo-points, shapes. It's designed for search and aggregation, and it scales out easily whenever you need to ingest more data.
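As a minimal sketch of what an ID lookup looks like there (assuming the official elasticsearch-py 8.x client and a local cluster; the index name, field names, and sample data are placeholders):

```python
from elasticsearch import Elasticsearch, helpers

# Placeholder endpoint; the index and field names are illustrative only.
es = Elasticsearch("http://localhost:9200")

# Bulk-load a few example loan records.
docs = [
    {"_index": "library-loans", "_id": str(i),
     "_source": {"ID": i, "Owner": f"user-{i}", "Checkout_date": "2024-01-15"}}
    for i in range(1, 4)
]
helpers.bulk(es, docs)

# Make the freshly indexed documents visible to search.
es.indices.refresh(index="library-loans")

# A term query on the ID field is answered from the inverted index,
# so lookups stay fast as the data grows.
resp = es.search(index="library-loans", query={"term": {"ID": 2}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
```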
On a previous project (in 2011), I was able to turn batch jobs that ran for hours overnight into real-time insights...
It's very common nowadays to see Elasticsearch clusters holding anywhere from a few gigabytes to petabytes of data.
Disclaimer: I work at Elastic. :)