I'm working on an analysis assignment; we got a partial dataset from the university library containing almost 300,000,000 rows.
Each row contains:
- ID
- Date
- Owner
- Deadline
- Checkout_date
- Checkin_date
I put all of this into a MySQL table and then started querying it for my analysis assignment; however, even a simple query (SELECT * FROM table WHERE ID = something) was taking 9-10 minutes to complete. So I created an index on all of the columns, which made it noticeably faster, down to about 30 seconds.
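To illustrate the kind of index and lookup I mean, here is a rough sketch (the table name `loans`, the connection details, and the index name are placeholders, not my real schema):

```python
import mysql.connector

# Placeholder connection details; the table name "loans" is illustrative only.
conn = mysql.connector.connect(
    host="localhost", user="analysis", password="secret", database="library"
)
cur = conn.cursor()

# An index on the lookup column replaces the full table scan
# with an index lookup, which is what brought the query time down.
cur.execute("CREATE INDEX idx_loans_id ON loans (ID)")

# The same point query, now served via the index.
cur.execute("SELECT * FROM loans WHERE ID = %s", (123456,))
print(cur.fetchall())

cur.close()
conn.close()
```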
So I started reading about similar issues, and people recommended switching from a "Relational" database to a "Wide column store" or a "Search engine".
So my question is, what would be the best database engine to use for this data?
Using a search engine is IMO the best option here.
Elasticsearch of course!
It's built to hold just about any type of data: text, numerics, geo-points, shapes. It's designed for search and aggregation, and it scales out easily whenever you need to ingest more data.
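As a minimal sketch of what an ID lookup looks like there (assuming the official elasticsearch-py 8.x client and a local cluster; the index name, field names, and sample data are placeholders):

```python
from elasticsearch import Elasticsearch, helpers

# Placeholder endpoint; the index and field names are illustrative only.
es = Elasticsearch("http://localhost:9200")

# Bulk-load a few example loan records.
docs = [
    {"_index": "library-loans", "_id": str(i),
     "_source": {"ID": i, "Owner": f"user-{i}", "Checkout_date": "2024-01-15"}}
    for i in range(1, 4)
]
helpers.bulk(es, docs)

# Make the freshly indexed documents visible to search.
es.indices.refresh(index="library-loans")

# A term query on the ID field is answered from the inverted index,
# so lookups stay fast as the data grows.
resp = es.search(index="library-loans", query={"term": {"ID": 2}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
```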
On a previous project (in 2011), I was able to turn batch jobs that ran for hours overnight into real-time insights...
It's very common nowadays to see Elasticsearch clusters holding anywhere from a few gigabytes to petabytes of data.
Disclaimer: I work at Elastic. :)