Querying/importing millions of rows in R from MySQL

I am having performance issues. I have an indexed table of 3 million rows in MySQL, which will eventually grow to 15 million.

When I run a simple query from R through DBI/RMySQL:

SELECT * FROM sales WHERE transaction_year = 2022;

transaction_year is indexed, among other columns. The table has 60 columns, and the query takes about a minute. My machine has 16 GB of RAM and an i7 processor.

Are there ways to significantly reduce the download time to less than 10 seconds? Thank you.
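
For reference, the R side is essentially the following. Two levers that sometimes help before touching the database are naming only the columns you need and fetching in chunks; this is only a sketch, and the connection details and column names like sale_id and amount are placeholders:

    library(DBI)

    # Connection details are placeholders.
    con <- dbConnect(RMySQL::MySQL(),
                     dbname = "mydb", host = "localhost",
                     user = "me", password = "secret")

    # Lever 1: name only the columns you need instead of SELECT *
    # (60 columns per row is a lot of bytes on the wire).
    res <- dbSendQuery(con, "
      SELECT sale_id, amount, transaction_year
      FROM sales
      WHERE transaction_year = 2022")

    # Lever 2: fetch in fixed-size chunks so R never stalls building
    # one giant result set in a single call.
    chunks <- list()
    while (!dbHasCompleted(res)) {
      chunks[[length(chunks) + 1L]] <- dbFetch(res, n = 100000)
    }
    dbClearResult(res)
    sales_2022 <- do.call(rbind, chunks)

    dbDisconnect(con)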

1 Answer

Answer by Rick James:

We really need to see SHOW CREATE TABLE and the important queries. I'll assume you have:

    CREATE TABLE stuff (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        transaction_year YEAR NOT NULL,
        ...
        PRIMARY KEY(id)
    ) ENGINE=InnoDB;

In that case, change the PK like this:

    PRIMARY KEY(transaction_year, id),  -- to significantly speed up that query
    INDEX(id)                           -- to keep AUTO_INCREMENT happy
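
Applied from R, that one-time change could look like the sketch below. The table name sales comes from the question; on millions of rows InnoDB will rebuild the table, so expect it to run for a while:

    library(DBI)

    # con is an open DBI connection, as in the question.
    # Doing all three changes in one ALTER means InnoDB rebuilds
    # the table only once.
    dbExecute(con, "
      ALTER TABLE sales
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (transaction_year, id),
        ADD INDEX (id)")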

I don't know if this will hurt the performance of other queries; I really need to see more info.

If transaction_year is really a DATE, not a year by itself, then

where transaction_year = 2022

needs to become

where transaction_year >= '2022-01-01'
  AND transaction_year <  '2022-01-01' + INTERVAL 1 YEAR

(And my indexing advice still applies.)
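
From R, the same range predicate can be generated for any year. A sketch, reusing the open connection con from earlier; building the literal with sprintf sidesteps driver differences in bound-parameter support:

    year <- 2022
    sql <- sprintf("
      SELECT *
      FROM sales
      WHERE transaction_year >= '%d-01-01'
        AND transaction_year <  '%d-01-01' + INTERVAL 1 YEAR",
      year, year)
    sales_year <- dbGetQuery(con, sql)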

If you expect to purge old data after some number of years, then we can talk about partitioning; see MySQL's PARTITION documentation.
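
For a flavor of what that looks like, here is a sketch of yearly RANGE partitioning. The table and column names are illustrative, and note that MySQL requires the partitioning column to appear in every unique key, hence the reordered primary key:

    dbExecute(con, "
      CREATE TABLE sales_by_year (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        transaction_date DATE NOT NULL,
        -- ... the other columns ...
        PRIMARY KEY (transaction_date, id),
        INDEX (id)
      ) ENGINE=InnoDB
      PARTITION BY RANGE (YEAR(transaction_date)) (
        PARTITION p2021 VALUES LESS THAN (2022),
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
      )")

    # Purging a whole year then becomes a near-instant metadata change:
    dbExecute(con, "ALTER TABLE sales_by_year DROP PARTITION p2021")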