Querying/importing millions of rows in R from MySQL

I am having performance issues. I have an indexed table of 3 million rows in MySQL, which will eventually grow to 15 million.

When I run a simple query from R through DBI/RMySQL:

SELECT * FROM sales WHERE transaction_year = 2022;

transaction_year is indexed, among other columns. The table has 60 columns, and the query takes about a minute. My machine has 16 GB of RAM and an i7 processor.

Are there ways to significantly reduce the download time to less than 10 seconds? Thank you.
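
For reference, the R side is essentially the following. Two levers that sometimes help before touching the database are naming only the columns you need and fetching in chunks; this is only a sketch, and the connection details and column names like sale_id and amount are placeholders:

    library(DBI)

    # Connection details are placeholders.
    con <- dbConnect(RMySQL::MySQL(),
                     dbname = "mydb", host = "localhost",
                     user = "me", password = "secret")

    # Lever 1: name only the columns you need instead of SELECT *
    # (60 columns per row is a lot of bytes on the wire).
    res <- dbSendQuery(con, "
      SELECT sale_id, amount, transaction_year
      FROM sales
      WHERE transaction_year = 2022")

    # Lever 2: fetch in fixed-size chunks so R never stalls building
    # one giant result set in a single call.
    chunks <- list()
    while (!dbHasCompleted(res)) {
      chunks[[length(chunks) + 1L]] <- dbFetch(res, n = 100000)
    }
    dbClearResult(res)
    sales_2022 <- do.call(rbind, chunks)

    dbDisconnect(con)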

1 Answer

Answer by Rick James:

We really need to see SHOW CREATE TABLE and the important queries. I'll assume you have:

    CREATE TABLE stuff (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        transaction_year YEAR NOT NULL,
        ...
        PRIMARY KEY(id)
    ) ENGINE=InnoDB;

In that case, change the PK like this:

    PRIMARY KEY(transaction_year, id),  -- to significantly speed up that query
    INDEX(id)                           -- to keep AUTO_INCREMENT happy
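
Applied from R, that one-time change could look like the sketch below. The table name sales comes from the question; on millions of rows InnoDB will rebuild the table, so expect it to run for a while:

    library(DBI)

    # con is an open DBI connection, as in the question.
    # Doing all three changes in one ALTER means InnoDB rebuilds
    # the table only once.
    dbExecute(con, "
      ALTER TABLE sales
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (transaction_year, id),
        ADD INDEX (id)")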

I don't know if this will hurt the performance of other queries; I really need to see more info.

If transaction_year is really a DATE, not a year by itself, then

where transaction_year = 2022

needs to become

where transaction_year >= '2022-01-01'
  AND transaction_year <  '2022-01-01' + INTERVAL 1 YEAR

(And my indexing advice still applies.)
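
From R, the same range predicate can be generated for any year. A sketch, reusing the open connection con from earlier; building the literal with sprintf sidesteps driver differences in bound-parameter support:

    year <- 2022
    sql <- sprintf("
      SELECT *
      FROM sales
      WHERE transaction_year >= '%d-01-01'
        AND transaction_year <  '%d-01-01' + INTERVAL 1 YEAR",
      year, year)
    sales_year <- dbGetQuery(con, sql)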

If you expect to purge old data after some number of years, then we can talk about partitioning; see MySQL's PARTITION documentation.
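
For a flavor of what that looks like, here is a sketch of yearly RANGE partitioning. The table and column names are illustrative, and note that MySQL requires the partitioning column to appear in every unique key, hence the reordered primary key:

    dbExecute(con, "
      CREATE TABLE sales_by_year (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        transaction_date DATE NOT NULL,
        -- ... the other columns ...
        PRIMARY KEY (transaction_date, id),
        INDEX (id)
      ) ENGINE=InnoDB
      PARTITION BY RANGE (YEAR(transaction_date)) (
        PARTITION p2021 VALUES LESS THAN (2022),
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
      )")

    # Purging a whole year then becomes a near-instant metadata change:
    dbExecute(con, "ALTER TABLE sales_by_year DROP PARTITION p2021")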