PostgreSQL join fetches all rows in table, too slow

I have two tables, "commissions" and "mt4_trades". In "mt4_trades" the "ticket" column is the primary key; "commissions" has an "order_id" column related to mt4_trades.ticket as one-to-many (one "ticket" to many "order_id" values). And I have this statement:

SELECT commissions.ibs_account AS ibs_account                      
       FROM "public"."mt4_trades" 
    INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket 
        WHERE "mt4_trades"."close_time" >= '2014.11.01' 
          AND "mt4_trades"."close_time" < '2014.12.01'

The commissions table contains about 4 million rows. This statement returns 480,000 rows, but it is too slow: execution time is 9 seconds. I ran EXPLAIN ANALYZE:

Hash Join  (cost=43397.07..216259.97 rows=144233 width=7) (actual time=3993.839..9459.896 rows=488131 loops=1)
  Hash Cond: (commissions.order_id = mt4_trades.ticket)
  ->  Seq Scan on commissions  (cost=0.00..116452.08 rows=3997708 width=15) (actual time=0.005..4185.254 rows=3997157 loops=1)
  ->  Hash  (cost=42485.10..42485.10 rows=72958 width=4) (actual time=288.767..288.767 rows=97260 loops=1)
        Buckets: 8192  Batches: 1  Memory Usage: 3420kB
        ->  Index Scan using "INDEX_CLOSETIME" on mt4_trades  (cost=0.43..42485.10 rows=72958 width=4) (actual time=0.020..174.810 rows=97260 loops=1)
              Index Cond: ((close_time >= '2014-11-01 00:00:00'::timestamp without time zone) AND (close_time < '2014-12-01 00:00:00'::timestamp without time zone))
Total runtime: 9881.979 ms

This row:

->  Seq Scan on commissions  (cost=0.00..116452.08 rows=3997708 width=15) (actual time=0.005..4185.254 rows=3997157 loops=1)

It means the whole "commissions" table is scanned instead of matching "order_id" against "ticket" first. Can you help me improve this query? Thanks.

Best answer:

9 seconds to return half a million rows is not terrible, and a sequential scan over 4M rows can be much faster than ~100K indexed lookups into those same 4M rows. Assuming you've got an index on order_id, you can test this by running set enable_seqscan TO false; before running your query (this setting only affects the current connection).
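As a minimal sketch of that test (the index name is illustrative; skip the CREATE INDEX if an index on order_id already exists):

CREATE INDEX idx_commissions_order_id ON commissions (order_id);

-- Steer the planner away from sequential scans for this session only
SET enable_seqscan TO false;

EXPLAIN ANALYZE
SELECT commissions.ibs_account AS ibs_account
  FROM "public"."mt4_trades"
 INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket
 WHERE "mt4_trades"."close_time" >= '2014.11.01'
   AND "mt4_trades"."close_time" < '2014.12.01';

-- Restore the default planner behaviour
RESET enable_seqscan;

Compare the two EXPLAIN ANALYZE timings; if the index-driven plan is slower, the sequential scan the planner chose was the right call.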

Do you really need all 500K rows every time you run this query, or are you going to filter the results further? If you're almost always going to filter the results in some other way, you probably want to optimize for that narrower query rather than the one that returns all 500K rows, as in the sketch below.
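For example, if the usual case is pulling commissions for one account within a date range, a filtered query plus a composite index could be served by index probes instead of a full hash join. The account value and index name here are hypothetical:

-- Hypothetical narrower query: one account instead of all 500K rows
SELECT commissions.ibs_account AS ibs_account
  FROM "public"."mt4_trades"
 INNER JOIN commissions ON commissions.order_id = mt4_trades.ticket
 WHERE "mt4_trades"."close_time" >= '2014.11.01'
   AND "mt4_trades"."close_time" < '2014.12.01'
   AND commissions.ibs_account = 'ACC-123';

-- A composite index on the join key plus the filtered column lets the
-- planner probe commissions by order_id and evaluate the ibs_account
-- filter from the index itself
CREATE INDEX idx_commissions_order_id_account
    ON commissions (order_id, ibs_account);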