Postgres query optimization

2.6k Views Asked by At

On postgres 9.0, set both index_scan and seq_scan to Off. Why does it improve query performance by 2x?

2

There are 2 best solutions below

0
On

Why ?

The most logical answer is because of the way your database tables are configured.

Without you posting your table schema's I can only hazard a guess that your indices don't have a high cardinality.

that is to say, that if your index contains too much information to be useful then it will be far less efficient, or indeed slower.

Cardinality is a measure of how unique a row in your index is. The lower the cardinality, the slower your query will be.

A perfect example is having a boolean field in your index; perhaps you have a Contacts table in your database and it has a boolean column that records true or false depending on whether the customer would like to be contacted by a third party.

In the mean, if you did 'select * from Contacts where OptIn = true'; you can imagine that you'd return a lot of Contacts; imagine 50% of contacts in our case.

Now if you add this 'Optin' column to an index on that same table; it stands to reason that no matter how fine the other selectors are, you will always return 50% of the table, because of the value of 'OptIn'.

This is a perfect example of low cardinality; it will be slow because any query involving that index will have to select 50% of the rows in the table; to then be able to apply further WHERE filters to reduce the dataset again.

Long story short; If your Indices include bad fields or simply represent every column in the table; then the SQL engine has to resort to testing row-by-agonizing-row.

Anyway, the above is theoretical in your case; but it is a known common reason for why queries suddenly start taking much longer.

Please fill in the gaps regarding your data structure, index definitions and the actual query that is really slow!

1
On

This may help some queries run faster, but is almost certain to make other queries slower. It's interesting information for diagnostic purposes, but a bad idea for a long-term "solution".

PostgreSQL uses a cost-based optimizer, which looks at the costs of all possible plans based on statistics gathered by scanning your tables (normally by autovacuum) and costing factors. If it's not choosing the fastest plan, it is usually because your costing factors don't accurately model actual costs for your environment, statistics are not up-to-date, or statistics are not fine-grained enough.

After turning index_scan and seq_scan back on:

  • I have generally found the cpu_tuple_cost default to be too low; I have often seen better plans chosen by setting that to 0.03 instead of the default 0.01; and I've never seen that override cause problems.

  • If the active portion of your database fits in RAM, try reducing both seq_page_cost and random_page_cost to 0.1.

  • Be sure to set effective_cache_size to the sum of shared_buffers and whatever your OS is showing as cached.

  • Never disable autovacuum. You might want to adjust parameters, but do that very carefully, with small incremental changes and subsequent monitoring.

  • You may need to occasionally run explicit VACUUM ANALYZE or ANALYZE commands, especially for temporary tables or tables which have just had a lot of modifications and are about to be used in queries.

  • You might want to increase default_statistics_target, from_collapse_limit, join_collapse_limit, or some geqo settings; but it's hard to tell whether those are appropriate without a lot more detail than you've given so far.

You can try out a query with different costing factors set on a single connection. When you confirm a configuration which works well for your whole mix (i.e., it accurately models costs in your environment), you should make the updates in your postgresql.conf file.

If you want more targeted help, please show the structure of the tables, the query itself, and the results of running EXPLAIN ANALYZE for the query. A description of your OS and hardware helps a lot, too, along with your PostgreSQL configuration.