DBI/SQL performance on SELECT via different methods?

324 Views Asked by At

I've been looking around and haven't been able to find a good answer to this question. Is there a real performance difference between the following DBI methods?

  • fetchrow_arrayref vs
  • selectall_arrayref vs
  • fetchall_arrayref

It probably doesn't matter when making a single SELECT call that will give a smallish (>50 records) resultset, but what about making multiple SELECT statements all in a row? What about if the resultsets are huge (i.e. thousands of records)?

Thanks.

2

There are 2 best solutions below

1
On

The key question to ask yourself is whether you need to keep all the returned rows around in memory.

If you do then you can let the DBI fetch them all for you - it will be faster than writing the equivalent code yourself.

If you don't need to keep all the rows in memory, in other words, if you can process each row in turn, then using fetchrow_arrayref in a loop will typically be much faster.

The reason is that the DBI goes to great lengths to reuse the memory buffers for each row. That can be a significant saving in effort. You can see the effect on this slide, although the examples don't directly match your question. You can also see from that slide the importance of benchmarking. It can be hard to tell where the balance lies.

If you can work on a per-row basis, then binding columns can yield a useful performance gain by reducing the work of accessing the values in the fetched rows.

You also asked about "huge" results. (I doubt "thousands of records" would be a problem on modern machines unless the rows themselves were very 'large'.) Clearly processing a row at a time is preferable for very large result sets. Note that some databases default to streaming all the results to the driver which then buffers them in a compact form and returns the rows one by one as your perl code (or a DBI method) fetches them. Again, benchmark and test for yourself.

If you need more help, the dbi-users mailing list is a good place to ask. You don't need to subscribe.

3
On

The difference between fetchrow* and fetch all will be the location of the loop code in the call stack. That is, the fetchall* implies the fetch loop, while the fetchrow* implies that you will write your own loop.

The difference between the fetch* and select* methods is that one requires you to manually prepare() and execute() the query, while the other does that for you. The time differences will come from how efficient your code is compared to DBI's.

My reading has shown that the main differences between methods are between *_arrayref and *_hashref, where the *_hashref methods are slower due to the need to look up the hash key names in the database's metadata.