I've been looking around and haven't been able to find a good answer to this question. Is there a real performance difference between the following DBI methods?
- fetchrow_arrayref vs
- selectall_arrayref vs
- fetchall_arrayref
It probably doesn't matter when making a single SELECT
call that will give a smallish (>50 records) resultset, but what about making multiple SELECT
statements all in a row? What about if the resultsets are huge (i.e. thousands of records)?
Thanks.
The key question to ask yourself is whether you need to keep all the returned rows around in memory.
If you do then you can let the DBI fetch them all for you - it will be faster than writing the equivalent code yourself.
If you don't need to keep all the rows in memory, in other words, if you can process each row in turn, then using fetchrow_arrayref in a loop will typically be much faster.
The reason is that the DBI goes to great lengths to reuse the memory buffers for each row. That can be a significant saving in effort. You can see the effect on this slide, although the examples don't directly match your question. You can also see from that slide the importance of benchmarking. It can be hard to tell where the balance lies.
If you can work on a per-row basis, then binding columns can yield a useful performance gain by reducing the work of accessing the values in the fetched rows.
You also asked about "huge" results. (I doubt "thousands of records" would be a problem on modern machines unless the rows themselves were very 'large'.) Clearly processing a row at a time is preferable for very large result sets. Note that some databases default to streaming all the results to the driver which then buffers them in a compact form and returns the rows one by one as your perl code (or a DBI method) fetches them. Again, benchmark and test for yourself.
If you need more help, the dbi-users mailing list is a good place to ask. You don't need to subscribe.