Currently I am working with SQL Server (either 2008 R2 or 2012) and I'd like to ask whether multiple DML statements put into one procedure can cause a significant performance hit (drop).
As I was told, the rule of thumb here is to put each DML statement into a separate stored procedure and call them from one wrapper procedure.
According to what I've been told, the thing making performance worse is the fact that when multiple DMLs are in one procedure, no good execution plan can be created and stored for it. On the other hand, when the DMLs are separated into their own procedures, an appropriate execution plan is created for each one.
Last thing worth mentioning is that I am going to wrap all DMLs in one explicit transaction - no matter if in one procedure or in separate ones.
Are multiple DML statements in one procedure really that much of a problem?
Thanks.
It was good that you asked, because what you were told is not true. Multiple DML statements, whether in a single stored procedure or spread out, only affect each other as a group when wrapped in an explicit BEGIN TRAN and COMMIT / ROLLBACK. You should put the logic where it makes the most sense to put it (meaning, where it is most understandable, readable, and maintainable).
Besides, what is really the difference conceptually between having them in a single proc vs multiple procs if a single proc is going to call the multiple procs anyway? Wouldn't whatever theoretical property of stored procedures that this person thought made things worse when putting multiple DML statements into a single proc also hold true when a single proc called multiple procs that each had DML statements? In the end, you still have a single point of entry responsible for multiple DML calls.
For example:
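A minimal sketch, assuming two hypothetical tables `Table1` and `Table2`:

```sql
-- Two independent DML statements; each is its own auto-commit transaction.
UPDATE dbo.Table1 SET [Value] = 1 WHERE [ID] = 12;
UPDATE dbo.Table2 SET [Value] = 2 WHERE [ID] = 12;
```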
is two separate transactions given that each query, by itself, is a transaction (unless grouped within an explicit transaction). So then:
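A sketch assuming two hypothetical tables, this time with the UPDATE statements wrapped in an explicit transaction:

```sql
BEGIN TRAN;

UPDATE dbo.Table1 SET [Value] = 1 WHERE [ID] = 12;
UPDATE dbo.Table2 SET [Value] = 2 WHERE [ID] = 12;

COMMIT TRAN;
```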
is a single transaction. Having multiple DML statements in a transaction can cause performance issues due to concurrency (i.e. blocking / deadlocks), but if the operation requires that both DML statements be conceptually an atomic unit of work, well, then they need to be grouped together in an explicit transaction.
Now, given that first example above with the two DML statements and no explicit transaction, assuming those were in a stored procedure, that would be no different than:
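A sketch, assuming two hypothetical sub-procedures that each contain one of the UPDATE statements:

```sql
-- Each EXEC is still its own auto-commit transaction.
EXEC dbo.SubProc1; -- performs the UPDATE of Table1
EXEC dbo.SubProc2; -- performs the UPDATE of Table2
```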
And, given the second example where both DML statements were grouped in an explicit transaction, assuming those were in a stored procedure, that would be no different than (using the same two SubProc* procs as noted above):
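A sketch using the same hypothetical SubProc1 / SubProc2 procedures, now grouped in an explicit transaction:

```sql
BEGIN TRAN;

EXEC dbo.SubProc1;
EXEC dbo.SubProc2;

COMMIT TRAN;
```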
EDIT:
Some additional info is that apparently the reason why multiple DML statements in a single stored procedure are worse for performance is due to (supposedly) SQL Server not being able to get a "proper" (i.e. good?) execution plan for the combination, but if the DML statements are separated then SQL Server can generate "appropriate" (i.e. good?) execution plans for each separate stored procedure.
Well, execution plans are really at the query level. Sure, prior to SQL Server 2005 it was only possible to recompile an entire procedure, but starting in SQL Server 2005 it became possible to do statement-level recompiles. Either way, each query is optimized on its own, and so separating them out doesn't seem to provide any benefit. And in fact, given that local variables (including table variables) can't be shared across stored procedures, it can certainly complicate what you are trying to accomplish in a particular stored procedure if you had to pass every variable along as an input param and pass one or more of them back. That code would be difficult to maintain and debug.
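As a sketch of that complication (all procedure, table, and parameter names here are hypothetical): any intermediate value one sub-procedure produces has to travel back and forth as parameters, since nothing can be shared directly:

```sql
-- Hypothetical: the first sub-proc must hand its result back to the caller,
-- which must then hand it to the second sub-proc.
CREATE PROCEDURE dbo.SubProcCreateOrder
    @CustomerID INT,
    @NewOrderID INT OUTPUT
AS
BEGIN
    INSERT INTO dbo.Orders ([CustomerID]) VALUES (@CustomerID);
    SET @NewOrderID = SCOPE_IDENTITY();
END;
GO

CREATE PROCEDURE dbo.SubProcAddOrderLine
    @OrderID INT,
    @ProductID INT
AS
BEGIN
    INSERT INTO dbo.OrderLines ([OrderID], [ProductID])
    VALUES (@OrderID, @ProductID);
END;
GO

-- Wrapper: local variables cannot be shared, only passed along.
DECLARE @OrderID INT;
EXEC dbo.SubProcCreateOrder @CustomerID = 1, @NewOrderID = @OrderID OUTPUT;
EXEC dbo.SubProcAddOrderLine @OrderID = @OrderID, @ProductID = 42;
```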
EDIT 2:
Here are some resources with information about execution plan caching and recompiling:

- (this is as of SQL Server 2008 R2; I am trying to find something more recent)
- (current doc; please note that starting in SQL Server 2008 it became possible to remove a single plan via `DBCC FREEPROCCACHE [ ( { plan_handle | sql_handle | pool_name } ) ]`)
- (this appears to be written as of SQL Server 2005, given that the last sentence is "Note that there is no way to force SQL Server to remove just a single query plan from cache," which, as noted above, was no longer true as of SQL Server 2008)
Please also note that these documents do indeed refer to execution plans being cached at the object (i.e. stored procedure, etc.) level. However, that does not imply that the DML statements are optimized as a whole and would be different if separated out. The only real benefit of separating the DML statements out into separate stored procedures would be that you can (starting in SQL Server 2008) manually evict their plans individually via `DBCC FREEPROCCACHE ( { plan_handle | sql_handle } )`. But if this is something that is only needed occasionally, then it wouldn't really hurt to just recompile the whole stored procedure, and that is probably less of a cost as compared to a clunky separation of the code. And if it is something that is needed frequently, then the DML statement itself can have an `OPTION (RECOMPILE)` hint added to it, which won't affect the other queries in that stored procedure.

EDIT 3: (hopefully final :)
Let's not forget that this is still Computer Science that we are doing, and pretty much everything is testable. It should be fairly easy to set up a scenario with a table and a few procs (like the examples shown above) and to run both the "master" proc that calls sub-stored procedures as well as the all-in-one proc. You can either view the "Actual Execution Plan" in SSMS as you run them or pull the plans from the plan cache (as shown in the linked resources above). So technically, nobody needs to rely upon this mystery person who advised using separate stored procedures, or even upon me, who believes that idea is at best obsolete and at worst pure myth; SQL Server will tell you exactly what it will do in any given situation :).
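For instance, one way to pull the cached plans is via the plan-cache DMVs (a sketch; the `LIKE` filter is a placeholder for whatever you name your test procedures):

```sql
-- Inspect the cached plans for the test procedures, along with their
-- batch text and XML plans.
SELECT cp.objtype,
       cp.usecounts,
       st.[text],
       qp.query_plan
FROM   sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE  st.[text] LIKE N'%SubProc%';
```

Running the all-in-one proc and the master-plus-sub-procs variant, then comparing what this returns, shows directly whether the plans differ at all.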