Physical table being used as a temp table in a warehouse. Any advantage?


As I stated in previous posts, I've inherited a data warehouse built on SQL Server.

In my ongoing review of the existing components and the practices adopted by my predecessors, I found something that caught my attention and surprised me: in several places, physical tables are being used as temporary tables to manipulate large amounts of data. My first reaction was that this practice must be very expensive for the DBMS, but I would like more feedback on that.

Some annotations on this topic:

  • Physical tables are created/dropped within SPs (the tables are named "TMP_TableName"); a sketch of the pattern follows this list
  • The tables are mostly used to manipulate large amounts of data
  • The SPs in question are called several times during the nightly processing
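
To make the pattern concrete, here is a minimal sketch of what these SPs look like (all object names are invented for illustration; they are not the real ones):

    -- Hypothetical reconstruction of the inherited pattern
    CREATE PROCEDURE dbo.usp_LoadSales
    AS
    BEGIN
        -- A real, permanent table is created in the user database...
        CREATE TABLE dbo.TMP_Sales
        (
            SaleID int           NOT NULL,
            Amount decimal(18,2) NOT NULL
        );

        -- ...loaded with a large intermediate data set...
        INSERT INTO dbo.TMP_Sales (SaleID, Amount)
        SELECT SaleID, Amount
        FROM dbo.StagingSales;

        -- ...manipulated, and finally dropped at the end of the run
        DROP TABLE dbo.TMP_Sales;
    END;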

Questions:

  1. Does this practice bring any advantage to the processing routines that I'm not aware of?
  2. Are there best practices for this?
  3. My plan is to update the code to use #temp tables to improve performance. Comments on that?
  4. Should I consider using table variables? I've read that their performance is poor with large data sets.

I'd appreciate any feedback based on your knowledge/experience that you may want to share with the Stack Overflow community.

Thanks in advance,


There are 2 answers below.

  1. Does this practice bring any advantage to the processing routines that I'm not aware of?

You can get performance improvements by dumping specific data into tables and creating targeted indexes, etc. — i.e. if the same large but limited data set is used multiple times, or if you have a calculated column that requires an index. SQL Server often does this behind the scenes anyway, through caching, spooling, etc.
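
As a rough sketch (all object names here are my own, not from the question), materializing a reused working set and indexing it might look like this:

    -- Materialize a large-but-limited working set once...
    SELECT o.CustomerID,
           SUM(o.Amount) AS TotalAmount   -- calculated column we want indexed
    INTO   #CustomerTotals
    FROM   dbo.Orders AS o
    WHERE  o.OrderDate >= '20240101'
    GROUP  BY o.CustomerID;

    -- ...index it...
    CREATE CLUSTERED INDEX IX_CustomerTotals
        ON #CustomerTotals (CustomerID);

    -- ...then reuse it in several queries, seeking on CustomerID
    -- instead of re-aggregating dbo.Orders each time.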

  2. Are there best practices for this?

Personally, I would leave the table intact so I could debug afterwards. If you do that, you need code to clear the table at the start of each run.
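
For example, something along these lines at the top of the SP (assuming the existing table name):

    -- Keep the permanent table for post-run debugging,
    -- but clear it at the start of every run
    IF OBJECT_ID('dbo.TMP_Sales', 'U') IS NOT NULL
        TRUNCATE TABLE dbo.TMP_Sales;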

There is a serious downside to the current approach: if two processes run the stored procedure at the same time, the data will 'collide'.

  3. My plan is to update the code to use #temp tables to improve performance. Comments on that?

It's unlikely there will be any change in performance, unless your database is on a slow disk and your tempdb is on a fast disk, and you have insufficient RAM so SQL Server has to go to disk all the time.

It will, however, fix the problem of the SP being run by multiple processes at the same time.
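
A minimal sketch of the conversion (again with invented names):

    -- The same procedure rewritten with a session-local #temp table
    CREATE PROCEDURE dbo.usp_LoadSales
    AS
    BEGIN
        CREATE TABLE #Sales
        (
            SaleID int           NOT NULL,
            Amount decimal(18,2) NOT NULL
        );

        INSERT INTO #Sales (SaleID, Amount)
        SELECT SaleID, Amount
        FROM dbo.StagingSales;

        -- ...heavy manipulation here...

        -- Each connection gets its own copy of #Sales, so concurrent
        -- executions no longer collide, and the table is dropped
        -- automatically when the procedure ends.
    END;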

  4. Should I consider using table variables? I've read that their performance is poor with large data sets.

Basically, no. This is fully covered in the prior post.

In summary, convert to temp (#) tables unless you have some kind of need to debug the process.


Temp tables (#temp) are user tables just like any other table you would create in a database, with one important difference: when instantiated, they are given a unique name in tempdb. This way, if multiple connections call the same procedure, they don't step on each other's temp tables.
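
You can actually observe the generated name; a quick sketch (the table name is just for the demo):

    CREATE TABLE #Demo (ID int);

    -- The object in tempdb gets a unique, padded name
    SELECT name
    FROM   tempdb.sys.tables
    WHERE  name LIKE '#Demo%';
    -- Returns something like: #Demo_______________00000000000A
    -- Two connections creating #Demo get two distinct suffixes.

    DROP TABLE #Demo;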

The best practice would generally be to use a #temp table. One quick exception that comes to mind would be a permanent staging table that you know is only going to be used by one process at a time.

I would either change over to #temp tables or create them as permanent staging tables.

I generally don't use table variables. They are still temp tables and are stored in tempdb, but they are very limited in how they can be indexed and manipulated once created. That said, they can be useful for small amounts of fairly static data, and there are some operations where you can only use one or the other.
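
For example, a sketch of where a table variable fits (names invented):

    -- A table variable for a small, fairly static data set
    DECLARE @StatusCodes TABLE
    (
        Code        char(1)     NOT NULL PRIMARY KEY,  -- inline index only
        Description varchar(50) NOT NULL
    );

    INSERT INTO @StatusCodes (Code, Description)
    VALUES ('A', 'Active'), ('C', 'Closed');

    -- Unlike a #temp table, you cannot run CREATE INDEX against
    -- @StatusCodes after the declaration, and its contents survive
    -- a transaction ROLLBACK.
    SELECT Code, Description FROM @StatusCodes;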

For more information on the differences between temp tables and table variables, see here; for an exhaustive explanation, see here.

Hope that helps!

Clay