Simplify multiple occurrences of the same formula


Is it possible to simplify a SEDE query that has the same formula written multiple times?

For instance, this query is writing rtrim(LOWER(Title)) five times:

select
    rtrim(LOWER(p.Title)),
    count(rtrim(LOWER(p.Title)))
from Posts p
group by rtrim(LOWER(p.Title))
having (count(rtrim(LOWER(p.Title))) > 1)
order by count(rtrim(LOWER(p.Title))) desc

In answers, please specify if your factorisation is purely cosmetic or if it also has a performance impact.


There are 2 best solutions below


First, the permanent solution here is to clean up your data. Wrapping a column in functions like LTRIM, RTRIM, UPPER or LOWER makes your query non-SARGable. In other words, your queries can slow to a crawl because SQL Server cannot seek on an index to retrieve the rows you need and has to scan all rows instead.
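To make the SARGability point concrete, here is a minimal sketch. It assumes a hypothetical index on Posts(Title), and the title literal is made up for illustration; neither comes from the question.

```sql
-- Assumption: a hypothetical index exists on Posts(Title).

-- Non-SARGable: the column is wrapped in functions, so the optimizer
-- cannot seek on the index and has to evaluate the expression per row.
select p.Id
from Posts p
where rtrim(lower(p.Title)) = N'how do i exit vim?';

-- SARGable: the bare column is compared with a constant,
-- so an index seek on Title becomes possible.
select p.Id
from Posts p
where p.Title = N'How do I exit Vim?';
```

Whether the second form returns the same rows depends on the data already being clean, or on a collation that makes the comparison case-insensitive.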


Enter the APPLY + VALUES inline aliasing trick

This is something I came up with some time ago, at first just to simplify my code. I later discovered that it occasionally brings performance benefits as well, which I'll demonstrate. First, some sample data:

use tempdb;
go

create table dbo.sometable(someid int identity, somevalue decimal(10,2));
insert dbo.sometable(somevalue) values (100),(1050),(5006),(111),(4);

Let's say we have a query that takes a few variables or parameters, performs a calculation on them and uses the result throughout the query. Note the repeated CASE expression below.

declare @var1 int = 100, @var2 int = 50, @var3 int = 900;

select
  someid,
  somevalue,
  someCalculation =
    case when @var3 < somevalue then (@var1 / (@var2*2.00))+@var3 else @var3+somevalue end,
  someRank = dense_rank() over (order by
    case when @var3 < somevalue then (@var1 / (@var2*2.00))+@var3 else @var3+somevalue end)
from dbo.sometable
where case when @var3 < somevalue then (@var1 / (@var2*2.00))+@var3 else @var3+somevalue end
  between 900 and 2000
order by case when @var3 < somevalue then (@var1 / (@var2*2.00))+@var3 else @var3+somevalue end;

We can simplify this query like this:

select 
  someid, 
  somevalue,
  someCalculation = i.v,
  someRank = dense_rank() over (order by i.v)
from dbo.sometable
cross apply (values
  (case when @var3 < somevalue then (@var1/(@var2*2.00))+@var3 else @var3+somevalue end)
) i(v)
where i.v between 900 and 2000
order by i.v;

Each query returns identical results. Now the execution plans:

[Image: execution plans for the two queries]

Not only have we simplified our query, we've actually sped it up. In my original query the optimizer had to calculate the same value twice and perform two sorts. Using the inline aliasing trick, I was able to remove a sort and a calculation.
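Applied to the query from the question, the same trick might look like the sketch below. The alias names t, NormTitle and TitleCount are made up for illustration. count(t.NormTitle) is kept instead of count(*) so that rows with a NULL Title are still ignored, as in the original.

```sql
select
    t.NormTitle,
    TitleCount = count(t.NormTitle)
from Posts p
-- Compute the normalized title once per row and give it a name
cross apply (values (rtrim(lower(p.Title)))) t(NormTitle)
group by t.NormTitle
-- HAVING cannot reference a select-list alias, so the aggregate is repeated here
having count(t.NormTitle) > 1
-- ORDER BY can reference the select-list alias
order by TitleCount desc;
```

For this particular query I would expect the change to be mostly cosmetic, since the optimizer can usually recognise the repeated expression on its own. Compare the execution plans before assuming any speed-up.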


Based on the link for your query, it looks like you're using the LOWER() and RTRIM() functions for the sake of comparison.

String comparisons in T-SQL are case-insensitive under the default (case-insensitive) collation, and trailing spaces are ignored when strings are compared. You can therefore get the same results with the following:

Select      Lower(P.Title), Count(P.Title)
From        Posts   P
Group By    Lower(P.Title)
Having      Count(P.Title) > 1
Order By    Count(P.Title) Desc
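A quick way to check both comparison rules is the sketch below. The COLLATE clauses are added only to make the behaviour explicit. Without them, the result of the case comparison depends on the collation of the database or column.

```sql
select
    -- Case-insensitive collation: expected to return 1
    CaseInsensitive = case when N'Title' = N'TITLE' collate SQL_Latin1_General_CP1_CI_AS
                           then 1 else 0 end,
    -- Case-sensitive collation, shown for contrast: expected to return 0
    CaseSensitive   = case when N'Title' = N'TITLE' collate Latin1_General_CS_AS
                           then 1 else 0 end,
    -- Trailing spaces are padded away for = comparisons: expected to return 1
    TrailingSpaces  = case when N'Title' = N'Title   '
                           then 1 else 0 end;
```

If the Title column turns out to use a case-sensitive collation, the LOWER() call is still needed in the GROUP BY.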