SQL Server: How can we call the same UDF with different collations

600 Views Asked by At

I'm writing a User Defined Function (UDF) in SQL Server 2008 R2. I'd like to have the UDF use different collations on different calls. Is this possible? Is it possible to dynamically change a collation based upon user input?

From Intellisense, what little I can find online, it seems that collation must be specified in the actual T-SQL code and cannot be applied as a variable or input parameter. Are there ways around this, specifically ways that can work inside a UDF?

EDIT: As requested, here is the UDF I'm working with. My intent is to call it both with and without "xxxx sensitive" settings for sorting a query's results by "relevance".

/*
    T-SQL Implementation of Levenshtein Edit Distance.
*/
CREATE FUNCTION [dbo].[LevenshteinDistanceSQL] (
    @textA varchar(max)
    , @textB varchar(max)
)
RETURNS INT
AS
Begin

    DECLARE
        @aLength int
        , @bLength int
        , @prevSize int
        , @currentSize int
        , @previousDistances varbinary(max)
        , @currentDistances varbinary(max)
        , @result int

    SET @aLength = LEN(@textA)
    SET @bLength = LEN(@textB)
    SET @result = NULL

    -- Degenerate Cases

    -- want calling routine to specify the collation of this comparison
    If (@textA = @textB) SET @result = 0;

    If (@aLength = 0) SET @result = @bLength;
    If (@bLength = 0) SET @result = @aLength;

    If (@result IS NULL)
    Begin
        -- Set sizes of tables to length of largest input text, plus 1
        SET @prevSize = CASE WHEN @aLength > @bLength THEN @aLength ELSE @bLength END + 1;
        SET @currentSize = CASE WHEN @aLength > @bLength THEN @aLength ELSE @bLength END + 1;

        -- Initialize Previous distances
        DECLARE
            @i int
            , @j int
            , @cost int
            , @curDistLastCost int
            , @prevDistCurCost int
            , @prevDistLastCost int
            , @charA char(1)
            , @charB char(1)

        SET @i = 0
        SET @previousDistances = 0x -- empty varbinary
        While (@i < @prevSize)
        Begin
            SET @previousDistances = @previousDistances + CONVERT(binary(4), @i)
            SET @i = @i + 1;
        End

        -- Process @textA
        SET @i = 0
        While (@i < @aLength)
        Begin
            SET @currentDistances = CONVERT(binary(4), @i + 1);

            -- Process @textB
            SET @j = 0
            While (@j < @bLength)
            Begin
                SET @charA = SUBSTRING(@textA, @i + 1, 1)
                SET @charB = SUBSTRING(@textB, @j + 1, 1)

               -- want calling routine to specify the collation of this comparison
                SET @cost = CASE WHEN @charA = @charB THEN 0 ELSE 1 END;

                SET @curDistLastCost = CONVERT(int, SUBSTRING(@currentDistances, @j * 4 + 1, 4))
                SET @prevDistCurCost = CONVERT(int, SUBSTRING(@previousDistances, (@j + 1) * 4 + 1, 4))
                SET @prevDistLastCost = CONVERT(int, SUBSTRING(@previousDistances, @j * 4 + 1, 4))

                SET @currentDistances = @currentDistances + CONVERT(binary(4), CASE
                    WHEN @curDistLastCost < @prevDistCurCost AND @curDistLastCost < @prevDistLastCost THEN @curDistLastCost + 1
                    WHEN @prevDistCurCost < @curDistLastCost AND @prevDistCurCost < @prevDistLastCost THEN @prevDistCurCost + 1
                    ELSE @prevDistLastCost + @cost
                    END)

                SET @j = @j + 1
            End

            -- copy current distances to previous distances for next iteration
            SET @previousDistances = @currentDistances

            SET @i = @i + 1;
        End

        SET @result = CONVERT(int, SUBSTRING(@currentDistances, @bLength * 4 + 1, 4))
    End

    RETURN @result
End

Called like this:

declare @text varchar(50)

set @text = 'c'

select
    *
from
    Skills as s
where
    s.Name like '%' + @text + '%'
order by
    dbo.LevenshteinDistance(@text, s.Name) -- COLLATE Latin1_General_100_CI_AI
    , dbo.LevenshteinDistance(@text, s.Name) -- COLLATE Latin1_General_100_CS_AS_KS_WS
    , s.Name

Being able to specify the collation will allow me to sort the shortest edit-distance (insensitive) to the top of the results; and within an insensitive edit distance the case, accent, etc. matches will be at the top.

1

There are 1 best solutions below

4
On

There are two ways to accomplish this, and both require SQLCLR. As you have seen, T-SQL does not allow for specifying the COLLATION dynamically, but in .Net you can :). And in fact, there are two additional benefits to using SQLCLR for this purpose:

  1. You can get more granular control over the options of doing the comparison. Look at the CompareOptions enum. For example, it has an option for IgnoreSymbols:

    Indicates that the string comparison must ignore symbols, such as white-space characters, punctuation, currency symbols, the percent sign, mathematical symbols, the ampersand, and so on.

  2. SQLCLR UDFs can allow for a parallel execution plan, whereas T-SQL UDFs will always disallow parallel plans. In order to allow for a parallel plan, the SQLCLR UDF needs to be deterministic and marked as IsDeterministic=true as well as not do any data access (no need to mark this explicitly as it is the default).

That said, here are some additional notes:

  • All implementations of the Levenshtein Distance algorithm that I have seen assume an Ordinal comparison (i.e. any of the *_BIN collations). But I suppose for what you are trying to accomplish it could prove to be interesting :)

  • Given what you are trying to accomplish, when taking the SQLCLR approach, you don't actually need to pass in any collation info. The reason is that each particular language (i.e. collation) controls both the comparison options for equality as well as sorting. But sorting is not involved in the Levenshtein Distance algorithm. That reduces the need to just two variations:

    • Fully insensitive (at least for IgnoreCase and IgnoreNonSpace, and optionally for IgnoreKanaType, IgnoreSymbols, and IgnoreWidth)

    • Fully sensitive (i.e. Ordinal)

So, your two options are:

  1. Do this fully in .Net:

    Implement the algorithm fully in C# (or VB.Net) and use the CompareInfo.Compare (String, String, CompareOptions) method. In this scenario you need either two functions, one that is "sensitive" and one that is "insensitive", or a single function for both that takes an input parameter (SqlBoolean, perhaps) for @SensitiveComparison. But the particular language of the strings to be compared is irrelevant so you would technically support all languages on day 1 :).

  2. Keep the algorithm in T-SQL. In this scenario you would execute a SqlCommand using a connection string of "Context Connection = true;". Your function would take in a SqlString of a collation name and use that to concatenate into the algorithm in the two spots that do a comparison. [I don't see much benefit to this option over the fully C# option, and in fact it should be a little slower as .Net code should be faster for this type of thing.]

In either case, it should be noted that this can be accomplished in an assembly marked with PERMISSION_SET = SAFE. Meaning, there is absolutely no security risk in implementing this functionality (as described here). And, for anyone who is still under the (false) impression that SQLCLR / enabling CLR Integration is inherently a security risk, I wrote an article (part of a series on SQLCLR) that covers this topic of security from various angles and with plenty of examples to test for yourself: Stairway to SQLCLR Level 3: Security (General and SAFE Assemblies) (free registration required).