MS SQL PATINDEX and Regex

Question

MS SQL PATINDEX and Regex

137 Views Asked by Wingman1487 At 05 January 2024 at 01:24

Been banging my head at this for a few trying to come up with a clever way to regex (or if someone has a better idea) a size value from a very random string. I've setup the below test to show what I'm working with, I'm trying to get the Final output in this statement to be like these in Green, and not these in Red by using the @pattern in the query to pull out the...

Number x Number

Hoping there's a regex or string guru out there that might have an idea for me :)

DECLARE @pattern AS VARCHAR(100)
SET @pattern = '%[0-9][0-9. x][0-9.x ][0-9. x]%'

BEGIN

    WITH cte AS (
        SELECT 'Italy Terrazzo Sacra Nero 24x24 Honed' [Name]
        UNION
        SELECT 'Nero Marquina 1x3 Herringbone' [Name]
        UNION
        SELECT 'Myorka Blue 2x8' [Name]
        UNION
        SELECT 'Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4' [Name]
        UNION
        SELECT 'Myorka Blue 2x8' [Name]
        UNION
        SELECT 'Nero Marquina 1x3 Herringbone' [Name]
        UNION
        SELECT 'Broadway Lvt Ash 9x72 Rigid Core Click- 6.0mm/28mil Wear Layer' [Name]
        UNION
        SELECT 'Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36' [Name]
        UNION
        SELECT 'Speak Leather Black 24 x 24' [Name]
        UNION
        SELECT 'Accent Montana White 12 x 36 Glossy' [Name]
    )


    SELECT 
         --FULL NAME--
          [Name]
         
         --HELPERS TO SEE WHAT I'M TESTING--
         ,PATINDEX('%[0-9]%x%', [Name]) AS [START]
         ,SUBSTRING([Name], PATINDEX('%[0-9]%x%', [Name]), LEN([Name]))
         ,REVERSE([Name]) AS [REVERSE]
         ,LEN([Name]) AS [STRLenght]
         ,PATINDEX(@pattern, REVERSE([Name])) + 1
         ,LEN([Name]) - PATINDEX('%[0-9][^A-z]x%', REVERSE([Name])) + 1 AS [END]

         --FULL CALCULATION FOR FINAL OUTPUT--
         ,CASE WHEN [Name] LIKE '%[0-9] x [0-9]%'
               THEN SUBSTRING([Name], PATINDEX('%[0-9]%x%', [Name]), (LEN([Name]) - PATINDEX('%[0-9]%x%', REVERSE([Name])) + 1) - PATINDEX('%[0-9]%x%', [Name]) + 1)
               WHEN [Name] LIKE '%[0-9]x[0-9]%'
               THEN REPLACE(SUBSTRING([Name], PATINDEX(@pattern, [Name]), (LEN([Name]) - PATINDEX(@pattern, REVERSE([Name])) + 1) - PATINDEX(@pattern, [Name]) + 1), 'x', ' x ')
               ELSE NULL
            END AS [Final]
          FROM cte
END

--EDIT--

I'm still trying to make sense of the query from Yitzhak has suggested. There are still a few outliers I'm trying to attack by changing your

    ,c.query('
        for $x in /root/r[lower-case(text()[1])="x"]
        let $pos := count(root/r[. << $x]) + 1
        let $before := /root/r[$pos - 1]
           ,$twobefore := /root/r[$pos - 2]
           ,$after := /root/r[$pos + 1]

        return
            if (xs:decimal($before[1]) instance of xs:decimal and
                xs:decimal($after[1]) instance of xs:decimal and
                xs:string($twobefore[1]) instance of xs:string)
                    then data(($before, $x, $after))
            else if (xs:decimal($before[1]) instance of xs:decimal and
                     xs:decimal($after[1]) instance of xs:decimal and
                     xs:decimal($twobefore[1]) instance of xs:decimal)
                        then data(($twobefore, $before, $x, $after))
            else()
    ').value('text()[1]', 'VARCHAR(20)') AS result

This was in an attempt to attack this outlier case but doesn't seem to be getting the $twobefore which what I thought would get the 1 instead of missing it.

Baroque Crackled 1 3/4 X 6 Chair Rail Blanco

Original Q&A

There are 1 best solutions below

**Yitzhak Khabinsky** · Accepted Answer · 2024-01-05T02:27:28.543000

Please try the following solution based on tokenization.

It is leveraging SQL Server built-in XML and XQuery functionality.

Notable points:

CROSS APPLY is tokenizing input string as XML.
XQuery FLWOR expression traversing XML and filtering out any token that is not 'x' via the XPath predicate [lower-case(text()[1])="x"].
/root/r[$pos - 1] and /root/r[$pos + 1] XPath predicates get preceding and following tokens.
where clause is checking for a decimal data type for at least one of the surrounding values.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, tokens VARCHAR(256));
INSERT @tbl (tokens) VALUES
('Italy Terrazzo Sacra Nero 24x24 Honed'),
('Nero Marquina 1x3 Herringbone'),
('Myorka Blue 2x8'),
('Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4'),
('Nero Marquina 1x3 Herringbone'),
('Broadway Lvt Ash 9x72 Rigid Core Click- 6.0mm/28mil Wear Layer'),
('Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36'),
('Speak Leather Black 24 x 24'),
('Accent Montana White 12 x 36 Glossy'),
('Baroque Crackled 1/2 X 6 Pencil Capri'),
('Banza Green Cielo 8" Hex'),
('Masia Torello Nero Brillo 3/4x12 Pencil'),
('Baroque Crackled 1 3/4 X 6 Chair Rail Blanco');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);

SELECT t.*
    , c.query('
        for $x in /root/r[lower-case(text()[1])="x"]
        let $pos := count(root/r[. << $x]) + 1
        let $before2 := /root/r[$pos - 2],
            $before := /root/r[$pos - 1],
            $after := /root/r[$pos + 1]
        where xs:decimal($before[1]) instance of xs:decimal 
            or xs:decimal($after[1]) instance of xs:decimal 
        return data((if (xs:decimal($before2[1]) instance of xs:decimal) then $before2 else ()
            , $before, $x, $after))
    ').value('text()[1]', 'VARCHAR(20)') AS result
FROM @tbl AS t
    CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
        REPLACE(REPLACE(REPLACE(tokens,'x', ' x '),SPACE(2),SPACE(1)), @separator, ']]></r><r><![CDATA[') + 
        ']]></r></root>' AS XML)) AS t1(c)
ORDER BY id;

Output

id	tokens	result
1	Italy Terrazzo Sacra Nero 24x24 Honed	24 x 24
2	Nero Marquina 1x3 Herringbone	1 x 3
3	Myorka Blue 2x8	2 x 8
4	Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4	6.3 x 48.4
5	Nero Marquina 1x3 Herringbone	1 x 3
6	Broadway Lvt Ash 9x72 Rigid Core Click- 6.0mm/28mil Wear Layer	9 x 72
7	Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36	18 x 36
8	Speak Leather Black 24 x 24	24 x 24
9	Accent Montana White 12 x 36 Glossy	12 x 36
10	Baroque Crackled 1/2 X 6 Pencil Capri	1/2 x 6
11	Banza Green Cielo 8" Hex	NULL
12	Masia Torello Nero Brillo 3/4x12 Pencil	3/4 x 12
13	Baroque Crackled 1 3/4 X 6 Chair Rail Blanco	1 3/4 x 6

MS SQL PATINDEX and Regex

--EDIT--

There are 1 best solutions below

Related Questions in SQL

Related Questions in SQL-SERVER

Related Questions in PATINDEX

Trending Questions

Popular # Hahtags

Popular Questions