SQL: checking duplicates with/without leading zeroes

I need to find items that are duplicated in a database, once with and once without a leading zero (e.g. 0123 and 123).

My query is below, but it doesn't find the affected items:

SELECT A.LOCID, B.LOCID2
FROM (
    SELECT LOCID
    FROM TABLENAME
    WHERE regexp_like( LOCID, '^[[:digit:]]*$') -- LOCID only digital characters
    AND LOCID LIKE '0%'
    )AS A
inner join (
    --SELECT TRIM(LEADING '0' FROM LOCID ) AS LOCID2
    SELECT LOCID AS LOCID2
    FROM TABLENAME
    WHERE LOCID NOT LIKE '0%'
    and regexp_like( LOCID, '^[[:digit:]]*$')
    )  AS B
ON A.LOCID = B.LOCID2; 
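
The join can never match because A.LOCID still carries its leading zero while B.LOCID2 has none, so a comparison like '0123' = '123' is false for every pair. The commented-out TRIM(LEADING ...) line would not help either, since subquery B only contains IDs that do not start with 0. The zeros have to be stripped on the zero-prefixed side before comparing. Below is a minimal sketch using Python's built-in `sqlite3` as a stand-in database. SQLite has no `regexp_like` or `TRIM(LEADING ...)`, so it assumes `NOT GLOB '*[^0-9]*'` as the digits-only test and `LTRIM(x, '0')` for stripping. The table and sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (locid TEXT)")
conn.executemany(
    "INSERT INTO tablename (locid) VALUES (?)",
    [("0123",), ("123",), ("0100123",), ("100123",), ("100222",), ("0100222z",)],
)

# Original logic: the zero-prefixed ID is compared unchanged, so nothing matches.
original = """
SELECT a.locid, b.locid2
FROM (SELECT locid FROM tablename
      WHERE locid NOT GLOB '*[^0-9]*' AND locid LIKE '0%') AS a
JOIN (SELECT locid AS locid2 FROM tablename
      WHERE locid NOT GLOB '*[^0-9]*' AND locid NOT LIKE '0%') AS b
  ON a.locid = b.locid2
"""

# Fixed logic: strip leading zeros from the zero-prefixed side before joining.
fixed = """
SELECT a.locid, b.locid2
FROM (SELECT locid FROM tablename
      WHERE locid NOT GLOB '*[^0-9]*' AND locid LIKE '0%') AS a
JOIN (SELECT locid AS locid2 FROM tablename
      WHERE locid NOT GLOB '*[^0-9]*' AND locid NOT LIKE '0%') AS b
  ON LTRIM(a.locid, '0') = b.locid2
ORDER BY a.locid
"""

original_rows = conn.execute(original).fetchall()
fixed_rows = conn.execute(fixed).fetchall()
print(original_rows)  # []
print(fixed_rows)     # [('0100123', '100123'), ('0123', '123')]
```

The row 0100222z is dropped by the digits-only filter, just as regexp_like would drop it in the original query.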

Answer by Javad M. Amiri

You can use the REGEXP_REPLACE function to remove leading zeros, e.g.

SELECT REGEXP_REPLACE('00010011', '^0+', '');

which yields 10011. You can then join on the stripped IDs rather than the original ones.

You can test it with this:

SELECT * FROM UNNEST(ARRAY['001230', '022']) a inner join 
UNNEST(ARRAY['1230', '0220', '222', '00']) b on 
REGEXP_REPLACE(a, '^0+', '') = REGEXP_REPLACE(b, '^0+', '');

This gives you:

a,      b
001230, 1230
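
The same test can be reproduced locally with Python's `sqlite3`. SQLite has no built-in REGEXP_REPLACE, so this sketch registers a hypothetical one via `Connection.create_function`, backed by `re.sub`, and uses VALUES-based CTEs in place of UNNEST(ARRAY[...]). In a database that ships REGEXP_REPLACE you would call it directly.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def regexp_replace(value, pattern, replacement):
    """Hypothetical REGEXP_REPLACE for SQLite, delegating to Python's re.sub."""
    if value is None:
        return None
    return re.sub(pattern, replacement, value)


# SQLite has no built-in REGEXP_REPLACE, so register the Python version.
conn.create_function("regexp_replace", 3, regexp_replace)

# Single value: strip all leading zeros.
single = conn.execute("SELECT regexp_replace('00010011', '^0+', '')").fetchone()[0]
print(single)  # 10011

# VALUES-based CTEs stand in for UNNEST(ARRAY[...]).
query = """
WITH a(id) AS (VALUES ('001230'), ('022')),
     b(id) AS (VALUES ('1230'), ('0220'), ('222'), ('00'))
SELECT a.id, b.id
FROM a
JOIN b ON regexp_replace(a.id, '^0+', '') = regexp_replace(b.id, '^0+', '')
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('001230', '1230')]
```

Because '^0+' strips every leading zero, 022 becomes 22 and does not match 0220 (which becomes 220), which is why only one pair comes back.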

Answer by Isolated

This looks like Oracle, so here is an Oracle solution. Rather than using regexp_like to check for digits only, you could compare upper() with lower(), assuming letters are the only non-digit characters in this column.

There are multiple ways to accomplish this; the one below uses a CTE to produce two columns: the original ID and the ID with a single leading zero removed.

Apart from the Oracle-specific INSERT ALL ... FROM dual setup, the query should work with many different DBMSs. You may just need to adjust substr or length to match your system.

create table TABLENAME (
  LOCID varchar(20)
  );
insert all
 into tablename (LOCID) values ('100123')
 into tablename (LOCID) values ('0100123')
 into tablename (LOCID) values ('100222')
 into tablename (LOCID) values ('0123')
 into tablename (LOCID) values ('123')
 into tablename (LOCID) values ('0100222z')
Select 1 from dual;
select locid, locid2
from (
    with cte as (
      select locid,  
      case 
       when substr(locid, 1, 1) = '0' then substr(locid, 2, length(locid) - 1) 
       else locid 
      end locid2
    from tablename
    where upper(locid) = lower(locid)
    )
    select locid, locid2, 
       row_number() over (partition by locid2 order by locid desc) as rn
    from cte
)z
where rn = 2

LOCID LOCID2
0100123 100123
0123 123
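
To try the row_number() approach without an Oracle instance, the sketch below runs the same query through Python's `sqlite3` (window functions need SQLite 3.25 or later). INSERT ALL ... FROM dual is Oracle-only, so rows are inserted with `executemany`, and the CTE is moved to the top level. Note that this version strips only one leading zero, so 00123 would not be paired with 123, and exact duplicate rows can also surface at rn = 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (locid VARCHAR(20))")
conn.executemany(
    "INSERT INTO tablename (locid) VALUES (?)",
    [("100123",), ("0100123",), ("100222",), ("0123",), ("123",), ("0100222z",)],
)

query = """
WITH cte AS (
    SELECT locid,
           CASE
               WHEN substr(locid, 1, 1) = '0' THEN substr(locid, 2, length(locid) - 1)
               ELSE locid
           END AS locid2
    FROM tablename
    WHERE upper(locid) = lower(locid)   -- drops values containing letters
)
SELECT locid, locid2
FROM (
    SELECT locid, locid2,
           row_number() OVER (PARTITION BY locid2 ORDER BY locid DESC) AS rn
    FROM cte
) AS z
WHERE rn = 2                            -- the zero-prefixed variant sorts second
ORDER BY locid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('0100123', '100123'), ('0123', '123')]
```

Ordering by locid DESC puts the unpadded ID first within each partition because '1' sorts after '0', so rn = 2 picks out the padded copy.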

Answer by Ana

In case it is useful to someone, this is what finally worked for me:

SELECT A.locid, A.CREATEDAT, B.LOCID2, B.CREATEDAT
FROM (
    SELECT locid, createdat
    FROM tableName
    WHERE locid LIKE '1%'
    --AND LENGTH(LOCID) = 10
    AND regexp_like(LOCID, '^[[:digit:]]*$')
    AND locstatus != 'DELETED'
    ) AS A
INNER JOIN (
    SELECT TRIM(LEADING '0' FROM LOCID) AS LOCID2, createdat
    FROM tableName
    WHERE locid LIKE '0%'
    AND regexp_like(LOCID, '^[[:digit:]]*$')
    --AND LENGTH(LOCID) = 10
    AND locstatus != 'DELETED'
    ) AS B
ON A.locid = B.LOCID2;
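
The sketch below runs this join on hypothetical sample rows with Python's `sqlite3`. SQLite has no TRIM(LEADING ... FROM ...) or regexp_like, so it assumes `LTRIM(locid, '0')` and `NOT GLOB '*[^0-9]*'` as equivalents. The locid LIKE '1%' filter is kept as in the answer, so only IDs whose unpadded form starts with 1 are considered, and the DELETED row shows the status filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (locid TEXT, createdat TEXT, locstatus TEXT)")
# Hypothetical sample rows; '0100999' is marked DELETED and should be ignored.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("100123", "2023-01-01", "ACTIVE"),
        ("0100123", "2023-02-01", "ACTIVE"),
        ("100999", "2023-01-05", "ACTIVE"),
        ("0100999", "2023-03-01", "DELETED"),
        ("0100222z", "2023-04-01", "ACTIVE"),
    ],
)

query = """
SELECT a.locid, a.createdat, b.locid2, b.createdat
FROM (
    SELECT locid, createdat
    FROM tablename
    WHERE locid LIKE '1%'
      AND locid NOT GLOB '*[^0-9]*'       -- digits only
      AND locstatus != 'DELETED'
) AS a
JOIN (
    SELECT LTRIM(locid, '0') AS locid2, createdat   -- strip all leading zeros
    FROM tablename
    WHERE locid LIKE '0%'
      AND locid NOT GLOB '*[^0-9]*'
      AND locstatus != 'DELETED'
) AS b
  ON a.locid = b.locid2
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('100123', '2023-01-01', '100123', '2023-02-01')]
```

Because subquery B outputs the stripped ID, the zero-prefixed original is not shown; selecting locid alongside LOCID2 in B would keep both forms in the result.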