How to find duplicate images

396 Views Asked by At

Using the below table I need to find duplicate image records. The query is used to find records but not accurate results coming. After fetching duplicate image records then we want to fetch only those records who have FM_NAME_EN, LASTNAME_EN, RLN_TYPE, RLN_FM_NM_EN, RLN_L_NM_EN,GENDER,AGE and PHOTO are same. Table used have column like as below

 CREATE TABLE [dbo].[ERoll](
    [AC_NO] [int] NULL,
    [PART_NO] [int] NULL,
    [SLNOINPART] [int] NULL,
    [FM_NAME_EN] [nvarchar](250) NULL,
    [LASTNAME_EN] [nvarchar](250) NULL,
    [RLN_TYPE] [nvarchar](1) NULL,
    [RLN_FM_NM_EN] [nvarchar](250) NULL,
    [RLN_L_NM_EN] [nvarchar](250) NULL,
    [EPIC_NO] [nvarchar](25) NULL,
    [GENDER] [nvarchar](1) NULL,
    [AGE] [int] NULL,
    [PHOTO] [image] NULL
);

In the above table some records with duplicate images. So how we can find duplicate image records?

Only 50% records are coming correctly through below query:

SELECT e.AC_NO, e.PART_NO, e.SLNOINPART, e.FM_NAME_EN, e.LASTNAME_EN, 
    e.RLN_FM_NM_EN, e.RLN_L_NM_EN, e.RLN_TYPE, e.GENDER, e.AGE
FROM S21_EROLL_MANAGEMENT_PROD.dbo.ERoll e
JOIN (
    select FM_NAME_EN, LASTNAME_EN, RLN_FM_NM_EN, RLN_L_NM_EN, RLN_TYPE, GENDER, AGE, hashbytes('md5', cast([PHOTO] as varbinary)) PHOTO ,count(*) PHOTO_COUNT
    from S21_EROLL_MANAGEMENT_PROD.dbo.ERoll
    group by FM_NAME_EN, LASTNAME_EN, RLN_FM_NM_EN, RLN_L_NM_EN, RLN_TYPE, GENDER, AGE, hashbytes('md5', cast([PHOTO] as varbinary)) 
    having count(*) > 1
) d ON e.FM_NAME_EN = d.FM_NAME_EN and e.LASTNAME_EN = d.LASTNAME_EN and e.RLN_FM_NM_EN=d.RLN_FM_NM_EN 
    and e.RLN_L_NM_EN = d.RLN_L_NM_EN and e.RLN_TYPE = d.RLN_TYPE and e.GENDER = d.GENDER and e.AGE = d.AGE
order by AC_NO, PART_NO, SLNOINPART
3

There are 3 best solutions below

0
On

Use window functions:

select e.*
from (select e.*
             count(*) over (partition by FM_NAME_EN, LASTNAME_EN, RLN_FM_NM_EN, RLN_L_NM_EN, RLN_TYPE, GENDER, AGE, hashbytes('md5', cast([PHOTO] as varbinary))) as cnt
      from S21_EROLL_MANAGEMENT_PROD.dbo.ERoll e
     ) e
where cnt > 1;
3
On

Uing GROUP BY, HAVING, COUNT() :

select hashbytes('md5', cast([imagecolumn] as varbinary(max))),count(*)
from yourtable
group by hashbytes('md5', cast([imagecolumn] as varbinary(max)))
having count(*) > 1
0
On

I guess you can try find duplicate photos like this

SELECT HASHBYTES('SHA2_256', cast(photo as varbinary(max))) from dbo.ERoll