Oracle SQL code cannot fetch unique columns- A lot of duplicate data in the Output

46 Views Asked by At

There are multiple updates in the salary amount, hence duplicate data with same values but different salary. My task is to remove the duplicity and only show the latest salary update data.

The main problem with code is that it is showing duplicate data.

My Code is:

SELECT PER_PERSON_NAMES_F.TITLE,
               PER_PERSON_NAMES_F.FIRST_NAME,
               PER_PERSON_NAMES_F.LAST_NAME,
               PER_PERSON_NAMES_F.FULL_NAME,
              PER_ALL_PEOPLE_F.PERSON_NUMBER,
              PER_NATIONAL_IDENTIFIERS.NATIONAL_IDENTIFIER_NUMBER,
             PER_NATIONAL_IDENTIFIERS.LEGISLATION_CODE,
             to_char(PER_ALL_PEOPLE_F.EFFECTIVE_START_DATE,'DD-MM-YYYY') AS EFFECTIVE_START_DATE,
             to_char(PER_ALL_PEOPLE_F.EFFECTIVE_END_DATE,'DD-MM-YYYY') AS EFFECTIVE_END_DATE,
             to_char(PER_PERSONS.DATE_OF_BIRTH,'DD-MM-YYYY') AS DOB,
             PER_ALL_ASSIGNMENTS_M.ASSIGNMENT_TYPE,
             PER_EMAIL_ADDRESSES.EMAIL_ADDRESS,
            PER_PHONES.PHONE_NUMBER
FROM PER_ALL_PEOPLE_F JOIN PER_PERSONS ON PER_ALL_PEOPLE_F.PERSON_ID=PER_PERSONS.PERSON_ID JOIN PER_PERSON_NAMES_F ON 
PER_PERSONS.PERSON_ID=PER_PERSON_NAMES_F.PERSON_ID JOIN PER_ALL_ASSIGNMENTS_M ON
PER_PERSON_NAMES_F.PERSON_ID=PER_ALL_ASSIGNMENTS_M.PERSON_ID JOIN PER_NATIONAL_IDENTIFIERS ON 
PER_ALL_ASSIGNMENTS_M.PERSON_ID=PER_NATIONAL_IDENTIFIERS.PERSON_ID JOIN PER_EMAIL_ADDRESSES ON 
PER_NATIONAL_IDENTIFIERS.PERSON_ID=PER_EMAIL_ADDRESSES.PERSON_ID JOIN 
(
   SELECT PERSON_ID,TRUNC(SALARY_AMOUNT,2) AS SALARY,max(LAST_UPDATE_DATE) as SALARY_UPDATE_DATE
    FROM CMP_SALARY
    GROUP BY PERSON_ID
)  CMP_SALARY ON
PER_EMAIL_ADDRESSES.PERSON_ID=CMP_SALARY.PERSON_ID JOIN PER_PHONES ON CMP_SALARY.PERSON_ID = PER_PHONES.PERSON_ID
AND PER_NATIONAL_IDENTIFIERS.NATIONAL_IDENTIFIER_NUMBER IN (
SELECT DISTINCT NATIONAL_IDENTIFIER_NUMBER FROM PER_NATIONAL_IDENTIFIERS)
ORDER BY PER_ALL_PEOPLE_F.PERSON_NUMBER

How to remove duplicity in this code?

3

There are 3 best solutions below

5
subhojeetb On
SELECT PERSON_ID,TRUNC(SALARY_AMOUNT,2) AS SALARY,LAST_UPDATE_DATE as SALARY_UPDATE_DATE
FROM CMP_SALARY
QUALIFY ROW_NUMBER() OVER (PARTITION BY PERSON_ID ORDER BY LAST_UPDATE_DATE DESC) = 1


OR 

SELECT PERSON_ID,SALARY,SALARY_UPDATE_DATE FROM(
SELECT PERSON_ID,TRUNC(SALARY_AMOUNT,2) AS SALARY,LAST_UPDATE_DATE AS SALARY_UPDATE_DATE,ROW_NUMBER() OVER(PARTITION BY PERSON_ID ORDER BY LAST_UPDATE_DATE DESC) AS row_num
FROM CMP_SALARY)
WHERE row_num = 1;

You may try the above to dedup the data , second solution if qualify is not supported.

4
Littlefoot On

If you pre-calculate salaries (using row_number function), then see marked part of code:

  SELECT per_person_names_f.title,
         per_person_names_f.first_name,
         per_person_names_f.last_name,
         per_person_names_f.full_name,
         per_all_people_f.person_number,
         per_national_identifiers.national_identifier_number,
         per_national_identifiers.legislation_code,
         TO_CHAR (per_all_people_f.effective_start_date, 'DD-MM-YYYY')
            AS effective_start_date,
         TO_CHAR (per_all_people_f.effective_end_date, 'DD-MM-YYYY')
            AS effective_end_date,
         TO_CHAR (per_persons.date_of_birth, 'DD-MM-YYYY')
            AS dob,
         per_all_assignments_m.assignment_type,
         per_email_addresses.email_address,
         per_phones.phone_number
    FROM per_all_people_f
         JOIN per_persons ON per_all_people_f.person_id = per_persons.person_id
         JOIN per_person_names_f ON per_persons.person_id = per_person_names_f.person_id
         JOIN per_all_assignments_m
            ON per_person_names_f.person_id = per_all_assignments_m.person_id
         JOIN per_national_identifiers
            ON per_all_assignments_m.person_id = per_national_identifiers.person_id
         JOIN per_email_addresses
            ON per_national_identifiers.person_id = per_email_addresses.person_id
         JOIN
-- change begins here ...
         (SELECT x.person_id,
                 TRUNC (x.salary_amount, 2) AS salary,
                 x.last_update_date AS salary_update_date
            FROM (SELECT person_id,
                         salary_amount,
                         last_update_date,
                         ROW_NUMBER ()
                            OVER (PARTITION BY person_id ORDER BY last_update_date DESC) rn_sal
                    FROM cmp_salary
                 ) x
           WHERE x.rn_sal = 1
         ) cmp_salary
-- ... and ends here
            ON per_email_addresses.person_id = cmp_salary.person_id
         JOIN per_phones
            ON     cmp_salary.person_id = per_phones.person_id
               AND per_national_identifiers.national_identifier_number IN
                      (SELECT DISTINCT national_identifier_number
                         FROM per_national_identifiers)
ORDER BY per_all_people_f.person_number
2
MT0 On

You can use MAX(salary) KEEP (DENSE_RANK LAST ORDER BY last_update_date) to get the maximum salary for the latest update date:

SELECT n.TITLE,
       n.FIRST_NAME,
       n.LAST_NAME,
       n.FULL_NAME,
       ap.PERSON_NUMBER,
       ni.NATIONAL_IDENTIFIER_NUMBER,
       ni.LEGISLATION_CODE,
       to_char(ap.EFFECTIVE_START_DATE,'DD-MM-YYYY') AS EFFECTIVE_START_DATE,
       to_char(ap.EFFECTIVE_END_DATE,'DD-MM-YYYY') AS EFFECTIVE_END_DATE,
       to_char(p.DATE_OF_BIRTH,'DD-MM-YYYY') AS DOB,
       aa.ASSIGNMENT_TYPE,
       e.EMAIL_ADDRESS,
       ph.PHONE_NUMBER
FROM   PER_ALL_PEOPLE_F ap
       JOIN PER_PERSONS p               ON ap.PERSON_ID = p.PERSON_ID
       JOIN PER_PERSON_NAMES_F n        ON ap.PERSON_ID = n.PERSON_ID
       JOIN PER_ALL_ASSIGNMENTS_M aa    ON ap.PERSON_ID = a.PERSON_ID
       JOIN PER_NATIONAL_IDENTIFIERS ni ON ap.PERSON_ID = ni.PERSON_ID
       JOIN PER_EMAIL_ADDRESSES e       ON ap.PERSON_ID = e.PERSON_ID
       JOIN (
         SELECT PERSON_ID,
                TRUNC(
                  MAX(SALARY_AMOUNT) KEEP (DENSE_RANK LAST ORDER BY last_update_date),
                  2
                ) AS SALARY,
                max(LAST_UPDATE_DATE) as SALARY_UPDATE_DATE
         FROM   CMP_SALARY
         GROUP BY PERSON_ID
      ) s                               ON ap.PERSON_ID = s.PERSON_ID
      JOIN PER_PHONES ph                ON ap.PERSON_ID = ph.PERSON_ID
ORDER BY
      ap.PERSON_NUMBER

or, use the ROW_NUMBER analytic function (without using GROUP BY):

SELECT n.TITLE,
       n.FIRST_NAME,
       n.LAST_NAME,
       n.FULL_NAME,
       ap.PERSON_NUMBER,
       ni.NATIONAL_IDENTIFIER_NUMBER,
       ni.LEGISLATION_CODE,
       to_char(ap.EFFECTIVE_START_DATE,'DD-MM-YYYY') AS EFFECTIVE_START_DATE,
       to_char(ap.EFFECTIVE_END_DATE,'DD-MM-YYYY') AS EFFECTIVE_END_DATE,
       to_char(p.DATE_OF_BIRTH,'DD-MM-YYYY') AS DOB,
       aa.ASSIGNMENT_TYPE,
       e.EMAIL_ADDRESS,
       ph.PHONE_NUMBER
FROM   PER_ALL_PEOPLE_F ap
       JOIN PER_PERSONS p               ON ap.PERSON_ID = p.PERSON_ID
       JOIN PER_PERSON_NAMES_F n        ON ap.PERSON_ID = n.PERSON_ID
       JOIN PER_ALL_ASSIGNMENTS_M aa    ON ap.PERSON_ID = a.PERSON_ID
       JOIN PER_NATIONAL_IDENTIFIERS ni ON ap.PERSON_ID = ni.PERSON_ID
       JOIN PER_EMAIL_ADDRESSES e       ON ap.PERSON_ID = e.PERSON_ID
       JOIN (
         SELECT PERSON_ID,
                SALARY_AMOUNT AS SALARY,
                LAST_UPDATE_DATE as SALARY_UPDATE_DATE,
                ROW_NUMBER() OVER (
                  PARTITION BY person_id
                  ORDER BY last_update_date DESC, salary_amount DESC
                ) AS rn
         FROM   CMP_SALARY
      ) s                               ON  ap.PERSON_ID = s.PERSON_ID
                                        AND s.rn = 1
      JOIN PER_PHONES ph                ON ap.PERSON_ID = ph.PERSON_ID
ORDER BY
      ap.PERSON_NUMBER