I use pandas.DataFrame.drop_duplicates to find duplicates in a DataFrame. It removes the duplicate rows, and that works great. However, I would like to know which rows were removed.
Is there a way to save the data in a new list before removing it?
Unfortunately, I have found no information on this in the pandas documentation.
Thanks for the answer.
drop_duplicates uses the same logic as the duplicated function, which returns a boolean Series marking the duplicated rows. By default (keep='first'), the first occurrence of each row is marked False and every later occurrence is marked True. Filtering the original data with this mask shows which rows are dropped, and filtering with its negation shows which rows are kept. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
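A minimal sketch of this approach, assuming a small made-up DataFrame (the column names and values are only for illustration):

```python
import pandas as pd

# Hypothetical example data; rows 2 and 4 repeat rows 0 and 1.
df = pd.DataFrame({
    "name": ["a", "b", "a", "c", "b"],
    "value": [1, 2, 1, 3, 2],
})

# Boolean mask: True for every row that repeats an earlier row
# (keep="first" is the default, so first occurrences are False).
mask = df.duplicated()

removed = df[mask]   # the rows drop_duplicates() would discard
kept = df[~mask]     # the rows drop_duplicates() would keep

# If a plain list is wanted, convert the removed rows to a list of dicts.
removed_list = removed.to_dict("records")
```

If you pass subset= or keep= to drop_duplicates, pass the same arguments to duplicated so that both calls agree on what counts as a duplicate.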