I have a data frame df:
   first_seen           last_seen            uri
0  2015-05-11 23:08:46  2015-05-11 23:08:50  http://11i-ssaintandder.com/
1  2015-05-11 23:08:46  2015-05-11 23:08:46  http://11i-ssaintandder.com/
2  2015-05-02 18:27:10  2015-06-06 03:52:03  http://goo.gl/NMqjd1
3  2015-05-02 18:27:10  2015-06-08 08:44:53  http://goo.gl/NMqjd1
I would like to remove the rows that have the same "first_seen" and "uri", keeping only the row with the latest last_seen.
Here is an example of the expected dataset:
   first_seen           last_seen            uri
0  2015-05-11 23:08:46  2015-05-11 23:08:50  http://11i-ssaintandder.com/
3  2015-05-02 18:27:10  2015-06-08 08:44:53  http://goo.gl/NMqjd1
Does anybody know how to do it without writing a for loop?
Call drop_duplicates, pass the columns you want to consider for duplicate matching as the subset argument, and set the param take_last=True (in current pandas versions take_last has been replaced by keep='last').
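A minimal sketch of that call, assuming a recent pandas where keep='last' is used in place of take_last=True. The frame is rebuilt here from the sample rows in the question so the snippet runs on its own; the name deduped is just for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "first_seen": pd.to_datetime([
        "2015-05-11 23:08:46", "2015-05-11 23:08:46",
        "2015-05-02 18:27:10", "2015-05-02 18:27:10",
    ]),
    "last_seen": pd.to_datetime([
        "2015-05-11 23:08:50", "2015-05-11 23:08:46",
        "2015-06-06 03:52:03", "2015-06-08 08:44:53",
    ]),
    "uri": [
        "http://11i-ssaintandder.com/", "http://11i-ssaintandder.com/",
        "http://goo.gl/NMqjd1", "http://goo.gl/NMqjd1",
    ],
})

# Rows count as duplicates when both first_seen and uri match;
# keep="last" keeps the last such row in the current row order.
deduped = df.drop_duplicates(subset=["first_seen", "uri"], keep="last")
print(deduped)
```

Note that "last" here means last by position, not latest last_seen: on this sample the call keeps rows 1 and 3, so row 1 survives even though row 0 has the later last_seen.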
EDIT:
In order to take the latest date you need to sort the df first on 'first_seen' and 'last_seen':
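Something along these lines should work, assuming a recent pandas (sort_values and keep='last' are the current spellings; older releases used different names for both). The frame is rebuilt from the question's sample rows, and the final sort_index is only there to restore the original row order.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "first_seen": pd.to_datetime([
        "2015-05-11 23:08:46", "2015-05-11 23:08:46",
        "2015-05-02 18:27:10", "2015-05-02 18:27:10",
    ]),
    "last_seen": pd.to_datetime([
        "2015-05-11 23:08:50", "2015-05-11 23:08:46",
        "2015-06-06 03:52:03", "2015-06-08 08:44:53",
    ]),
    "uri": [
        "http://11i-ssaintandder.com/", "http://11i-ssaintandder.com/",
        "http://goo.gl/NMqjd1", "http://goo.gl/NMqjd1",
    ],
})

result = (
    df.sort_values(["first_seen", "last_seen"])   # latest last_seen ends up last in each group
      .drop_duplicates(subset=["first_seen", "uri"], keep="last")
      .sort_index()                               # optional: back to the original row order
)
print(result)
```

On the sample data this leaves rows 0 and 3, which matches the expected dataset in the question.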