I have run into a data problem that I sense many have encountered. I Currently have a data set which contains transaction information. Based on the Transaction_Number I will find how long each person involved in transaction spent on their part.
The final output looks like this
Transaction_Number Created On Created_By Time_Diff
27327920232 2014-12-08 03:29:08 jake NaN
27323232336 2017-04-28 18:39:03 James 0
27323232336 2017-04-28 18:26:23 Tony 760.0
Now the problem that I am facing is that I am missing one column. Which I solved by querying from the DB and pulled a field named "Product_Number". I am joining the two tables on Transaction_Number. Here is where the problem arises the second data frame name D has the Transaction_Number listed only once matching the Product_Number. The master data frame has multiple occurrences of the Transactions. Therefore when I run the following code my result doesn't make sense. Outputting only 4 matches based on 12K+ possibilities.
X D D
Transaction_Number Transaction_Number Product_Number
27327920232 27327920232 Cw-xs-re-89
27327920232 72732332323 pw-ts-sf-89
27327920232 32338033383 Cw-ns-tn-86
results = pd.merge(X, D, how='inner', on= "Transaction_Number")
What can I do to find the correct matchings. I have also attempted a left join, any type of tips will help. - Thanks
Are you looking for outer merge?