I have a dataframe with both string and integer values.
Here is a sample data dictionary that shows what the dataframe looks like:
import pandas as pd

data = {
'col1': ['A','A','A','B','B','B','C','C','C','D','D','D'],
'col2': [10,20,30,10,20,30,10,20,30,10,20,30],
'col3': ['X','X','X','X','Y','X','X','X','Y','Y','X','X'],
'col4': [45,23,78,56,12,34,87,54,43,89,43,12],
'col5': [3,4,6,4,3,2,4,3,5,3,4,6]
}
dataframe = pd.DataFrame(data)
I need to extract data as follows:
- Max value of col4
- Grouped by col1
- Excluding rows where col3 is Y
- Keeping only rows where col5 is not more than 5.
I tried a few things and ran into the following problems.
1- I used the following method to find the max value in the data, but I am not able to find the max value for each group.
print(dataframe['col4'].max()) #this worked to get one max value
print(dataframe.groupby('col1').max()) #this doesn't work
The second one doesn't work for me because it also returns the maximum of col2. I need the result to show the col2 value from the row that holds the max col4 in each group.
2- I am not able to apply a filter on both col3 (str) and col5 (int) in one command. Is there any way to do that?
print(dataframe[dataframe['col3'] != 'Y' & dataframe['col5'] < 6]) #generates an error
The output that I am expecting through this is:
   col1  col2 col3  col4  col5
0     A    10    X    45     3
3     B    10    X    56     4
6     C    10    X    87     4
10    D    20    X    43     4
#
# 78 is max in group A, but ignored as col5 is 6 (we need < 6)
# Similarly, 89 is max in group D, but ignored as col3 is Y.
I apologize if I am doing something wrong. I am quite new to this.
Thank you.
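One possible way to get the expected output with pandas is sketched below. It assumes the sample `data` dictionary from the question and the variable name `dataframe`; the names `mask`, `filtered` and `best_rows` are made up for illustration. The filter is applied first, then `idxmax` picks the row with the largest col4 in each col1 group, so the col2, col3 and col5 values come from that same row.

```python
import pandas as pd

data = {
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'],
    'col2': [10, 20, 30, 10, 20, 30, 10, 20, 30, 10, 20, 30],
    'col3': ['X', 'X', 'X', 'X', 'Y', 'X', 'X', 'X', 'Y', 'Y', 'X', 'X'],
    'col4': [45, 23, 78, 56, 12, 34, 87, 54, 43, 89, 43, 12],
    'col5': [3, 4, 6, 4, 3, 2, 4, 3, 5, 3, 4, 6],
}
dataframe = pd.DataFrame(data)

# Each comparison needs its own parentheses, because & binds more
# tightly than != and <= in Python.
mask = (dataframe['col3'] != 'Y') & (dataframe['col5'] <= 5)
filtered = dataframe[mask]

# idxmax gives, for each col1 group, the index label of the row with
# the largest col4; loc then pulls those complete rows.
best_rows = filtered.loc[filtered.groupby('col1')['col4'].idxmax()]
print(best_rows)
```

Filtering before grouping matters here: grouping first would pick 78 for A and 89 for D, and those rows would then be dropped by the filter.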
I'm not a Python developer, but in my opinion you are doing it the wrong way. You should have a list of structures instead of a structure of lists. Then you can start working on such a list.
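To illustrate the list-of-structures idea, here is a minimal sketch in plain Python, without pandas. It assumes the sample `data` dictionary from the question; the names `records`, `kept` and `best` are hypothetical and chosen only for this example.

```python
data = {
    'col1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'],
    'col2': [10, 20, 30, 10, 20, 30, 10, 20, 30, 10, 20, 30],
    'col3': ['X', 'X', 'X', 'X', 'Y', 'X', 'X', 'X', 'Y', 'Y', 'X', 'X'],
    'col4': [45, 23, 78, 56, 12, 34, 87, 54, 43, 89, 43, 12],
    'col5': [3, 4, 6, 4, 3, 2, 4, 3, 5, 3, 4, 6],
}

# Turn the dictionary of columns into a list of row dictionaries.
columns = list(data)
records = [dict(zip(columns, values)) for values in zip(*data.values())]

# Keep rows where col3 is not 'Y' and col5 is at most 5.
kept = [row for row in records if row['col3'] != 'Y' and row['col5'] <= 5]

# For each col1 group, remember the row with the largest col4 seen so far.
best = {}
for row in kept:
    group = row['col1']
    if group not in best or row['col4'] > best[group]['col4']:
        best[group] = row

for row in best.values():
    print(row)
```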
This is an example solution, so it could probably be done in a much smoother way:
result: