ValueError 'Boolean array expected for the condition, not int64' when computing entropy of DataFrame columns. Can anybody help me solve this problem?

I have a dataset in .csv format with 136 columns and 15036 rows. I want to compute the entropy and information gain of each column of the dataset. Here is my code:

import math
import numpy as np

def calc_entropy(column):
    # Count occurrences of each non-negative integer value in the column
    counts = np.bincount(column)
    probs = counts / len(column)
    entropy = 0
    for p in probs:
        if p > 0:
            entropy += p * math.log(p, 2)
    return -entropy

def information_gain(data, split, target):
    ori_entropy = calc_entropy(data[target])
    # Only the first two distinct values of the split column are used
    values = data[split].unique()
    left_split = data[data[split] == values[0]]
    right_split = data[data[split] == values[1]]
    subtract = 0
    for subset in [left_split, right_split]:
        prob = subset.shape[0] / data.shape[0]
        subtract += prob * calc_entropy(subset[target])
    return ori_entropy - subtract

print(calc_entropy(dre[dre.iloc[:,0:136]]))
print(information_gain(dre, dre.iloc[:,0:136],"type"))

But I get this error message:

  File "D:\Informatics\FinalProject\Features_Ranking.py", line 59, in <module>
    print(calc_entropy(dre[dre.iloc[:,0:136]]))
  File "C:\Users\ana\anaconda3\lib\site-packages\pandas\core\frame.py", line 2889, in __getitem__
    return self.where(key)
  File "C:\Users\ana\anaconda3\lib\site-packages\pandas\core\generic.py", line 9004, in where
    return self.where(
  File "C:\Users\ana\anaconda3\lib\site-packages\pandas\core\generic.py", line 8766, in _where
    raise ValueError(msg.format(dtype=dt))
ValueError: Boolean array expected for the condition, not int64

Thank you
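
A likely cause, judging from the traceback: `dre[dre.iloc[:,0:136]]` indexes the DataFrame with another DataFrame, and pandas treats a DataFrame key as a boolean mask (the `self.where(key)` frame above), so integer values are rejected. The same would happen inside `information_gain`, where `data[split]` receives a DataFrame instead of a column name. Below is a minimal sketch of one way around it, passing one column at a time. It is not necessarily how the original script should be structured: the small `dre` frame is a hypothetical stand-in for the real data, and `value_counts` and `groupby` are swapped in for `np.bincount` and the two-value split so that non-integer labels and columns with more than two distinct values also work.

```python
import numpy as np
import pandas as pd


def calc_entropy(column):
    """Shannon entropy (base 2) of a pandas Series of discrete values."""
    # value_counts accepts any hashable labels, whereas np.bincount
    # needs non-negative integers.
    probs = column.value_counts(normalize=True).to_numpy()
    probs = probs[probs > 0]
    return float(-(probs * np.log2(probs)).sum())


def information_gain(data, split, target):
    """Entropy of `target` minus the weighted entropy after grouping by `split`.

    `split` and `target` are column names, not DataFrames.
    """
    original = calc_entropy(data[target])
    remainder = 0.0
    for _, subset in data.groupby(split):
        remainder += len(subset) / len(data) * calc_entropy(subset[target])
    return original - remainder


# Hypothetical stand-in for the real dataset `dre`.
dre = pd.DataFrame({
    "f1": [0, 0, 1, 1],
    "f2": [0, 1, 0, 1],
    "type": ["a", "a", "b", "b"],
})

# Loop over column names instead of indexing with a DataFrame.
features = [c for c in dre.columns if c != "type"]
entropies = {c: calc_entropy(dre[c]) for c in features}
gains = {c: information_gain(dre, c, "type") for c in features}
print(entropies)
print(gains)
```

With the real data, the same loop over `dre.columns` (skipping the target column, assumed here to be `"type"`) would give one entropy and one information gain value per feature.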
