Hi, this is something fundamental, but I can't figure it out: unique()
shows the unique values in each column, but describe()
shows NaN in the unique row for those same columns. Why? Any help is appreciated, thanks.
import numpy as np
import pandas as pd
train = pd.read_csv('train.csv', header=0)
# works:
train['Pclass'].unique()
# array([3, 1, 2], dtype=int64)
train['Survived'].unique()
# array([0, 1], dtype=int64)
# doesn't work as expected:
train.describe(include='all')
#         PassengerId    Survived      Pclass               Name   Sex  \
# count    891.000000  891.000000  891.000000                891   891
# unique          NaN         NaN         NaN                891     2
# top             NaN         NaN         NaN  Mitkoff, Mr. Mito  male
# freq            NaN         NaN         NaN                  1   577
# mean     446.000000    0.383838    2.308642                NaN   NaN
# std      257.353842    0.486592    0.836071                NaN   NaN
# min        1.000000    0.000000    1.000000                NaN   NaN
# 25%      223.500000    0.000000    2.000000                NaN   NaN
# 50%      446.000000    0.000000    3.000000                NaN   NaN
# 75%      668.500000    1.000000    3.000000                NaN   NaN
# max      891.000000    1.000000    3.000000                NaN   NaN
#
#                Age       SibSp       Parch  Ticket        Fare        Cabin  \
# count   714.000000  891.000000  891.000000     891  891.000000          204
# unique         NaN         NaN         NaN     681         NaN          147
# top            NaN         NaN         NaN  347082         NaN  C23 C25 C27
# freq           NaN         NaN         NaN       7         NaN            4
# mean     29.699118    0.523008    0.381594     NaN   32.204208          NaN
# std      14.526497    1.102743    0.806057     NaN   49.693429          NaN
# min       0.420000    0.000000    0.000000     NaN    0.000000          NaN
# 25%      20.125000    0.000000    0.000000     NaN    7.910400          NaN
# 50%      28.000000    0.000000    0.000000     NaN   14.454200          NaN
# 75%      38.000000    1.000000    0.000000     NaN   31.000000          NaN
# max      80.000000    8.000000    6.000000     NaN  512.329200          NaN
#
#         Embarked
# count        889
# unique         3
# top            S
# freq         644
# mean         NaN
# std          NaN
# min          NaN
# 25%          NaN
# 50%          NaN
# 75%          NaN
# max          NaN
The describe method for numeric columns doesn't list the number of unique values, since this is usually not particularly meaningful for numeric data; the describe method for string columns does. Since your dataframe contains both, the results are merged, and NaNs are inserted wherever a statistic doesn't apply to a column.

If your numeric columns are actually just codes reflecting different classes/categories, you might want to convert them to Categorical to get more meaningful info about them.
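
Here is a minimal sketch of both points. It assumes a tiny made-up dataframe standing in for train.csv (same column names, invented values), so the numbers won't match the real data; the variable names numeric_summary, string_summary, coded and cat_summary are just for illustration.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: same column names, made-up rows.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# Numeric column only: count/mean/std/min/quartiles/max, no "unique" row.
numeric_summary = train[["Fare"]].describe()
print(numeric_summary)

# String column only: count/unique/top/freq.
string_summary = train[["Sex"]].describe()
print(string_summary)

# Treat the integer class codes as categories instead of numbers.
coded = train.copy()
coded[["Survived", "Pclass"]] = coded[["Survived", "Pclass"]].astype("category")

# Categorical columns now get unique/top/freq; Fare still gets mean, std, etc.
cat_summary = coded.describe(include="all")
print(cat_summary)
```

After the conversion, Survived and Pclass are summarized like the string columns (count, unique, top, freq), and their mean/std rows become NaN instead.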