Need to extract or remove columns from a pandas DataFrame in Python


I have a list that looks like this:

    categorical_features = \
    ['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond', 
     'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1', 
     'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
     'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
     'OverallCond', 'YrSold', 'MoSold']

I need to remove these columns from the dataset. I tried this:

    all_data = all_data.loc[:, categorical_features]

Unfortunately, this step only selects these columns. How would I reverse the process by excluding them instead?


There are 2 best solutions below

On BEST ANSWER

You can use pandas' DataFrame.drop to exclude those columns:

    all_data = all_data.drop(categorical_features, axis=1)

Here is a small example you can run as a test:

    import pandas as pd
    import numpy as np

    # Build a 6x4 frame of random values indexed by date
    dates = pd.date_range('20130101', periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

    print(df)

    # Drop columns B and D (axis=1 means the labels refer to columns)
    features = ['B', 'D']
    df = df.drop(features, axis=1)

    print(df)

The output (the values are random, so they will differ on each run):

                   A         B         C         D
2013-01-01  1.365473 -0.445448  0.244377  0.416889
2013-01-02 -0.307532  0.095569  1.356229 -0.306618
2013-01-03  0.971216  1.100189  0.932189  0.808151
2013-01-04 -0.030160 -0.796742 -0.383336 -0.409233
2013-01-05  0.006601  0.093678 -1.013768  1.439921
2013-01-06  0.560771 -0.452491  1.050500 -1.545958
                   A         C
2013-01-01  1.365473  0.244377
2013-01-02 -0.307532  1.356229
2013-01-03  0.971216  0.932189
2013-01-04 -0.030160 -0.383336
2013-01-05  0.006601 -1.013768
2013-01-06  0.560771  1.050500
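
As a variation on the example above, drop also accepts a columns keyword in place of axis=1, and errors='ignore' skips names that are not present in the frame. A minimal sketch, assuming a made-up frame and a hypothetical feature list that deliberately includes one missing column ('Z'):

```python
import pandas as pd

# Toy frame for illustration only; the column names are made up.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# 'Z' is deliberately absent from df to show the errors='ignore' behaviour.
features = ['B', 'D', 'Z']

# columns= is equivalent to passing the labels with axis=1.
# errors='ignore' drops the labels that exist and silently skips the rest;
# without it, the missing 'Z' would raise a KeyError.
remaining = df.drop(columns=features, errors='ignore')

print(list(remaining.columns))
```

This can be handy when the list of names was written by hand and may not match the dataset exactly, though silently skipping names can also hide typos.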

I'd suggest computing the columns you want to keep instead; that would be easier:

    categorical_features = \
    ['FireplaceQu', 'BsmtQual', 'BsmtCond', 'GarageQual', 'GarageCond',
     'ExterQual', 'ExterCond','HeatingQC', 'PoolQC', 'KitchenQual', 'BsmtFinType1',
     'BsmtFinType2', 'Functional', 'Fence', 'BsmtExposure', 'GarageFinish', 'LandSlope',
     'LotShape', 'PavedDrive', 'Street', 'Alley', 'CentralAir', 'MSSubClass', 'OverallQual',
     'OverallCond', 'YrSold', 'MoSold']

    # Columns to keep: everything in all_data that is not in the list.
    # Index.difference with sort=False keeps the original column order;
    # a plain Python set is unordered and is not accepted by .loc in recent pandas.
    cols = all_data.columns.difference(categorical_features, sort=False)

    all_data = all_data.loc[:, cols]
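
Another way to compute the columns to keep is a boolean mask over the column labels, which preserves the original column order. A minimal sketch, with a made-up frame standing in for all_data and a shortened feature list (the values and the `kept` name are illustrative assumptions):

```python
import pandas as pd

# Stand-in for all_data; the values are made up for illustration.
all_data = pd.DataFrame({
    'LotArea': [8450, 9600],
    'Street': ['Pave', 'Pave'],
    'SalePrice': [208500, 181500],
    'Alley': [None, 'Grvl'],
})

# Shortened version of the list from the question.
categorical_features = ['Street', 'Alley']

# isin marks the columns that appear in the list; ~ inverts the mask,
# so keep_mask is True for every column that should stay.
keep_mask = ~all_data.columns.isin(categorical_features)

# Select only the columns where the mask is True.
kept = all_data.loc[:, keep_mask]

print(list(kept.columns))
```

Names in the list that do not exist in the frame are simply ignored by the mask, so this form does not raise on a mismatch.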