Unable to clean a CSV file in Python


I am trying to load a CSV file into Python and clean the text, but I keep getting an error. I saved the CSV file in a variable called data_file, and the function below is supposed to clean the text and return the cleaned data_file.

import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt

df = pd.read_csv("/Users/yoshithKotla/Desktop/janTweet.csv")
data_file = df

print(data_file)


def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+ ', '', text)  # removes @ mentions
    text = re.sub(r'#[A-Za-z0-9]+', '', text)
    text = re.sub(r'RT[\s]+', '', text)
    text = re.sub(r'https?:\/\/\S+', '', text)

    return text


df['data_file'] = df['data_file'].apply(cleanTxt)

df 

I get a KeyError on the df['data_file'] = df['data_file'].apply(cleanTxt) line.

Answer:

The KeyError comes from the fact that you are trying to apply a function to a column named data_file in the DataFrame df, and df has no such column. The line data_file = df does not create a column (and does not even make a copy): it just makes data_file a second name for the same DataFrame.
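A minimal sketch reproducing the error with a small made-up DataFrame. The column name `text` is hypothetical; use whatever header your CSV actually has. It shows that `data_file = df` only binds a second name to the same object, so `df['data_file']` still looks up a column that does not exist.

```python
import pandas as pd

# Hypothetical stand-in for the tweets CSV; the real column names will differ
df = pd.DataFrame({"text": ["RT @user hello", "see https://example.com"]})

data_file = df                      # no copy, no new column: just a second name
print(data_file is df)              # True: both names refer to the same DataFrame
print("data_file" in df.columns)    # False: there is no column called data_file

try:
    df["data_file"]                 # the same lookup that fails in the question
except KeyError as err:
    print("KeyError:", err)
```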

To rename the columns of df, assign a list with exactly one name per column: df.columns = [list,of,values,corresponding,to,your,columns]
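For example, assuming the CSV has two columns (the names `id` and `text` below are made up for illustration), the list assigned to `df.columns` must match the number of existing columns; a list of the wrong length raises a `ValueError` and leaves the old names in place.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the CSV
df = pd.DataFrame([[1, "RT @user hi"], [2, "#tag text"]])
print(list(df.columns))        # [0, 1]: default integer labels

# One new name per existing column
df.columns = ["id", "text"]
print(list(df.columns))        # ['id', 'text']

# A list of the wrong length is rejected
try:
    df.columns = ["text"]
except ValueError as err:
    print("ValueError:", err)
```

If only one column needs a new name, `df.rename(columns={"old": "new"})` avoids listing every column.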

Then apply the function either to the right column or to the whole DataFrame.

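To apply a function to every cell of the DataFrame, use the .applymap() method (renamed .map() in pandas 2.1, where applymap is deprecated). This only works if every cell is a string, because re.sub raises a TypeError on numbers and NaN.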

EDIT

For clarity's sake:

To print your column names and the number of columns:

print(df.columns)
print(len(df.columns))

To modify your column names:

df.columns = [list,of,values,corresponding,to,your,columns]

To apply your function on a column:

df['your_column_name'] = df['your_column_name'].apply(cleanTxt)
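Putting it together, a minimal end-to-end sketch. The CSV is replaced by an in-memory DataFrame, and the tweet column is assumed to be called `text` (check `df.columns` for the real name). `cleanTxt` is the function from the question, unchanged. The `fillna("")` and `.str.strip()` steps are additions: empty CSV cells are read as NaN, which `re.sub` cannot handle, and the removals leave stray spaces behind.

```python
import re
import pandas as pd

def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+ ', '', text)  # removes @ mentions
    text = re.sub(r'#[A-Za-z0-9]+', '', text)   # removes hashtags
    text = re.sub(r'RT[\s]+', '', text)         # removes retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)  # removes links
    return text

# Hypothetical data; in practice:
# df = pd.read_csv("/Users/yoshithKotla/Desktop/janTweet.csv")
df = pd.DataFrame({"text": ["RT @user hello #news", "read https://example.com", None]})

# Missing cells become '' so re.sub always receives a string,
# then leftover leading/trailing spaces are stripped
df["text"] = df["text"].fillna("").apply(cleanTxt).str.strip()
print(df["text"].tolist())     # ['hello', 'read', '']
```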

To apply your function to your whole dataframe:

df = df.applymap(cleanTxt)  # pandas >= 2.1: df = df.map(cleanTxt), since applymap is deprecated there
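A sketch of the whole-frame version for a CSV that mixes text and numeric columns. `clean_cell` is a hypothetical wrapper added here (not part of the question) that cleans strings and passes numbers and NaN through unchanged. Because `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`, the snippet uses whichever method the installed pandas provides.

```python
import re
import pandas as pd

def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+ ', '', text)  # removes @ mentions
    text = re.sub(r'#[A-Za-z0-9]+', '', text)   # removes hashtags
    text = re.sub(r'RT[\s]+', '', text)         # removes retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)  # removes links
    return text

def clean_cell(value):
    # Hypothetical wrapper: only strings are cleaned, everything else is returned as-is
    return cleanTxt(value) if isinstance(value, str) else value

# Hypothetical frame with one text column and one numeric column
df = pd.DataFrame({
    "text": ["RT @user hello", "#tag only"],
    "retweets": [3, 0],
})

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
elementwise = df.map if hasattr(df, "map") else df.applymap
df = elementwise(clean_cell)
print(df)
```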