I am trying to load a CSV file into Python and clean the text, but I keep getting an error. I saved the CSV file in a variable called data_file, and the function below cleans the text and is supposed to return the cleaned data_file.
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
df = pd.read_csv("/Users/yoshithKotla/Desktop/janTweet.csv")
data_file = df
print(data_file)
def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+ ', '', text)  # removes @ mentions
    text = re.sub(r'#[A-Za-z0-9]+', '', text)  # removes hashtags
    text = re.sub(r'RT[\s]+', '', text)  # removes retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)  # removes links
    return text
df['data_file'] = df['data_file'].apply(cleanTxt)
df
I get a KeyError here.
The KeyError comes from the fact that you are trying to apply a function to the column data_file of the dataframe df, which does not contain such a column. The line data_file = df only gave df a second name; it did not add a column called data_file.
To change the column names of your dataframe df, use:
df.columns = [list,of,values,corresponding,to,your,columns]
Then you can apply the function either to the right column or to the whole dataframe.
To apply a function to every cell of the dataframe, you may want to use the .applymap() method (deprecated since pandas 2.1 in favor of DataFrame.map()).
EDIT
For clarity's sake:
To print your column names and the length of your dataframe columns:
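A minimal sketch of this step. The small dataframe and its column names (tweet, likes) are hypothetical stand-ins for whatever janTweet.csv actually contains:

```python
import pandas as pd

# Hypothetical stand-in for the dataframe read from janTweet.csv.
df = pd.DataFrame({
    "tweet": ["RT @user1 hello #world", "see https://example.com now"],
    "likes": [3, 5],
})

column_names = list(df.columns)  # the labels you can index with df[...]
n_columns = len(df.columns)      # how many columns the dataframe has

print(column_names)
print(n_columns)
```

Whatever names this prints for your own file are the only keys that df['...'] will accept.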
To modify your column names:
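A minimal sketch, assuming a two-column dataframe. The new names (data_file, likes) are only illustrative, and the list must contain exactly one name per existing column:

```python
import pandas as pd

# Hypothetical dataframe with pandas' default integer column labels (0, 1).
df = pd.DataFrame([["RT @user1 hello", 3], ["plain text", 5]])

# Assign one new name per existing column, in order.
df.columns = ["data_file", "likes"]

print(list(df.columns))
```

If only one name needs to change, df.rename(columns={old_name: new_name}) returns a renamed copy without requiring you to list every column.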
To apply your function on a column:
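A minimal sketch that reuses cleanTxt from the question. The column name tweet is an assumption; replace it with the name of the text column in your own file:

```python
import re
import pandas as pd

def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+ ', '', text)  # removes @ mentions
    text = re.sub(r'#[A-Za-z0-9]+', '', text)   # removes hashtags
    text = re.sub(r'RT[\s]+', '', text)         # removes retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)  # removes links
    return text

# Hypothetical stand-in for the dataframe read from janTweet.csv.
df = pd.DataFrame({"tweet": ["RT @user1 hello #world",
                             "see https://example.com now"]})

# Series.apply calls cleanTxt once per value in the column.
df["tweet"] = df["tweet"].apply(cleanTxt)

print(df["tweet"].tolist())
```

The key passed to df[...] has to be a column that already exists, which is what the original df['data_file'] was missing.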
To apply your function to your whole dataframe:
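A minimal sketch, again with a hypothetical dataframe. The helper clean_if_text is not part of the original answer: it is added here on the assumption that some columns may hold non-string values, which re.sub would reject. The code picks DataFrame.map where it exists (pandas 2.1 and later) and falls back to .applymap() otherwise:

```python
import re
import pandas as pd

def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+ ', '', text)  # removes @ mentions
    text = re.sub(r'#[A-Za-z0-9]+', '', text)   # removes hashtags
    text = re.sub(r'RT[\s]+', '', text)         # removes retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)  # removes links
    return text

def clean_if_text(value):
    # Hypothetical guard: only strings are cleaned, other values pass through.
    return cleanTxt(value) if isinstance(value, str) else value

# Hypothetical stand-in for the dataframe read from janTweet.csv.
df = pd.DataFrame({"tweet": ["RT @a hi"], "reply": ["#tag ok"], "likes": [3]})

# DataFrame.map replaces the deprecated applymap in pandas >= 2.1.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
cleaned = elementwise(clean_if_text)

print(cleaned)
```

Applying the function to a single text column is usually cheaper and clearer; the element-wise version is mainly useful when several columns contain tweet text.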