Cannot Keep My Datetime Data and 'No' Word in My Pandas DataFrame

Asked by rainy days on 20 August 2025 at 18:45

I have a pandas DataFrame loaded from a CSV file, and I want to clean it with regex in Python. The data looks like this:

Name	Date	Status	Number
A/bCDef	2022-07-11	Yes	io123-07
GhIjK-l	2022-07-12	No	io456-08
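For a reproducible example, here is a minimal sketch that builds the sample rows above into a DataFrame. It reads them from an in-memory tab-separated string, which is an assumption standing in for the real CSV file (its name and delimiter are not given). `dtype=str` keeps every cell as a plain string.

```python
import io

import pandas as pd

# Sample rows from the question, inlined instead of reading the real CSV file
raw = (
    "Name\tDate\tStatus\tNumber\n"
    "A/bCDef\t2022-07-11\tYes\tio123-07\n"
    "GhIjK-l\t2022-07-12\tNo\tio456-08\n"
)

# dtype=str keeps every column as text, so the dates stay as "2022-07-11"
df = pd.read_csv(io.StringIO(raw), sep="\t", dtype=str)
print(df)
```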

I'm trying to clean the DataFrame so it is easier to process, but my code deletes the dates, the word 'no', and the digits and hyphens in the Number column.

This is the result I get so far:

name	date	status	number
abcdef		yes	io
ghijkl		no	io

This is the code that I found on the internet and tried on my dataframe:

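import re
import string

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # download once, not on every call

def regex_values(cols):
    stemmer = nltk.SnowballStemmer('english')  # created but never used below
    stopword = set(stopwords.words('english'))  # NLTK's English list includes 'no'

    cols = str(cols).lower()
    cols = re.sub(r'\[.*?\]', '', cols)  # text in square brackets
    cols = re.sub(r'https?://\S+|www\.\S+', '', cols)  # URLs
    cols = re.sub(r'<.*?>+', '', cols)  # HTML tags
    cols = re.sub('[%s]' % re.escape(string.punctuation), '', cols)  # all punctuation, including '-'
    cols = re.sub(r'\n', '', cols)  # newlines
    cols = re.sub(r'\w*\d\w*', '', cols)  # every word containing a digit, e.g. dates and IDs
    cols = re.sub(r'^\s+|\s+$', '', cols)  # leading/trailing whitespace
    cols = re.sub(r' +', ' ', cols)  # runs of spaces
    cols = re.sub(r'\b(\w+)(?:\W\1\b)+', r'\1', cols, flags=re.IGNORECASE)  # repeated words
    cols = [word for word in cols.split(' ') if word not in stopword]  # drops stopwords such as 'no'
    cols = " ".join(cols)

    return cols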
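To see why the dates disappear, here is a minimal sketch that applies only the punctuation and digit rules from `regex_values`, in the same order, to single cells. The helper name `strip_punct_and_digits` is hypothetical, and the NLTK stopword step is left out so the sketch runs without downloads; that step is what removes 'no', since it is in NLTK's English stopword list.

```python
import re
import string

def strip_punct_and_digits(text: str) -> str:
    """Hypothetical helper: the punctuation and digit rules from regex_values."""
    # Remove every ASCII punctuation character, including '-' and '/'
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    # Remove every token that contains at least one digit
    return re.sub(r"\w*\d\w*", "", text)

print(repr(strip_punct_and_digits("2022-07-11")))  # ''
print(repr(strip_punct_and_digits("a/bcdef")))     # 'abcdef'
```

Once the hyphens are gone, "2022-07-11" becomes the single token "20220711", which the digit rule then deletes entirely. These rules only make sense for free text, not for date or ID columns.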

This is the DataFrame I want to end up with:

name	date	status	number
abcdef	2022-07-11	yes	io123-07
ghijkl	2022-07-12	no	io456-08

I'm new to regex, so I hope someone can help me write the right code. If there is a simpler way to clean my data, I would appreciate that too. Thanks in advance.

Answer by Bushmaster on 01 December 2022 at 21:04

Can you try this?

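import pandas as pd

# df is the DataFrame read from the CSV file
df = df.map(lambda s: s.lower() if isinstance(s, str) else s)  # lowercase string values (DataFrame.map needs pandas >= 2.1; use applymap on older versions)
df.columns = df.columns.str.lower()  # lowercase the column names
df['name'] = df['name'].str.replace(r'\W+', '', regex=True)  # remove any non-word character; regex=True is required on pandas >= 2.0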

#output
'''
     name        date status    number
0  abcdef  2022-07-11    yes  io123-07
1  ghijkl  2022-07-12     no  io456-08
'''
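A self-contained sketch of the same approach, run on the sample rows from the question (inlined as a tab-separated string, an assumption in place of the real CSV file). It lowercases each column with `Series.str.lower`, which works on both older and newer pandas because every column is read as text with `dtype=str`, and it applies the `\W+` regex only to the `name` column, so dates, 'no', and hyphens elsewhere are left alone.

```python
import io

import pandas as pd

# Sample rows from the question (stand-in for the real CSV file)
raw = (
    "Name\tDate\tStatus\tNumber\n"
    "A/bCDef\t2022-07-11\tYes\tio123-07\n"
    "GhIjK-l\t2022-07-12\tNo\tio456-08\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t", dtype=str)

df.columns = df.columns.str.lower()                          # lowercase the column names
df = df.apply(lambda col: col.str.lower())                   # lowercase every cell (all columns are text)
df["name"] = df["name"].str.replace(r"\W+", "", regex=True)  # strip non-word characters from 'name' only

print(df)
```

The key design choice is scoping: each cleaning rule is applied only to the column it is meant for, instead of one text-cleaning function being run over every cell.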