Data cleaning: How to index a Series value and deal with duplicate values indices in Python?

Question

Data cleaning: How to index a Series value and deal with duplicate values indices in Python?

221 Views Asked by AmyN At 17 August 2025 at 05:17

I'm a Python 3.x new user and try to get a little help on data cleaning code.

Here is the simulation data:

I got a database of customers'orders and have to clean the columns UserPhone. In this columns, a value can be a str (ie: 0909111111, 0909.111.111) or number (ie: 909111111, 909111111.0, 84909111111).

I want the result to be: 909111111.

To do that, I must:

remove'.0' from all values
remove all not-digit characters (space, text if any) from all the values
remove '0' if a value starts with '0'
remove '84' if a value starts with '84'

I guess the 4 steps have the same structure. For example, to solve task 2, I try to write some code like this:

import pandas as pd
test=pd.read_excel('D:/relay-foods.xlsx')
b=test['UserPhone']
for i in test['UserPhone']:
    if i:
        i = str(i)
        for j in i:
            if not j.isdigit():
                index_i=test.index(i)
                i = i.replace(j, '')
                b[index_i]=i

But it results an error TypeError: 'RangeIndex' object is not callable I searched and realized that the index(i) works only for List while 'UserPhone' is Series type. I tried test['UserPhone']=test['UserPhone'].tolist() but it does not work. Could anyone please help me with this please?

And one more thing that's worth notice is that 'UserPhone' contains duplicate that can result the same index for the same 'UserPhone' value at different rows.

Thank you very much for your help!

Original Q&A

There are 2 best solutions below

AmyN On 22 December 2016 at 10:25

I made an edit based on @Abdou 's code and this code works:

test['UserPhone'] = test['UserPhone'].astype(str).str.replace('\.0|^0|^84|[^0-9]+‌','')

The only different is the \.0 instead of .0 because the the dot (.) is a metacharacter, we have to put a '\' before it to remove its special meaning. Thank you @Abdou again!

**Abdou** · Accepted Answer

Try converting the UserPhone column to a string column and then using the .str.replace method to replace anything meeting your outlined conditions. The following should do:

test['UserPhone'] = test['UserPhone'].astype(str).str.replace('\.0$|^0|^84|[^0-9]+‌','').astype(int)

For the pattern provided here:

\.0$ matches '.0' that is at the end of the phone number;

^0 matches any zero that is at the beginning of the phone number;

^84 matches any "84" that is at the beginning of the phone number;

[^0-9]+‌ matches one or more character that is not a number.

The | operator is the equivalent of OR.

Also, feel free to remove the .astype(int) part from the code if you want the output to remain a string.

I hope this helps.

Data cleaning: How to index a Series value and deal with duplicate values indices in Python?

There are 2 best solutions below

Related Questions in PYTHON-3.X

Related Questions in INDEXING

Related Questions in DUPLICATES

Related Questions in SERIES

Trending Questions

Popular # Hahtags

Popular Questions