I am trying to subtract one column from all the other columns in the dataframe (I have 500000 columns btw)

60 Views Asked by srinaath_david At 25 February 2024 at 09:43

I tried this:

for col in cols1:
    reg_df[col] = reg_df[col].sub(reg_df['Intercept'])
    print(col)

Because I have 500,000 columns, it is taking forever, something like 220 hours. Is there any way to speed up the process?


There are 2 best solutions below

Panda Kim On 25 February 2024 at 09:49

Code

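Use broadcasting: sub with axis=0 aligns the Series on the row index and subtracts it from every column in one call.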

out = reg_df.sub(reg_df['Intercept'], axis=0)
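To make the axis=0 behaviour concrete, here is a minimal sketch on a tiny frame. The frame `small` and its columns `a` and `b` are made up for illustration and are not from the question.

```python
import pandas as pd

# Tiny illustrative frame; every name except 'Intercept' is hypothetical.
small = pd.DataFrame({
    "Intercept": [1, 2, 3],
    "a": [10, 20, 30],
    "b": [5, 5, 5],
})

# axis=0 matches the Series against the row index, so the Intercept value
# of each row is subtracted from that row in every column.
result = small.sub(small["Intercept"], axis=0)
print(result)
```

Note that the 'Intercept' column is also subtracted from itself here, so it comes out as all zeros in the result.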

Sample

import pandas as pd
import numpy as np
np.random.seed(0)
reg_df = pd.DataFrame(np.random.randint(0, 10, (10, 500000))).rename({0: 'Intercept'}, axis=1)

Vectorized operation

import time

start = time.time()

out = reg_df.sub(reg_df['Intercept'], axis=0)

end = time.time()

print(f"{end - start:.5f} sec")

time:

0.01237 sec

Your for loop

start = time.time()

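# Snapshot the column first: the loop also overwrites 'Intercept' itself,
# which would otherwise turn it into zeros for every later subtraction.
intercept = reg_df['Intercept'].copy()
cols1 = reg_df.columns
for col in cols1:
    reg_df[col] = reg_df[col].sub(intercept)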

end = time.time()

print(f"{end - start:.5f} sec")

time:

It has been 10 minutes and it is still not finished.

Use vectorized operations.
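As a sanity check, the two approaches can be compared on a frame small enough for the loop to finish. This is only a sketch: the 200-column size and the names `check_df`, `vectorized`, and `looped` are assumptions chosen for illustration, and the loop uses a snapshot of 'Intercept' so that overwriting that column does not affect later iterations.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
# Same construction as the sample, but with only 200 columns (an assumption)
# so that the column-by-column loop completes quickly.
check_df = pd.DataFrame(np.random.randint(0, 10, (10, 200))).rename({0: "Intercept"}, axis=1)

# One broadcasted call over the whole frame.
vectorized = check_df.sub(check_df["Intercept"], axis=0)

# Column-by-column loop on a copy, subtracting a fixed snapshot of 'Intercept'.
looped = check_df.copy()
intercept = check_df["Intercept"].copy()
for col in looped.columns:
    looped[col] = looped[col].sub(intercept)

# Compare the underlying values of both results.
same = np.array_equal(vectorized.to_numpy(), looped.to_numpy())
print(same)
```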

jri On 25 February 2024 at 09:51

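You can use a vectorized approach for this. The line below assumes 'Intercept' is the first column, so iloc[:, 1:] selects every other column and leaves 'Intercept' unchanged.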

import pandas as pd

reg_df.iloc[:, 1:] = reg_df.iloc[:, 1:].sub(reg_df['Intercept'], axis=0)
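If 'Intercept' is not guaranteed to be the first column, the same idea can be written by column name instead of position. The following is a sketch under that assumption; the frame `demo` and its column names other than 'Intercept' are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical frame with 'Intercept' in the middle rather than first.
demo = pd.DataFrame(
    rng.integers(0, 10, (4, 6)),
    columns=["x0", "x1", "Intercept", "x3", "x4", "x5"],
)

# Every column label except 'Intercept', regardless of its position.
other_cols = demo.columns.drop("Intercept")

# Work on a copy so the original frame stays available for comparison.
adjusted = demo.copy()
adjusted[other_cols] = demo[other_cols].sub(demo["Intercept"], axis=0)
print(adjusted)
```

Selecting by label keeps 'Intercept' itself untouched and preserves the original column order.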
