Conditional dataframe manipulation by row

63 Views Asked by At

Say I have a df like

sample dataframe

and I want a df like this

enter image description here

How would I do this in python or R? This would be so easy in excel with a simple if statement, for example: c5 =IF(c2 = "X", "ccc", c4).

I thought this would be simple in R too, but I tried df <- df %>% mutate(c4 = ifelse(c2 = 'X', paste(c3, c3, c3), c4)), and it fills all the other values with NA's:

enter image description here

Why is this happening and how would I fix it?

Ideally though, I'd like to do this in python. I've tried dfply's mutate and ifelse similarly to the above, and using pandas loc function, but neither have worked.

This feels like it should be really simple - is there something obvious that I'm missing?

3

There are 3 best solutions below

2
akrun On

We may need strrep in R

library(dplyr)
df %>%
   mutate(c4 = ifelse(c2 %in% "X", strrep(c3, nchar(c4)), c4))

-output

  id c2 c3  c4
1  1     a aaa
2  2     b bbb
3  3  X  c ccc

data

df <- structure(list(id = 1:3, c2 = c("", "", "X"), c3 = c("a", "b", 
"c"), c4 = c("aaa", "bbb", "zzz")), class = "data.frame", row.names = c(NA, 
-3L))
0
Mustafa Aydın On
df.c4.where(df.c2.ne("X"), other=df.c3 * 3)

This reads as

"for c4 column: where the c2 values are not equal to "X", keep them as is; otherwise, put the 3-times repeated c3 values".

Example run:

In [182]: df
Out[182]:
   id c2 c3   c4
0   1     a  aaa
1   2     b  bbb
2   3  X  c  zzz

In [183]: df.c4 = df.c4.where(df.c2.ne("X"), other=df.c3 * 3)

In [184]: df
Out[184]:
   id c2 c3   c4
0   1     a  aaa
1   2     b  bbb
2   3  X  c  ccc
0
SomeDude On

I think you can just do in pandas:

m = df['c2'] == 'X'
df.loc[m, 'c4'] = df.loc[m, 'c3'].str.repeat(3)

Look for rows whose 'c2' is 'X' and locate 'c3' column , repeat it 3 times and modify the 'c4' column inplace with .loc