Numpy / Pandas slicing based on intervals

I'm trying to find a way to select non-contiguous, unequal-length slices of a pandas / NumPy matrix so that I can set the selected values to a common value. Has anyone come across an elegant solution for this?

import numpy as np
import pandas as pd
x = pd.DataFrame(np.arange(12).reshape(3,4))
#x is the matrix we want to index into

"""
x before:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
"""
y = pd.DataFrame([[0,3],[2,2],[1,2],[0,0]]) 
#y is a matrix where each row contains a start idx and end idx per column of x

"""
   0  1
0  0  3
1  2  3
2  1  3
3  0  1
"""

What I'm looking for is a way to efficiently select slices of different lengths from x, one per column, based on the rows of y:

x[y] = 0  # pseudocode for the desired operation
"""
x afterwards:
array([[ 0,  1,  2,  0],
       [ 0,  5,  0,  7],
       [ 0,  0,  0, 11]])
"""

There are 2 best solutions below

Masking can still be useful here: even if a loop cannot be avoided entirely, the main DataFrame x does not need to be involved in it, which should speed things up:

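mask = np.zeros_like(x, dtype=bool)

# y is taken as defined in the question's code: row i holds (start, inclusive end) for column i of x
for i in range(len(y)):
    # +1 makes the end inclusive; slicing clips ends that run past the last row
    mask[y.iloc[i, 0]:(y.iloc[i, 1] + 1), i] = True

x[mask] = 0
x
   0  1  2   3
0  0  1  2   0
1  0  5  0   7
2  0  0  0  11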

As a further improvement, consider defining y as a NumPy array if possible.
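
A minimal sketch of that suggestion, assuming y holds one (start, inclusive end) pair per column as in the question's code. The name `y_arr` is introduced here purely for illustration:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(12).reshape(3, 4))

# y as a plain NumPy array: one (start, inclusive end) pair per column of x
y_arr = np.array([[0, 3], [2, 2], [1, 2], [0, 0]])

mask = np.zeros(x.shape, dtype=bool)
for col, (start, end) in enumerate(y_arr):
    # slicing clips out-of-range ends, so end=3 on a 3-row frame is fine
    mask[start:end + 1, col] = True

x[mask] = 0
```

Iterating over the array rows directly avoids the repeated `.iloc` lookups, which tend to be the slow part of the loop when y is a DataFrame.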

I adapted another answer to your problem:

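# Assumes each row of y is (start, exclusive end), as in the printed y in the question
y_t = y.values.transpose().copy()  # copy so that y itself is not modified
y_t[1,:] = y_t[1,:] - 1 # or remove this line and change '>= r' below to '> r'

r = np.arange(x.shape[0])

# broadcast (n_cols, 1) against (n_rows,) to get an (n_cols, n_rows) mask, then transpose
mask = ((y_t[0,:,None] <= r) & (y_t[1,:,None] >= r)).transpose()

res = x.where(~mask, 0)
res
#    0  1  2   3
# 0  0  1  2   0
# 1  0  5  0   7
# 2  0  0  0  11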
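
The same broadcasting idea can be sketched without the decrement step or the transposes. This is only an illustration under an assumed convention: `starts` and `stops` (names introduced here, not from the question) hold one start index and one exclusive end index per column of x.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(12).reshape(3, 4))

# Assumed convention: one (start, exclusive end) pair per column of x
starts = np.array([0, 2, 1, 0])
stops = np.array([3, 3, 3, 1])

rows = np.arange(x.shape[0])[:, None]  # shape (n_rows, 1)

# Broadcasting (n_rows, 1) against (n_cols,) yields an (n_rows, n_cols) mask
mask = (rows >= starts) & (rows < stops)

res = x.mask(mask, 0)  # x itself is left unchanged
```

Using `DataFrame.mask` returns a new frame with the masked cells replaced, so the original x stays intact. If the end indices are inclusive instead, `rows <= stops` would be the corresponding comparison.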