pair rows if conditions of multiple columns met

140 Views Asked by At
Store Sales Amount Profit
27 75474 9253
30 367852 84463
55 79416 15401

The resulting output should contain pairs of rows which has sales amount +- 3% OR Profit +- 1.5 % of each other

like if store 55's Sales amount fall within the range of +- 3% of sales amount OR +- 1.5 % of profit of any store(lets say 27) the output should be :

| Output|

27-55

df['Lower_range_sales'] = df['Sales_amount'] - df['Sales_amount']*0.03
df['Upper_range_sales'] = df['Sales_amount'] + df['Sales_amount']*0.03
df['Lower_range_Profit'] = df['Profit'] - df['Profit']*0.015
df['Upper_range_Profit'] = df['Profit'] + df['Profit']*0.015
1

There are 1 best solutions below

0
On

Let's call stores that meet your condition "within range" of each other. There are no such store in your sample data set so we need to mock it with some randomized data:

# Generate sample data
import string
from itertools import permutations

store_names = ["".join(p) for p in permutations(list(string.ascii_uppercase), 2)]

n = 100
np.random.seed(42)
df = pd.DataFrame({
    "Store": np.random.choice(store_names, n, replace=False),
    "Sales Amount": np.random.randint(10_000, 1_000_000, n),
    "Profit": np.random.randint(0, 100_000, n)
})

# ---------------------------------------------------------
# Code
# ---------------------------------------------------------

# Use numpy broadcast to calculate the lower and upper
# limit of each metric
sales = df["Sales Amount"].to_numpy()[:, None]
sales_lower, sales_upper = (sales * [0.97, 1.03]).T

profit = df["Profit"].to_numpy()[:, None]
profit_lower, profit_upper = (profit * [0.985, 1.015]).T

# mask is an n*n matrix, comparing every store against every
# other store to see if they are within range. If mask[i,j]
# is True, the two stores are within range of each other.
mask = (
    ((sales_lower <= sales) & (sales <= sales_upper))
    | ((profit_lower <= profit) & (profit <= profit_upper)) 
)

# If mask[i,j] is True, then mask[j,i] is also True. Hence
# we only need the upper triangle of the matrix (np.triu =
# triangle upper). And since mask[i,i] is always True, we
# don't need the diagonal either. Hence k=1.
# nonzero() returns the indices of all True elements.
s1, s2 = np.triu(mask, k=1).nonzero()

# Assemble the result
store = df["Store"].to_numpy()
result = pd.DataFrame({
    "Store1": store[s1],
    "Store2": store[s2]
})

Result:

Store1   Store2
     IV      AL   # store IV is within range of store AL
     IV      CG   # store IV is within range of store CG
     IV      ZU
     IV      DQ
     IV      RP
    ...     ...