Iterate all value combinations in pairwise row comparison in python

Question

315 Views Asked by Adan Horta At 10 May 2017 at 06:19

I have a data frame with genomic bins in the following format. Each genomic range is represented as a row, and each cell value corresponds to the start of a bin.

        0       1       2       3       4      5   ...   522
0    9248    9249     NaN     NaN     NaN    NaN   ...   NaN
1   17291   17292   17293   17294   17295    NaN   ...   NaN
2   18404   18405   18406   18407     NaN    NaN   ...   NaN

[69 rows x 522 columns]

As you can see, many of the row values are incomplete because some genomic ranges are smaller than others.

I wish to make pairwise combinations of the values for every pair of rows. It would be fine (preferable, even) if each pairwise interaction were stored as a separate data frame.

I want something like this:

0 - 1 Pairwise:
0      1
9248   17291
9248   17292
9248   17293
9248   17294
9248   17295
9249   17291
9249   17292
9249   17293
9249   17294
9249   17295
[10 rows x 2 columns]

0 - 2 Pairwise:
0       2
9248   18404
9248   18405
9248   18406
9248   18407
9249   18404
9249   18405
9249   18406
9249   18407
[8 rows x 2 columns]

I need every value combination for each pairwise row combination. I think I need to use itertools.product() to do this sort of thing but cannot figure out how to write the appropriate loop. Any help is greatly appreciated!


**Allen Qin** · Accepted Answer · 2017-05-10T07:05:38.210000

Setup

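import numpy as np
import pandas as pd

# cartesian_product is a private pandas helper; it lived in pandas.tools.util
# when this answer was written and was later moved to pandas.core.reshape.util.
try:
    from pandas.core.reshape.util import cartesian_product as cp
except ImportError:
    from pandas.tools.util import cartesian_product as cp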

df = pd.DataFrame({'0': {0: 9248, 1: 17291, 2: 18404},
 '1': {0: 9249, 1: 17292, 2: 18405},
 '2': {0: np.nan, 1: 17293.0, 2: 18406.0},
 '3': {0: np.nan, 1: 17294.0, 2: 18407.0},
 '4': {0: np.nan, 1: 17295.0, 2: np.nan},
 '5': {0: np.nan, 1: np.nan, 2: np.nan},
 '522': {0: np.nan, 1: np.nan, 2: np.nan}})

Solution

final = {}
# For each pair of rows (i, j) with i < j, take the Cartesian product of their
# non-NaN values and store it as an (n, 2) array under the key (i, j).
for i in range(len(df)):
    for j in range(i + 1, len(df)):
        final[(i, j)] = np.array(cp([df.iloc[i].dropna(), df.iloc[j].dropna()])).T

Verification

for k, v in final.items():
    print(k)
    print(v)

(0, 1)
[[  9248.  17291.]
 [  9248.  17292.]
 [  9248.  17293.]
 ..., 
 [  9249.  17293.]
 [  9249.  17294.]
 [  9249.  17295.]]
(1, 2)
[[ 17291.  18404.]
 [ 17291.  18405.]
 [ 17291.  18406.]
 ..., 
 [ 17295.  18405.]
 [ 17295.  18406.]
 [ 17295.  18407.]]
(0, 2)
[[  9248.  18404.]
 [  9248.  18405.]
 [  9248.  18406.]
 ..., 
 [  9249.  18405.]
 [  9249.  18406.]
 [  9249.  18407.]]
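
The question mentions itertools.product() and a preference for one data frame per row pair. That approach is not part of the accepted answer; the following is only a minimal sketch of it using the standard library. The names `pairwise_products` and `pairs` are made up for illustration, and the small frame is a stand-in built from the example rows in the question.

```python
from itertools import combinations, product

import numpy as np
import pandas as pd

# Stand-in for the question's frame (values copied from the example rows).
df = pd.DataFrame(
    [
        [9248, 9249, np.nan, np.nan, np.nan],
        [17291, 17292, 17293, 17294, 17295],
        [18404, 18405, 18406, 18407, np.nan],
    ]
)


def pairwise_products(frame):
    """Hypothetical helper: map (i, j) to a DataFrame of all value pairs of rows i and j."""
    # Drop the NaN padding once per row and convert the bin starts back to integers.
    rows = {label: row.dropna().astype(int).tolist() for label, row in frame.iterrows()}
    return {
        (i, j): pd.DataFrame(list(product(rows[i], rows[j])), columns=[i, j])
        for i, j in combinations(rows, 2)
    }


pairs = pairwise_products(df)
print(pairs[(0, 1)].shape)  # (10, 2)
```

Each value of `pairs` is a separate two-column DataFrame whose column labels are the two row labels, which matches the layout requested in the question. With 69 rows there are 2,346 row pairs, so it may be worth building the frames lazily if memory becomes a concern.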
