Identify stocks with the same feature values using np.array_equal() in a nested loop


I would like to understand if my code is working correctly.

The dataframe df2 is a vertically stacked time series of one feature for several stocks.

stock_id log_target_vol_corr_32_clusters_stnd
1 0.4
1 0.8
1 0.7
2 0.3
2 0.4
2 0.0
3 0.4
3 0.8
3 0.7
4 0.9
4 0.9
4 0.1
5 0.9
5 0.9
5 0.1

Notice that stocks (1 & 3) and (4 & 5) have the same feature values, so I want to group each such set of stocks into a cluster. Ultimately, I want to find all the stock ids belonging to each cluster.

## find stock ids of clusters having same feature values
column = 'log_target_vol_corr_32_clusters_stnd'
remaining_stocks = df2['stock_id'].unique().astype(int)
clusters = {}
for s in remaining_stocks:
    print(s)
    clusters[s] = []
    a1 = df2[df2['stock_id'] == s][column]
    remaining_stocks = np.delete(remaining_stocks, np.where(remaining_stocks == s))
    for s1 in remaining_stocks:
        a2 = df2[df2['stock_id'] == s1][column]
        if np.array_equal(a1, a2):
            print(s1)
            remaining_stocks = np.delete(remaining_stocks, np.where(remaining_stocks == s1))
            clusters[s].append(s1)
            print(remaining_stocks)

The code above seems to produce more clusters than actually exist in the dataframe.

Could you please explain what the error in this code is?
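
One likely cause of the extra clusters: np.delete returns a new array instead of modifying the one passed in, so the outer for loop keeps iterating over the original array of stock ids. Every stock id therefore becomes a key in clusters, including ids already assigned to an earlier cluster. The sketch below sidesteps the nested loop with a different technique, grouping by stock_id and using each stock's values as a tuple key. It is only an illustration on the sample data from the question; the names signatures and cluster_members are made up for this example, and it assumes rows for each stock are already in time order and that values must match exactly.

```python
import pandas as pd

# Sample data copied from the question.
df2 = pd.DataFrame({
    "stock_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "log_target_vol_corr_32_clusters_stnd": [
        0.4, 0.8, 0.7, 0.3, 0.4, 0.0, 0.4, 0.8, 0.7,
        0.9, 0.9, 0.1, 0.9, 0.9, 0.1,
    ],
})
column = "log_target_vol_corr_32_clusters_stnd"

# Turn each stock's series into a hashable tuple (row order is preserved).
signatures = df2.groupby("stock_id")[column].apply(tuple)

# Collect the stock ids that share an identical tuple of values.
clusters = {}
for stock_id, signature in signatures.items():
    clusters.setdefault(signature, []).append(int(stock_id))

# Hypothetical name: one list of stock ids per cluster.
cluster_members = list(clusters.values())
print(cluster_members)
```

If the nested loop is kept instead, skipping any s that is no longer in remaining_stocks at the top of the outer loop should stop already-clustered stocks from opening a cluster of their own.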
