I have the following data and code:
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Wrapped in np.array so that boolean-mask indexing (X[inlier_mask]) works
X = np.array([[0.], [0.], [0.], [0.], [5.25799992], [10.51700001], [15.74699956], [21.03599973], [26.41500018]])
y = np.array([181.42686706, 144.47493065, 143.93277864, 143.93277864, 166.07783771, 127.06519488, 80.16842458, 58.30687141, 48.83896311])


def no_similar_times(X: np.ndarray, y: np.ndarray) -> bool:
    # Returns True if X has no duplicates after rounding to 1 decimal, else False
    is_valid = len(np.unique(X.round(1))) == len(X)
    print(X)
    print(is_valid)
    print("")
    return is_valid


def get_inliers() -> tuple[RANSACRegressor, np.ndarray]:
    # The estimator is a degree-3 polynomial regression
    ransac = RANSACRegressor(
        estimator=make_pipeline(PolynomialFeatures(3), LinearRegression()),
        min_samples=0.4,
        is_data_valid=no_similar_times,
    )
    ransac.fit(X, y)
    inlier_mask = ransac.inlier_mask_
    print("Inliers")
    # Run the same validity check on the final inlier set
    no_similar_times(X[inlier_mask], y[inlier_mask])
    return ransac, inlier_mask


if __name__ == "__main__":
    get_inliers()
When running this code, I obtain an inlier_mask that corresponds to invalid data, meaning that no_similar_times(X[inlier_mask], y[inlier_mask]) returns False. I did not expect this: I assumed that any set of inliers must pass the validity check, since otherwise the RANSAC routine would have skipped it.
The printed output is:
[[ 0.        ]
 [10.51700001]
 [ 0.        ]
 [ 0.        ]]
False

[[21.03599973]
 [15.74699956]
 [ 0.        ]
 [ 5.25799992]]
True

[[ 0.        ]
 [21.03599973]
 [10.51700001]
 [26.41500018]]
True

[[26.41500018]
 [ 0.        ]
 [10.51700001]
 [ 5.25799992]]
True

[[ 0.        ]
 [ 0.        ]
 [ 0.        ]
 [10.51700001]]
False

Inliers
[[ 0.        ]
 [ 0.        ]
 [ 0.        ]
 [ 5.25799992]
 [10.51700001]
 [15.74699956]
 [21.03599973]
 [26.41500018]]
False
This shows that no_similar_times works as expected on the 4-sample subsets, but the final inlier mask (8 samples) is not one of the valid subsets generated during the fitting process.
Can someone explain what is happening?