I am given the following sample array:
[[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
I want to find all of the largest possible contiguous groups of rows (I am calling them vector spaces) that are closed under the following condition: for any two vectors x1 and x2 in a group, the sum of their bitwise AND is at least some user-defined constant (i.e. np.bitwise_and(x1, x2).sum() >= constant).
For example, for the above array, there are three possible vector spaces:
V1: [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
V2: [[1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0]]
V3: [[1, 0, 0, 1]]
Take the first vector space: for any two of its vectors, the sum of their bitwise_and is >= 2 (the chosen constant). The same holds for the second and third vector spaces.
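To make the condition concrete, here is a minimal sketch of the pairwise check applied to the three example groups. The helper name `is_closed` is made up for illustration, and checking each vector against itself as well as against the others is an assumption about how single-vector groups such as V3 should be treated.

```python
from itertools import combinations_with_replacement

import numpy as np


def is_closed(group, constant):
    """Return True if every pair of vectors in `group` (including each
    vector paired with itself) has a bitwise_and sum >= `constant`.

    Hypothetical helper, not part of any library.
    """
    arr = np.asarray(group)
    # combinations_with_replacement yields (v, v) as well as all distinct pairs.
    return all(
        np.bitwise_and(a, b).sum() >= constant
        for a, b in combinations_with_replacement(arr, 2)
    )


V1 = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
V2 = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0]]
V3 = [[1, 0, 0, 1]]

print([is_closed(v, 2) for v in (V1, V2, V3)])  # [True, True, True]
# Adding the next row of the sample array to V1 breaks the condition,
# because [1, 0, 1, 0] & [0, 1, 1, 0] has only one common 1.
print(is_closed(V1 + [[0, 1, 1, 0]], 2))  # False
```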
I have tried several ways, but can't reach an efficient solution. Any tips or suggestions would be helpful.
I tried two different approaches.
First approach: using regular expressions, as follows:
import re

def match_strings(strings, pattern, indices):
    # Keep the (index, string) pairs whose string matches the pattern.
    matched_strings = [(idx, string) for idx, string in zip(indices, strings) if re.match(pattern, string)]
    return matched_strings

def identify_user_distribution(availability_matrix, indices):
    # Turn each row into a string such as '1010'.
    string_rep = []
    for instance in availability_matrix:
        string_rep.append(''.join(str(item) for item in instance))
    # Build one regex per distinct row, treating every 0 as a wildcard.
    patterns = set(string_rep)
    patterns = [instance.replace('0', "[01]+") for instance in patterns]
    clusters = []
    for instance_pattern in patterns:
        cluster = match_strings(string_rep, instance_pattern, indices)
        # Record the first and last matching index of each cluster.
        clusters.append([cluster[0][0], cluster[-1][0]])
    return clusters
This approach produces groups, but they are not closed. For example, with a pattern such as 1***, the resulting vector space contains every vector that satisfies the property, but also some that do not: 1000, 1101, and 1111 all end up in the same group.
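A quick check of that example, taking the constant to be 2 as in the sample above, shows which pairs violate the condition. The names `group`, `and_sum`, and `overlap` are illustrative only.

```python
import numpy as np

# The three vectors from the text, all of which fit the wildcard pattern 1***.
group = {"1000": [1, 0, 0, 0], "1101": [1, 1, 0, 1], "1111": [1, 1, 1, 1]}


def and_sum(a, b):
    """Number of positions where both vectors have a 1."""
    return int(np.bitwise_and(a, b).sum())


# Compute the overlap for every distinct pair (p < q avoids duplicates).
overlap = {
    (p, q): and_sum(group[p], group[q])
    for p in group
    for q in group
    if p < q
}
print(overlap)
# {('1000', '1101'): 1, ('1000', '1111'): 1, ('1101', '1111'): 3}
```

Both pairs involving 1000 fall below 2, so 1000 cannot share a group with the other two even though all three match the same pattern.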
So I then tried a NumPy approach, something like the following:
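import numpy as np

def bitwise_and_groups_with_sum(vectors, target_sum):
    groups = {}  # pattern string -> vectors in that group
    index = {}   # pattern string -> row indices in that group
    flag = False
    for idx, vector in enumerate(vectors):
        temp = ''.join(str(item) for item in vector)
        if len(groups) == 0:
            # The first vector starts the first group.
            groups[temp] = [vector]
            index[temp] = [idx]
        else:
            for key, value in groups.items():
                if vector in value:
                    index[key].append(idx)
                    continue
                # Join the group only if the vector overlaps every current
                # member in at least target_sum positions.
                elif np.all([np.bitwise_and(vector, instance).sum() >= target_sum for instance in value]):
                    groups[key].append(vector)
                    index[key].append(idx)
                    flag = True
            if not flag:
                # No existing group accepted the vector, so start a new one.
                groups[temp] = [vector]
                index[temp] = [idx]
            flag = False
    return index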
But I am not able to formulate the right logic to enforce the closure property.
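For reference, the expected output can be pinned down with a brute-force scan like the sketch below. It is not an efficient solution, and it rests on several assumptions: that a group is a contiguous range of rows, that groups may overlap (as V1 and V2 do), and that a single row always forms a valid group. The function name `maximal_contiguous_groups` is hypothetical.

```python
import numpy as np


def maximal_contiguous_groups(vectors, constant):
    """Return inclusive (start, end) row ranges of the maximal contiguous
    runs in which every pair of distinct rows has bitwise_and sum >= constant.

    Brute-force sketch; a single row is assumed to be a valid group.
    """
    arr = np.asarray(vectors)
    n = len(arr)
    groups = []
    prev_end = -1  # right end of the last window that was kept
    for start in range(n):
        end = start
        # Grow the window while the next row is compatible with every row in it.
        while end + 1 < n and np.all(
            np.bitwise_and(arr[start:end + 1], arr[end + 1]).sum(axis=1) >= constant
        ):
            end += 1
        # A window that ends no later than the previous one is contained in it.
        if end > prev_end:
            groups.append((start, end))
            prev_end = end
    return groups


sample = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1],
          [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
print(maximal_contiguous_groups(sample, 2))  # [(0, 4), (3, 6), (7, 7)]
```

The three ranges correspond to V1, V2, and V3 above. The containment test relies on the fact that any sub-range of a valid range is also valid, so the right end never moves backwards as the start advances.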