Add array values to a NumPy recarray where the identifier matches another array

I am trying to assign values from one recarray (in_arr) into another (out_arr) based on an identifier string in one of the columns. To assign the values correctly, the strings in id must match.

Some constraints:

  • the number of elements in in_arr can be smaller or larger than the number in out_arr
  • every identifier in in_arr is represented in out_arr, not necessarily the other way round
  • if in_arr has more elements, identifiers repeat, and any single one of the repeated entries may be assigned
  • every identifier in out_arr is unique
  • the element order of the result does not matter
  • I'd rather not loop through every element ;-)

Here is some code:

import numpy as np

my_dtype = [('id', 'S3'), ('val', int)]
in_arr = np.array([('xyz', 1), ('abc', 2), ('abc', 2)], dtype=my_dtype)
out_arr = np.array([('abc', 0), ('asd', 0), ('qwe', 0), ('xyz', 0), ('def', 0)], dtype=my_dtype)

msk_in, msk_out = ... # some magic
# Select the field first: with a mask or index array, out_arr[msk_out]['val'] = ... would write into a temporary copy
out_arr['val'][msk_out] = in_arr['val'][msk_in]    # <-- other ways to assign also work for me...

out_arr
array([(b'abc', 2), (b'asd', 0), (b'qwe', 0), (b'xyz', 1), (b'def', 0)],
      dtype=[('id', 'S3'), ('val', '<i8')])

The closest I have come to replacing the "magic" part is by borrowing from this question. But it only gives me the right indices into out_arr, not in an order that matches in_arr.

np.where(np.isin(out_arr['id'], in_arr['id']))[0]
array([0, 3])
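
To make the mismatch concrete, here is a small sketch using the arrays from above. The names idx_out, ids_out and ids_in are only illustrative. The indices come back in out_arr order, while in_arr lists its ids in a different order and with a duplicate, so the two cannot be paired position by position.

```python
import numpy as np

my_dtype = [('id', 'S3'), ('val', int)]
in_arr = np.array([('xyz', 1), ('abc', 2), ('abc', 2)], dtype=my_dtype)
out_arr = np.array([('abc', 0), ('asd', 0), ('qwe', 0), ('xyz', 0), ('def', 0)], dtype=my_dtype)

# Positions in out_arr whose id also occurs in in_arr (ascending out_arr order)
idx_out = np.where(np.isin(out_arr['id'], in_arr['id']))[0]

ids_out = out_arr['id'][idx_out]  # ids in out_arr order: abc, xyz
ids_in = in_arr['id']             # ids in in_arr order: xyz, abc, abc

print(ids_out, ids_in)
```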

Best answer, by Nyps

I solved my problem by sorting both arrays by their id values. This requires removing the duplicates from in_arr first.

# remove duplicates
in_arr = in_arr[np.unique(in_arr['id'], return_index=True)[1]]

# sort both arrays to match the first len(in_arr) elements
out_arr = np.array(sorted(out_arr, key=lambda x: (np.isin(x['id'], in_arr['id']), x['id']), reverse=True))
in_arr[::-1].sort(order='id')

# assign values and test
out_arr['val'][:len(in_arr)] = in_arr['val'] 
out_arr
array([(b'xyz', 1), (b'abc', 2), (b'qwe', 0), (b'def', 0), (b'asd', 0)],
      dtype=[('id', 'S3'), ('val', '<i8')])
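
As a possible alternative that leaves the order of out_arr untouched, the index pairs can be obtained from np.intersect1d with return_indices=True. This is only a sketch: it assumes NumPy 1.15 or later (where return_indices is available), and the names idx_out and idx_in are illustrative. For repeated ids, the index of the first occurrence is returned, which is fine here because any one of the duplicates may be assigned.

```python
import numpy as np

my_dtype = [('id', 'S3'), ('val', int)]
in_arr = np.array([('xyz', 1), ('abc', 2), ('abc', 2)], dtype=my_dtype)
out_arr = np.array([('abc', 0), ('asd', 0), ('qwe', 0), ('xyz', 0), ('def', 0)], dtype=my_dtype)

# Common ids plus, for each one, its first position in out_arr and in in_arr.
# The two index arrays are aligned: idx_out[k] and idx_in[k] refer to the same id.
_, idx_out, idx_in = np.intersect1d(out_arr['id'], in_arr['id'], return_indices=True)

# Index the field first so the assignment writes into out_arr and not into a copy
out_arr['val'][idx_out] = in_arr['val'][idx_in]

print(out_arr)
```

Unlike the sorting approach, this does not modify in_arr and does not need the duplicates removed beforehand.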