Assigning ID's to Identical Rows in a Binary Matrix

209 Views Asked by At

I have a 1200x130 0-1 matrix and some of the rows are identical (4 from one row, 8 from another etc.). I am trying to find an efficient way to assign the same ID's to the identical rows.

Here's what I've tried in Excel: For each row, I multiplied the kth column entry by 2^k and summed over all columns. Theoretically, for each different row, this should give me a unique ID but the numbers get very large Excel messes up the comparison for those numbers. (I guess it's because it stores them like 3.234023490249 x 10^246 up to a certain digit and ignores the remaining digits.)

So I've decided to use VBA but the only way I can think of is pairwise comparison for all rows and columns. I believe there should be more efficient ways to handle this. Any recommendations?

(I am not looking for an Excel or VBA specific solution. Any recommendation would help.)

Thanks in advance.

1

There are 1 best solutions below

0
On

You're converting binary to decimal. That's huge number indeed. A long integer in program language is a 64 bit number. So your 130 bit would overflow it as well.

You can use a combined ID, limiting each one to the maximum long integer value. For 130 bits, minimum of 3 long integers per row.

There would be 3 IDs for each row. Say, first ID takes the first 40 bits. Second ID takes the digits from 41 to 80. Third ID takes the digits from 81 to 130.

Comparson would require all 3 IDs to be identical to return a equal ID.


You can use strings as well.

Using hexadecimal (one character for each 4 binary digits), you would have 130/4 = 33 characters.

Maybe if you try some 32 base number, it would give 130/5 = 26 characters. Each 5 bits correspond to a character from 0, 1, 2, ..., 8, 9, A, B, ..., U, V.

Or just a plain 130 character string containing all your bits.