Optimizing Compare of 2 mysql JSON columns

58 Views Asked by At

This is really a part 2 of this question. I have two tables, each with a JSON column.

However the tables have over a million records, and the JSON objects are quite big.

I am performing a comparison to see which objects are in the second table, which either aren't in the first, or are something in the JSON changed. The two tables have a key ID column, and the JSON column. So new key IDs appearing in the second table, are obviously new records. But if the key ID is in both tables, then I need to know if the object has changed.

Normally I perform this sort of Delta compare using a checksum of the fields in table, and compare to the same records of the other table, and the differences are quickly found by simple compare of the checksum.

However, because these are JSON objects, I cannot create a checksum, since the order of the keys is not guaranteed to be consistant within the object... But the objects should be considered equal. So I am really just comparing 2 objects directly. This seems to take far too long.

Are there better ways to compare 2 JSON columns? (Other than ensuring the keys are in the same order on both table's JSONs, and then converting to text and creating a checksum..)

I utilize JSON_OBJECT to build my objects.. But I suppose its not really about how the JSON objects are built, since they are stored in the JSON datatype - its more about a consistant way they will be extracted to compare against. So I guess I should also mention that table 1 gets created , and it might be months before table 2 is.. so I suppose the worry on creating checksums is that the mySQL method of extracting the JSON may change in between those times, and cause the process to suddenly think all the records have changed...(assuming table 1 is created with checksums in it, rather than holding the JSON)

I suppose I could just leave the checksums to happen at the same time, and therefore mySQL can change their ordering anytime, and I should be fine, assuming whatever they do, it is consistent with each call.

But let me know if there are other methods out there that are slick

0

There are 0 best solutions below