My application is data analysis for Physical Unclonable Functions (PUFs). More specifically, I am working with weak N-bit PUFs, meaning each array produces a single N-bit key every time it is evaluated. There are P PUF arrays. I want to examine the two desired properties of uniqueness and repeatability by calculating the inter-PUF and intra-PUF Hamming distances, respectively.
Each array's key is read M times at each of X different environmental conditions, producing a data set of P*X NumPy arrays of shape (N, M). I am aware of the curse of dimensionality, which suggests to me that I need to calculate the Hamming distance for every pair of keys. This would require (P * X * M) choose 2 operations. For example, 1000 arrays each read 1000 times at 12 operating conditions would produce 12E6 N-bit PUF values, and there would be 12E6 choose 2 combinations to calculate, which is over 7E13 operations.
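For concreteness, here is a minimal sketch of the brute-force computation I have in mind, using random data at toy sizes as a stand-in for the real readouts (at the real sizes above, this all-pairs step is exactly what becomes infeasible):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Toy sizes standing in for the real ones (P=1000, X=12, M=1000).
P, X, M, N = 4, 3, 50, 128
rng = np.random.default_rng(0)

# Stand-in for the real data set: P*X arrays of shape (N, M), one column per readout.
readouts = [rng.integers(0, 2, size=(N, M), dtype=np.uint8) for _ in range(P * X)]

# Stack every readout as a row vector of N bits: shape (P*X*M, N).
keys = np.concatenate([a.T for a in readouts], axis=0)

# All-pairs fractional Hamming distances -- (P*X*M choose 2) values.
# At full scale this is the ~7E13-pair computation described above.
dists = pdist(keys, metric="hamming")
print(dists.shape, dists.mean())
```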
Can the problem be simplified by the knowledge that the intra-PUF Hamming distances are very small (below ~10%) compared to the inter-PUF Hamming distances (~50%)? I don't need to check every possible inter-PUF distance, but I am still facing (X * M) choose 2 combinations (e.g., ~72E6) for each array.
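For reference, this is the per-array intra-PUF computation I am facing, again sketched at toy sizes with random stand-in data:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Toy stand-in for one PUF array: M readouts at each of X conditions
# (at full scale X*M = 12,000 readouts, i.e. ~72E6 pairs per array).
X, M, N = 3, 50, 128
rng = np.random.default_rng(0)
per_condition = [rng.integers(0, 2, size=(M, N), dtype=np.uint8) for _ in range(X)]

# Stack all readouts of this one array and compute every intra-PUF pair.
all_readouts = np.concatenate(per_condition, axis=0)   # shape (X*M, N)
intra = pdist(all_readouts, metric="hamming")          # (X*M choose 2) values
print("max intra-PUF Hamming distance:", intra.max())
```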
Ultimately, I only need to know the maximum intra-PUF Hamming distance for each array across the X different environmental conditions. Is there anything I can exploit here? For example, if I find the maximally spaced pair of readouts within a single array at a single environmental condition (the maximum of the M choose 2 Hamming distances), can I then check only this pair against the readouts at the other environmental conditions?
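In code, the shortcut I am asking about would look roughly like this (toy sizes, random stand-in data). I do not know whether the candidate maximum it produces equals the true maximum over all X*M readouts; that is exactly the question:

```python
import numpy as np
from scipy.spatial.distance import pdist, cdist

# Toy stand-in: one PUF array read M times at each of X environmental conditions.
X, M, N = 3, 50, 128
rng = np.random.default_rng(1)
conditions = [rng.integers(0, 2, size=(M, N), dtype=np.uint8) for _ in range(X)]

# Step 1: within one reference condition, find the maximally spaced pair of
# readouts (the maximum of the M choose 2 Hamming distances).
ref = conditions[0]
d_ref = pdist(ref, metric="hamming")
i, j = np.triu_indices(M, k=1)            # pair indices in pdist order
k = np.argmax(d_ref)
extreme_pair = ref[[i[k], j[k]]]          # the two most distant readouts, shape (2, N)

# Step 2 -- the shortcut in question: compare only this pair against the
# readouts at the other conditions instead of forming all (X*M choose 2) pairs.
candidate_max = d_ref.max()
for other in conditions[1:]:
    candidate_max = max(candidate_max,
                        cdist(extreme_pair, other, metric="hamming").max())

print("candidate max intra-PUF Hamming distance:", candidate_max)
```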
Perhaps what I am really asking is: are there any properties of locally clustered data in high-dimensional spaces that I can exploit?