I am writing a Java program that compares two datasets, each containing data of the same type. The data type is a class containing Strings, ints, and a String[]. Let's call this class Foo and the datasets a and b. For each item in a, I need to find the item in b that matches it most closely.
My problem is speed. I have outlined below, in pseudo-code, what I do right now. As you can imagine, it scales poorly: every item in a is compared against every item in b, so the work grows with the product of the two dataset sizes (and my datasets do grow a lot). If anyone could point me toward a better solution, I would greatly appreciate it. I am aware that for simple types such as String or int, sorting the arrays would speed things up vastly, but since my data type is more complex, I don't see how that could work here.
Foo[] a = new Foo[...];
Foo[] b = new Foo[...];

for (Foo item_a : a) {
    // Best score seen so far for this item_a
    double bestMatch = 0;
    for (Foo item_b : b) {
        double match = compareFoo(item_a, item_b);
        if (match > bestMatch) {
            bestMatch = match;
        }
    }
    // Do stuff with bestMatch - display, save etc.
}

private double compareFoo(Foo item_a, Foo item_b) {
    // Compare every element of item_a and item_b,
    // return value between 0 (no match) and 1 (identical)
}
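
To make concrete what I mean by sorting helping in the simple case, here is a minimal sketch. It assumes plain int values and that "closest" means the smallest absolute difference; neither assumption holds for Foo. The names NearestIntSketch and nearest are hypothetical and exist only for this illustration. After sorting b once, each lookup is a binary search instead of a full scan.

```java
import java.util.Arrays;

class NearestIntSketch {

    // Returns the value in sortedB closest to target.
    // Assumes sortedB is non-empty and sorted in ascending order.
    static int nearest(int[] sortedB, int target) {
        int pos = Arrays.binarySearch(sortedB, target);
        if (pos >= 0) {
            return sortedB[pos]; // exact match found
        }
        // For a missing key, binarySearch returns (-(insertion point) - 1).
        int insertion = -pos - 1;
        if (insertion == 0) {
            return sortedB[0]; // target is below every element
        }
        if (insertion == sortedB.length) {
            return sortedB[sortedB.length - 1]; // target is above every element
        }
        int below = sortedB[insertion - 1];
        int above = sortedB[insertion];
        // Use long arithmetic to avoid int overflow; ties go to the lower neighbour.
        return ((long) target - below <= (long) above - target) ? below : above;
    }

    public static void main(String[] args) {
        int[] a = {7, 42, 100};
        int[] b = {50, 3, 40, 9};
        Arrays.sort(b); // sort once, then reuse for every lookup
        for (int itemA : a) {
            System.out.println(itemA + " -> " + nearest(b, itemA));
        }
    }
}
```

This only works because ints have a natural ordering in which neighbours in the sorted array are also the nearest candidates. I don't see an equivalent ordering for Foo, which is the core of my question.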