For such input data, I want to obtain a result in which I will have only those values that for the same id1 and id2 have the value of Reported and Primary.
var items = [
{
"id1": 241,
"id2": 716,
"type": "Primary"
},
{
"id1": 241,
"id2": 716,
"type": "Reported"
},
{
"id1": 477,
"id2": 850,
"type": "Reported"
},
{
"id1": 563,
"id2": 340,
"type": "Primary"
},
{
"id1": 649,
"id2": 322,
"type": "Reported"
}];
I try to use exists join like below:
const op = require('/MarkLogic/optic');
var reportedItems = op.fromLiterals(items)
.where(op.in(op.col('type'), 'Reported'))
.select(['id1', 'id2'], 'reported');
var primaryItems = op.fromLiterals(items)
.where(op.in(op.col('type'), 'Primary'))
.select(['id1', 'id2'], 'primary')
primaryItems
.existsJoin(reportedItems, [
op.on(op.viewCol('primary', 'id1'),
op.viewCol('reported', 'id1')),
op.on(op.viewCol('primary', 'id2'),
op.viewCol('reported', 'id2'))
])
.select("id1")
.result()
.toArray()
However, when the number of items in the input data exceeds 5000, the execution time exceeds 2 seconds. If, however, I perform a join only on one property (e.g. id1), the result will almost immediately appear (for 5k objects 0.3s).
Why is it so slow, can I improve it in some way or use another method to obtain the same result?
I expect this should give the same result, and in my testing it seems to be much faster:
What I think is happening in the original query is that the
existsJoin
is performing a cross product join ofprimary
andreported
, and using that result to filterprimary
. Because the cross product is so large, the filter takes a long time to execute.It might be possible to omit
whereDistinct
if you know you don't have any duplicate rows - my data was randomly generated for testing so I had to use it.