how to return all documents that has a subset of input array in OpenSearch

1.5k Views Asked by At

I have a document with given structure:

{ "name" : "WF1", "myIndex" : [3, 4, 5] }

Lets say I have 4 of such records -

{ "name" : "WF1", "myIndex" : [3, 4, 5] }
{ "name" : "WF2", "myIndex" : [6, 7, 8] }
{ "name" : "WF3", "myIndex" : [9, 10, 11] }
{ "name" : "WF4", "myIndex" : [3, 6, 9] }

If I fire below "term" query:

GET myIndex/_search
{
  "query": {
    "terms": {
      "qualsIndex": [
        3, 6, 9, 20
      ]
    }
  }
}

It returns all 4 records. Whereas I only want to return a record that has 3,6, 9 i.e. only WF4. Basically, I want a result document that has a subset of input passed.

Note: I can tweak my document structure to achieve this. is it possible in OpenSearch?

2

There are 2 best solutions below

2
On

TLDR;

To the best of my knowledge there is no solution in both ElasticSearch and OpenSearch. But I think you can hack you way through it, using number as words

The Hack

Indexing the document with the field myIndex as a string of numbers. I can later search for those numbers, using the match query and the parameter such as minimum_should_match.

DELETE 72004393

POST _bulk
{"index":{"_index":"72004393"}}
{"name":"WF1","myIndex":"3 4 5"}
{"index":{"_index":"72004393"}}
{"name":"WF2","myIndex":"6 7 8"}
{"index":{"_index":"72004393"}}
{"name":"WF3","myIndex":"9 10 11"}
{"index":{"_index":"72004393"}}
{"name":"WF4","myIndex":"3 6 9"}


GET /72004393/_search
{
  "query": {
    "match": {
      "myIndex": {
        "query": "3 6 9 20",
        "minimum_should_match": 3
        }
    }
  }
}

Will give you something like that:

{
  ...
  "hits" : {
    ...
    "max_score" : 2.0794413,
    "hits" : [
      {
        "_index" : "72004393",
        "_id" : "xaMuYoABOgujegeQJgZr",
        "_score" : 2.0794413,
        "_source" : {
          "name" : "WF4",
          "myIndex" : "3 6 9"
        }
      }
    ]
  }
}

This is not perfect and may lead to some edge cases, but this is the closest I could get to a "solution".

4
On

Tldr;

You can achieve this with terms set query.

To understand

Example mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "myIndex": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}

Example query:

{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": [3, 6, 9, 20],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

In your case, required_matches should be index as number of items of myIndex array.

To reproduce

Here is a fully working example:

First of all the setup:

DELETE /72004393/

PUT /72004393/
{
  "mappings": {
    "properties": {
      "myIndex": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

POST _bulk
{"index":{"_index":"72004393"}}
{"name":"WF1","myIndex":["3","4","5"]}
{"index":{"_index":"72004393"}}
{"name":"WF2","myIndex":["6","7","8"]}
{"index":{"_index":"72004393"}}
{"name":"WF3","myIndex":["9","10","11"]}
{"index":{"_index":"72004393"}}
{"name":"WF4","myIndex":["3","6","9"]}

The query:

GET /72004393/_search
{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": [ "3", "6", "9", "20" ],
        "minimum_should_match_script": {
          "source": "3"
        }
      }
    }
  }
}

Gives:

{
  ...
  "hits" : {
  ...
    "max_score" : 2.859232,
    "hits" : [
      {
        "_index" : "72004393",
        "_id" : "LqOQZ4ABOgujegeQ2gfV",
        "_score" : 2.859232,
        "_source" : {
          "name" : "WF4",
          "myIndex" : ["3","6","9"]
        }
      }
    ]
  }
}

Another example:

PUT test
{
  "mappings": {
    "properties": {
      "myIndex": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
POST _bulk
{"index":{"_index":"test"}}
{"name":"WF1","myIndex":["3","4","5"], "required_matches": 3} 
{"index":{"_index":"test"}}
{"name":"WF2","myIndex":["6","7","8"], "required_matches": 3}
{"index":{"_index":"test"}}
{"name":"WF3","myIndex":["9","10","11"], "required_matches": 3}
{"index":{"_index":"test"}}
{"name":"WF4","myIndex":["3","6","9"], "required_matches": 3}
{"index":{"_index":"test"}}
{"name":"WF5","myIndex":["3","6","9", "15", "20"], "required_matches": 5}
{"index":{"_index":"test"}}
{"name":"WF6","myIndex":["3","6","9", "15"], "required_matches": 4}

Query:

GET test/_search
{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": ["3", "6", "9"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

The query above will match only WF4 because its required_match is 3 and WF5 AND WF6 required matches are 4 and 5.

If you update terms in the query to ["3", "6", "9", "15"] it will match both WF4 AND WF5 and if you update it to ["3", "6", "9", "15", "20"] it will match to WF4, WF5, WF6.