ElasticSearch painless filter script on text fields not working


I want to apply an equality filter (exact match) using a Painless script in ElasticSearch. I cannot use a term query directly because the field I want to check is a text field (not keyword), so I tried a match_phrase query. This is my mapping, which I can't change:

{
  "my_index": {
    "aliases": {},
    "mappings": {
      "properties": {
        "my_field": {
          "type": "text"
        }
      }
    },
    "settings": {
      "index": {
        "max_ngram_diff": "60",
        "number_of_shards": "8",
        "blocks": {
          "read_only_allow_delete": "false",
          "write": "false"
        },
        "analysis": {...}
      }
    }
  }
}

I tried this query, following this guide:

{
    "size": 10,
    "index": "my_index",
    "body": {
        "query": {
            "bool": {
                "should": [{
                    "match_phrase": {
                        "my_field": {
                            "query": "MY_VALUE",
                            "boost": 1.5,
                            "slop": 0
                        }
                    }
                }],
                "must": [],
                "filter": [{
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": "doc['my_field'] === 'MY_VALUE'"
                        }
                    }
                }],
                "minimum_should_match": 1
            }
        }
    }
}

However, I got this error:

body:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
          "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
          "doc['my_field'] === 'MY_VALUE'",
          "    ^---- HERE"
        ],
        "script": "doc['my_field'] === 'MY_VALUE'",
        "lang": "painless",
        "position": {
          "offset": 4,
          "start": 0,
          "end": 30
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "R99vOHeORlKsk9dnCzcMeA",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
            "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
            "doc['my_field'] === 'MY_VALUE'",
            "    ^---- HERE"
          ],
          "script": "doc['my_field'] === 'MY_VALUE'",
          "lang": "painless",
          "position": {
            "offset": 4,
            "start": 0,
            "end": 30
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No field found for [my_field] in mapping with types []"
          }
        }
      }
    ]
  },
  "status": 400
}

It seems that doc doesn't contain text fields (I tried the same script with other, non-text fields and it works).

Here they say that:

Doc values are a columnar field value store, enabled by default on all fields except for analyzed text fields.

And here they say that:

text fields are searchable by default, but by default are not available for aggregations, sorting, or scripting. Set fielddata=true on your_field_name in order to load fielddata in memory by uninverting the inverted index.

But I can't change the mapping.

How can I access text fields in a Painless filter script?

(This is similar to "ElasticSearch exact match on text field with script", but more specific to the filtering script.)

There is 1 best solution below

ScriptQuery can only read fields that have doc_values.

Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. Doc values are supported on almost all field types, with the notable exception of text and annotated_text fields.

As per the discussion at https://github.com/elastic/elasticsearch/issues/30984:

Accessing the _source field is slow and something that we don't want to expose in the ScriptQuery because it would need to be accessed on every document, making the search very inefficient.

So you will need to either add a keyword sub-field to the mapping and reindex the data, or enable fielddata on the text field, which can consume a large amount of memory.
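As a rough sketch of those two options, the request bodies could be built like this in Python. The helper names and the sub-field name "keyword" are my own assumptions, not something from the question, and the dicts are only meant to be passed as the body of a put-mapping or search request with whatever client you use. Note that with fielddata the script sees the analyzed tokens rather than the original string, so an equality check on the whole value may not behave like an exact match.

```python
import json


def keyword_subfield_mapping(field):
    """Put-mapping body that adds a keyword sub-field to an existing text field.

    The sub-field name "keyword" is an assumed convention, not part of the
    question's mapping. Existing documents only get it after being reindexed.
    """
    return {
        "properties": {
            field: {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }


def exact_term_filter(field, value):
    """Search body: exact match on the unanalyzed value, no script needed."""
    return {"query": {"bool": {"filter": [{"term": {field + ".keyword": value}}]}}}


def exact_script_filter(field, value):
    """Search body: the same check as a Painless script on the sub-field's doc values."""
    sub = field + ".keyword"
    # Guard against documents without a value before reading .value.
    source = f"doc['{sub}'].size() != 0 && doc['{sub}'].value == params.value"
    script = {"lang": "painless", "source": source, "params": {"value": value}}
    return {"query": {"bool": {"filter": [{"script": {"script": script}}]}}}


def fielddata_mapping(field):
    """Put-mapping body for the alternative: enable fielddata on the text field."""
    return {"properties": {field: {"type": "text", "fielddata": True}}}


if __name__ == "__main__":
    # Print the bodies as JSON so they can be compared with the question's requests.
    for body in (
        keyword_subfield_mapping("my_field"),
        exact_term_filter("my_field", "MY_VALUE"),
        exact_script_filter("my_field", "MY_VALUE"),
        fielddata_mapping("my_field"),
    ):
        print(json.dumps(body, indent=2))
```

If the keyword sub-field is available, the plain term filter is usually the simpler choice; the script version is only worth it when the comparison needs logic that a term query cannot express.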