How to apply custom analyzers on a field in Vespa schema

26 Views Asked by agent47 At 26 March 2024 at 01:26

I have the following schema

schema product {
    document product {
        field brand_name type string {
            indexing: summary | index
        }
        field brand_name_tokens type array<string> {    #computed field
            ....
        }
    }
}

I want a field called brand_name_tokens of type array<string> in the document, derived from the brand_name field as follows:

Split brand_name on whitespace, then remove the substring [™®] if present.

I can do this processing before writing to Vespa, but I would like to know if it's possible to define this in the schema so that Vespa computes it automatically.


There are 1 best solutions below

andreer On 26 March 2024 at 06:57 BEST ANSWER

You can do this: there's a "split by regex" expression. There's no "remove substring", but if you don't mind splitting on ™/® then you can do it with a slightly fancier regex.

It looks a little different because you can't input Unicode characters directly in the schema, so you have to replace ™ with its UTF-8 byte escape \xe2\x84\xa2:

field brand_name_tokens type array<string> {
    indexing: input brand_name | split "([. ®]|\xe2\x84\xa2)+" | summary
}

As this is a computed field, it should be defined outside the document product { block.
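
Putting the schema from the question and the field above together, the layout might look like the sketch below. It only combines the two snippets already shown and has not been tested against a running Vespa instance; the comment lines are added for illustration.

```
schema product {
    document product {
        field brand_name type string {
            indexing: summary | index
        }
    }

    # Computed field: declared outside the document block,
    # populated at indexing time from brand_name.
    field brand_name_tokens type array<string> {
        indexing: input brand_name | split "([. ®]|\xe2\x84\xa2)+" | summary
    }
}
```

Note that this regex also splits on "." in addition to spaces, ® and ™, so drop the dot from the character class if brand names containing periods should stay in one token.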

See https://docs.vespa.ai/en/reference/indexing-language-reference.html#split and the rest of that page for what you can do at indexing time. If you need to do more, you can write a custom Document Processor.
