I have the following schema:
schema product {
    document product {
        field brand_name type string {
            indexing: summary | index
        }
        field brand_name_tokens type array<string> { # computed field
            ....
        }
    }
}
I want a field called brand_name_tokens of type array<string> in the document, derived from the brand_name field as follows:
- split brand_name by whitespace and then remove the substring [™®] if present.
I can do this processing before writing to Vespa. I would like to know if it's possible to define this in the schema so that Vespa computes it automatically.
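For reference, the processing before writing to Vespa could look roughly like the minimal Python sketch below. The function name brand_name_tokens_for is hypothetical, and dropping tokens that become empty once the symbols are removed is an assumption, not something stated above.

```python
import re
from typing import List

# Characters to strip from each token (the [™®] set from the description above).
_MARKS = re.compile("[™®]")


def brand_name_tokens_for(brand_name: str) -> List[str]:
    """Split on whitespace, then remove ™/® from each token.

    Assumption: tokens that end up empty (e.g. a lone "®") are dropped.
    """
    stripped = (_MARKS.sub("", token) for token in brand_name.split())
    return [token for token in stripped if token]
```

With this sketch, a brand name such as "Acme™ Super Widgets®" would give the tokens Acme, Super and Widgets.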
You can do this: there's a "split by regex". There's no "remove substring", but if you don't mind splitting on ™/® then you can do it with a slightly fancier regex.
It looks a little different because you can't input Unicode characters directly in the schema, so you have to replace ™ with its UTF-8 byte escapes, \xe2\x84\xa2 (and, by the same logic, ® with \xc2\xae).
As this is a computed field, it should be defined outside the document product { block.
See https://docs.vespa.ai/en/reference/indexing-language-reference.html#split and the rest of the page for what you can do at indexing time. If you need to do more, you can write a custom Document Processor.
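To get a feel for what the split-on-™/® variant produces, and to double-check the byte escapes, here is a small Python sketch. It only mimics the behaviour with Python's re module and is not Vespa's implementation. The function names are made up for illustration, and discarding empty strings is an assumption about the desired result rather than documented Vespa behaviour.

```python
import re

# Split on any run of whitespace, ™ or ® (the "slightly fancier regex" idea).
_SEPARATORS = re.compile(r"[\s™®]+")


def split_on_whitespace_and_marks(brand_name):
    """Treat ™ and ® as separators instead of removing them."""
    # Assumption: empty strings from leading/trailing separators are discarded.
    return [part for part in _SEPARATORS.split(brand_name) if part]


def utf8_escape(char):
    """Return a character's UTF-8 bytes written as backslash-x hex escapes."""
    return "".join("\\x{:02x}".format(byte) for byte in char.encode("utf-8"))
```

One difference from the original requirement is worth noting: a name such as Foo™Bar yields two tokens (Foo and Bar) when the symbol is a separator, whereas removing the symbol would give the single token FooBar.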