Solr copyField mixed with RegexTransformer


Scenario:

In the database I have a field called Categories, which is of type string and contains a pipe-delimited list of numbers such as 1|8|90|130|

What I want:

In Solr index, I want to have 2 fields:

  • Field Categories_pipe, which would contain the exact string as in the DB, i.e. 1|8|90|130|
  • Field Categories, which would be a multi-valued field of type INT containing the values 1, 8, 90 and 130

For the latter, I can use a RegexTransformer in the entity specification and declare the following field in data-config.xml: <field column="Categories" name="Navigation" splitBy="\|"/>, then mark the field as multi-valued in schema.xml.
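For context, the split-only setup described here might look roughly like the sketch below. The entity name `item`, the table `items`, the `id` column and the `int` type name are assumptions for illustration, and the dataSource definition is left out; only the Categories `<field>` line comes from the question.

```xml
<!-- data-config.xml (fragment; entity name, query and id column are hypothetical) -->
<dataConfig>
  <!-- dataSource definition omitted -->
  <document>
    <entity name="item"
            transformer="RegexTransformer"
            query="SELECT id, Categories FROM items">
      <field column="id" name="id"/>
      <!-- splitBy breaks "1|8|90|130|" on the pipe and emits one value per piece -->
      <field column="Categories" name="Navigation" splitBy="\|"/>
    </entity>
  </document>
</dataConfig>
```

```xml
<!-- schema.xml (fragment; the "int" type name depends on the types your schema defines) -->
<field name="Navigation" type="int" indexed="true" stored="true" multiValued="true"/>
```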

What I do not know is how to 'copy' the same field twice and perform the regex splitting on only one copy. I know there is the copyField facility that can be defined in schema.xml, but I can't find a way to transform the copied field because, from what I know (and I may be wrong here), transformers are only available in the entity specification.

As a workaround I could also return the same field twice from the entity query, but in reality the field Categories is a computed field (nested selects) which is somewhat expensive, so I would like to avoid that.

Any help is appreciated, thanks.


There are 2 best solutions below

2
On BEST ANSWER

Instead of splitting it in data-config.xml, you could do that in your schema.xml. Here is what you could do:

  1. Create a fieldType whose tokenizer is PatternTokenizerFactory, with a regex that splits on |.
  2. FieldSplit: create a multivalued field using this new fieldType; it will end up indexed as 1, 8, 90, 130.
  3. FieldOriginal: create a string field (if you need no analysis on it) that preserves the original value 1|8|90|130|
  4. Use copyField to copy the value between FieldSplit and FieldOriginal, in whichever direction suits your setup.
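The steps above could be sketched in schema.xml roughly as follows. The fieldType name `pipe_split` is hypothetical, FieldSplit and FieldOriginal are the placeholder names used in the list, and the sketch assumes the importer sends the raw string to FieldOriginal.

```xml
<!-- schema.xml (fragment; "pipe_split" is a hypothetical type name) -->
<fieldType name="pipe_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- With no group attribute, the pattern is used as a delimiter to split on -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
  </analyzer>
</fieldType>

<!-- Keeps the raw value, e.g. 1|8|90|130| -->
<field name="FieldOriginal" type="string" indexed="true" stored="true"/>

<!-- Indexed as the separate tokens 1, 8, 90, 130 -->
<field name="FieldSplit" type="pipe_split" indexed="true" stored="false" multiValued="true"/>

<!-- copyField passes on the raw incoming value, before any analysis is applied -->
<copyField source="FieldOriginal" dest="FieldSplit"/>
```

Two caveats with this approach: tokenization only affects the indexed terms, so a stored FieldSplit would still return the original pipe-delimited string, and the tokens are text terms rather than values of an INT type.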

Check this Question, it is similar.

0
On

You can create two columns from the same data and treat them separately.

SELECT categories, categories as categories_pipe FROM category_table

Then you can split the "categories" column, but index the other one as-is.
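In data-config.xml this could look something like the sketch below. The entity name `category` is an assumption; the query is the one shown above, and the target field names are taken from the question.

```xml
<!-- data-config.xml (fragment; the entity name is hypothetical) -->
<entity name="category"
        transformer="RegexTransformer"
        query="SELECT categories, categories AS categories_pipe FROM category_table">
  <!-- Split only this column on the pipe character -->
  <field column="categories" name="Categories" splitBy="\|"/>
  <!-- Index the aliased copy unchanged -->
  <field column="categories_pipe" name="Categories_pipe"/>
</entity>
```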