How to get the list of stored tokens created by the analyzer in Solr 6.6.0


I am uploading documents to Solr for indexing. This works perfectly, and with the help of Luke I can see all the index terms created by Solr.

My requirement is to get the list of tokens created by the analyzer. For example, if I pass "This is Simple HTML Document", the tokenizer will create tokens something like this:

[simple][html][document]. I want this list for my indexed documents.

How can I get this?

Thanks


There are 3 best solutions below

0
On

There are several ways to achieve this:

1) If you have enabled term vectors for the field of interest, you can use the Term Vector Component.

2) You can use the Schema Browser in the Solr Admin UI to see the indexed tokens for a field.

3) You can use Luke to explore the indexed tokens per document/field.

4) You can use the Analysis tool to run analysis on the fly.

1
On

You could try the Term Vector Component (TVC). It is a SearchComponent that returns the term vector information stored for a document when the termVectors attribute is set on a field:

<field name="features" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

Changes required in solrconfig.xml

You need to enable the TermVectorComponent in your Solr configuration (it is already present in the example solrconfig.xml):

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

A RequestHandler configuration using this component could look like this:

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
        <lst name="defaults">
                <bool name="tv">true</bool>
        </lst>
        <arr name="last-components">
                <str>tvComponent</str>
        </arr>
</requestHandler>
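
Once the handler is in place, the term vectors for a document can be requested over HTTP. The following is only a minimal sketch in Python, under several assumptions: the handler is reachable at the path `/tvrh` (registered with a leading slash, as in the sample solrconfig.xml shipped with Solr), the unique key field is `id`, and the base URL, core name `mycore` and document id `doc1` are placeholders. The `tv.*` parameters are standard TermVectorComponent options.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def term_vector_url(base_url, core, doc_id, field):
    """Build a term vector request URL for a single document.

    Assumes the handler is registered at /tvrh and the unique key is "id".
    """
    params = urlencode({
        "q": f"id:{doc_id}",      # select the document by its unique key
        "fl": "id",               # keep the normal result list small
        "tv": "true",             # enable the Term Vector Component
        "tv.fl": field,           # field whose term vectors we want
        "tv.tf": "true",          # include term frequencies
        "tv.positions": "true",   # include term positions
        "tv.offsets": "true",     # include start/end offsets
        "wt": "json",
    })
    return f"{base_url}/{core}/tvrh?{params}"


def fetch_term_vectors(base_url, core, doc_id, field):
    """Call a running Solr instance and return the parsed JSON response."""
    with urlopen(term_vector_url(base_url, core, doc_id, field)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder values; adjust to your own installation.
    print(term_vector_url("http://localhost:8983/solr", "mycore", "doc1", "features"))
```

The term vector data should appear in the `termVectors` section of the response. Note that only documents indexed after termVectors="true" was set on the field will carry this information, so a reindex may be needed.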

More information: https://wiki.apache.org/solr/TermVectorComponent

0
On

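You can get that information in the Analysis tab of the Solr Admin page: pick a field or field type, enter the text, and it shows the tokens produced at each stage of the analysis chain.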
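
The same analysis can also be requested programmatically through Solr's field analysis handler (`/analysis/field`), which is what makes it usable for many documents. Below is a rough Python sketch, not a definitive implementation. The base URL, the core name `mycore` and the field type `text_general` are placeholders, and the parsing assumes the default JSON layout, in which the `index` entry is a flat list alternating analysis stage names with lists of token objects.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def analysis_url(base_url, core, field_type, text):
    """Build a field analysis request URL for one piece of text."""
    params = urlencode({
        "analysis.fieldtype": field_type,  # field type whose analyzer is run
        "analysis.fieldvalue": text,       # text to analyze as at index time
        "wt": "json",
    })
    return f"{base_url}/{core}/analysis/field?{params}"


def final_tokens(response, field_type):
    """Return the token texts emitted by the last index-time analysis stage.

    Assumes "index" is a flat list alternating stage names and token lists.
    """
    stages = response["analysis"]["field_types"][field_type]["index"]
    token_lists = [item for item in stages if isinstance(item, list)]
    if not token_lists:
        return []
    return [token["text"] for token in token_lists[-1]]


def analyze(base_url, core, field_type, text):
    """Send the request to a running Solr instance and return the final tokens."""
    with urlopen(analysis_url(base_url, core, field_type, text)) as resp:
        return final_tokens(json.load(resp), field_type)


if __name__ == "__main__":
    # Placeholder values; adjust to your own installation.
    print(analysis_url("http://localhost:8983/solr", "mycore",
                       "text_general", "This is Simple HTML Document"))
```

This runs the analyzer on text you supply rather than reading tokens back from the index, so to get the tokens for an already indexed document you would pass its stored field value through `analyze`.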