Solr, how to define Nested Documents in the schema.xml


I have a document with a nested document and I want to define the schema for Solr. I have been reading the documentation, but I don't know how to define nested documents in schema.xml.

When I try to index a document with addBean I get an error because the schema has no field obj1, and I don't know how to define it.

I'm using Java objects with @Field annotations.

public class ObjToIndex {
    @Field
    String id;

    @Field
    String name;

    @Field
    ObjToIndex2 obj1;
}

public class ObjToIndex2 {
    @Field
    String id;

    @Field
    String lastName;
}

I don't know how to define in the schema a field obj1 with type "object" or something similar.

Answer

I don't know how to define in the schema a field obj1 with type "object" or something similar.

You can't, at least not in the way you have in mind.

Solr is not designed that way: the unit of information is a document composed of fields. Fields may be of different types but, in short, they hold only primitive values (strings, numbers, booleans); a field cannot be a complex object. Take a look at "How Solr Sees the World" in the documentation.

Does that mean you can't manage nested documents? No. You can manage them, with some caveats.

How to define the schema

First of all you need to define the internal _root_ field like this:

<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

Then you need to merge all "primitive" fields of your parent and child objects into a single list of fields. This has some drawbacks, which are also mentioned in the Solr documentation:

  • you have to define an id field that exists for both parent and child objects, and you have to guarantee that it is globally unique
  • only fields that exist in both parent and child objects can be declared as "required"
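
The two caveats can be checked mechanically before indexing. The following is a minimal plain-Java sketch, not SolrJ code: the class NestedSchemaCheck and its methods are hypothetical names, and documents are modelled simply as maps from field name to value.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of SolrJ: checks the two schema caveats on
// flat documents represented as field-name -> value maps.
class NestedSchemaCheck {

    // Returns the ids that occur more than once across parents and children.
    static Set<String> duplicateIds(List<Map<String, Object>> docs) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (Map<String, Object> doc : docs) {
            String id = String.valueOf(doc.get("id"));
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    // Returns the field names present in every document; only these are
    // candidates for required="true" in the shared schema.
    static Set<String> fieldsSafeToRequire(List<Map<String, Object>> docs) {
        if (docs.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> common = new LinkedHashSet<>(docs.get(0).keySet());
        for (Map<String, Object> doc : docs) {
            common.retainAll(doc.keySet());
        }
        return common;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.of("id", "P1", "title", "My post"),
            Map.of("id", "P1.C1", "content", "My first comment"),
            Map.of("id", "P1.C2", "content", "My second comment"));
        System.out.println(duplicateIds(docs));        // prints []
        System.out.println(fieldsSafeToRequire(docs)); // prints [id]
    }
}
```

With a post and two comments, only id appears in every document, which is why id is the only field that can be marked required in the merged schema.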

For example, let's look at a slightly more complex case where multiple comments are nested under a blog post:

public class BlogPost {
    @Field
    String id;

    @Field
    String title;

    // child = true marks this field as holding nested (child) documents
    @Field(child = true)
    List<Comment> comments;
}

public class Comment {
    @Field
    String id;

    @Field
    String content;
}

Then you need a schema like this:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="${solr.core.name}" version="1.5">
  <types>
    <fieldType name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>
  </types>

  <fields>
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="title" type="string" indexed="true" stored="true" multiValued="false" required="false" />
    <field name="content" type="string" indexed="true" stored="true" multiValued="false" required="false" />
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>

How to index documents

Using SolrJ it is pretty straightforward: simply create your nested objects in Java and the library will take care of building the correct request when you add them.

final BlogPost myPost = new BlogPost();
myPost.id = "P1";
myPost.title = "My post";
final Comment comment1 = new Comment();
comment1.id = "P1.C1";
comment1.content = "My first comment";
final Comment comment2 = new Comment();
comment2.id = "P1.C2";
comment2.content = "My second comment";
myPost.comments = List.of(comment1, comment2);
...
solrClient.addBean("my_core", myPost);
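
To picture what ends up in the index, here is a plain-Java sketch (no SolrJ involved) of the post above once it has been flattened into a block of documents. NestedBlockSketch and its methods are hypothetical names. The children-first ordering and the _root_ value on every document reflect my understanding of how Solr stores a block, so treat both as assumptions rather than a specification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not SolrJ code: shows the flat shape a nested
// post takes once it is stored as a block of documents.
class NestedBlockSketch {

    // Builds one flat document (field name -> primitive value).
    static Map<String, Object> flatDoc(String id, String field, String value, String rootId) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put(field, value);
        doc.put("_root_", rootId); // assumption: every document in the block carries the parent id
        return doc;
    }

    // Flattens a post and its comments: children first, parent last (assumed order).
    static List<Map<String, Object>> toBlock(String postId, String title,
                                             Map<String, String> commentsById) {
        List<Map<String, Object>> block = new ArrayList<>();
        for (Map.Entry<String, String> comment : commentsById.entrySet()) {
            block.add(flatDoc(comment.getKey(), "content", comment.getValue(), postId));
        }
        block.add(flatDoc(postId, "title", title, postId));
        return block;
    }

    public static void main(String[] args) {
        Map<String, String> comments = new LinkedHashMap<>();
        comments.put("P1.C1", "My first comment");
        comments.put("P1.C2", "My second comment");
        // Print each flat document of the block on its own line.
        for (Map<String, Object> doc : toBlock("P1", "My post", comments)) {
            System.out.println(doc);
        }
    }
}
```

Each document holds only primitive fields. The nesting survives purely through the shared _root_ value and the documents being stored together, which is why the schema needs _root_ but no "object" type.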

How to retrieve documents

This is a little bit tricky: to rebuild the original object and its children you have to use the child doc transformer in your request (query.addField("[child]")):

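final SolrQuery query = new SolrQuery("*:*");
query.addField("*");
query.addField("[child]");
try {
    final QueryResponse response = solrClient.query("my_core", query);
    // getBeans maps each returned document, with its children, back onto BlogPost
    final List<BlogPost> documents = response.getBeans(BlogPost.class);
    // ... use documents here (the rest of the original snippet is cut off)
} catch (SolrServerException | IOException e) {
    // the query failed; handle or rethrow as appropriate for your application
}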

Answer

I believe this is correct:

How to write nested schema.xml in solr?

Some of the logic of "why" is described here but the basic concept is that "child" documents are actually more "related" or "linked" documents within the same schema. They may include different fields, but effectively, they're just adding to the superset of fields in the overall schema.

Answer

To have a nested object, use @Field(child = true):

public class SolrBeanWithNested {

    @Field
    private String id;

    @Field(child = true)
    private MyNestedObject nested;

}

Available since Solr 5.1. See ticket: solr child