How to encode a taxonomy in Weaviate contextionary

I would like to create a semantic context for my data before vectorizing the actual data in Weaviate (https://github.com/semi-technologies/weaviate). Let's say we have a taxonomy with a set of domain-specific concepts together with links to their related concepts. Could you advise me on the best way to encode not only those concepts but also the relations between them using the contextionary?

Depending on your use case, there are a few possible answers.

  1. You can create the "semantic context" in a Weaviate schema and use a vectorization module to vectorize the data according to this schema.
  2. You have domain-specific concepts in your data that the out-of-the-box vectorization modules don't know about (e.g., specific abbreviations).
  3. You want to capture the semantic context of (i.e., vectorize) the graph itself before adding it to Weaviate.

The first is the easiest and most straightforward; the last is the most esoteric.

Create a schema and use a vectorizer for your data

In your case, you would create a schema based on your taxonomy and load the data using an out-of-the-box vectorizer (this configurator helps you build a Docker Compose file).

I would recommend starting with this anyway, because it will determine your data model and how you can search through and/or classify data. It might even be the case that for your use case this step already solves the problem because the out-of-the-box vectorizers are (bias alert) pretty decent.
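As a rough illustration of what "a schema based on your taxonomy" could look like, here is a minimal sketch in plain Python that only builds the JSON payloads and does not talk to a Weaviate instance. The class name `Concept`, the properties `label` and `relatedTo`, and the toy taxonomy are all made up for this example. It relies on the schema convention that a property whose `dataType` is a class name is a cross-reference; depending on your Weaviate version, a self-referencing property may have to be added after the class has been created.

```python
import json

# Hypothetical toy taxonomy: concept label -> labels of related concepts.
taxonomy = {
    "beverage": ["coffee", "tea"],
    "coffee": ["espresso"],
    "tea": [],
    "espresso": [],
}


def build_schema(class_name="Concept", vectorizer="text2vec-contextionary"):
    """Return a schema dict with one class that can link to itself."""
    return {
        "classes": [
            {
                "class": class_name,
                "description": "A concept from the domain taxonomy",
                "vectorizer": vectorizer,
                "properties": [
                    # Plain text property that the vectorizer can use.
                    {"name": "label", "dataType": ["text"]},
                    # A dataType that names a class makes this a cross-reference.
                    {"name": "relatedTo", "dataType": [class_name]},
                ],
            }
        ]
    }


def build_objects(taxonomy, class_name="Concept"):
    """One data object per concept in the taxonomy."""
    return [
        {"class": class_name, "properties": {"label": label}}
        for label in taxonomy
    ]


def build_links(taxonomy, property_name="relatedTo"):
    """(source label, property, target label) triples for the cross-references."""
    return [
        (source, property_name, target)
        for source, targets in taxonomy.items()
        for target in targets
    ]


schema = build_schema()
objects = build_objects(taxonomy)
links = build_links(taxonomy)
print(json.dumps(schema, indent=2))
```

The schema, objects, and links would then be sent to Weaviate with whichever client or REST call you prefer. Keeping the relations as cross-references means the taxonomy structure stays queryable in the graph while the vectorizer takes care of the text.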

Domain-specific concepts

At the moment of writing, Weaviate has two vectorizers, the contextionary and the transformers modules.

If you want to extend Weaviate with custom context, you can extend the contextionary or fine-tune and distribute custom transformers.

If you do this, I would still highly recommend taking the first step, because it will improve the results.
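To make "extending the contextionary" a bit more concrete, here is a hedged sketch that prepares extension requests without sending them. It assumes the contextionary module exposes an extensions endpoint at `/v1/modules/text2vec-contextionary/extensions` that accepts a JSON body with `concept`, `definition`, and `weight`; please check the path and fields against the documentation for your version. The abbreviations, their definitions, and the `http://localhost:8080` base URL are invented for illustration.

```python
import json
import urllib.request

# Assumption: endpoint path of the contextionary extensions API.
EXTENSIONS_PATH = "/v1/modules/text2vec-contextionary/extensions"

# Hypothetical domain-specific abbreviations with plain-language definitions.
definitions = {
    "cpq": "software to configure a product, set its price and create a quote",
    "sku": "a unique code that identifies a product kept in stock",
}


def build_extension(concept, definition, weight=1.0):
    """Body for one custom concept; the field names are an assumption."""
    return {"concept": concept, "definition": definition, "weight": weight}


def build_request(base_url, payload):
    """Prepare (but do not send) a POST request carrying the extension."""
    return urllib.request.Request(
        base_url.rstrip("/") + EXTENSIONS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


pending = [
    build_request("http://localhost:8080", build_extension(concept, text))
    for concept, text in definitions.items()
]

for request in pending:
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Each prepared request could be sent with `urllib.request.urlopen(request)` against a running instance. The definition should describe the concept using words the contextionary already knows, since that text is what positions the new concept in the vector space.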

Capture semantic context of your graph

I don't think this is what you want, but it is possible, if quite esoteric. In principle, you can store your vectorized graph in Weaviate, but you need to generate the vectors on your own. For example, at the moment of writing, we are looking at RDF2Vec.
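To give an idea of what "generating the vectors on your own" involves, here is a simplified sketch of the random-walk step that RDF2Vec-style approaches build on. It is not RDF2Vec itself: edge labels are ignored, and training a word2vec-style model on the walks to obtain the actual vectors is left out. The graph, the function names, and the placeholder vector are hypothetical, and the `vector` field on the object payload is my assumption about how a self-provided vector is attached to an object.

```python
import random

# Hypothetical taxonomy graph: node -> neighbouring nodes (edge labels ignored).
graph = {
    "beverage": ["coffee", "tea"],
    "coffee": ["espresso", "beverage"],
    "tea": ["beverage"],
    "espresso": ["coffee"],
}


def random_walks(graph, walks_per_node=3, depth=4, seed=0):
    """Generate fixed-depth random walks starting from every node."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(depth):
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def with_vector(class_name, label, vector):
    """Object payload carrying a self-provided vector (field name assumed)."""
    return {
        "class": class_name,
        "properties": {"label": label},
        "vector": list(vector),
    }


walks = random_walks(graph)
for walk in walks[:3]:
    print(" -> ".join(walk))

# Placeholder vector; in practice it would come from a model trained on the walks.
example_object = with_vector("Concept", "coffee", [0.1, 0.2, 0.3])
```

The walks play the role of "sentences" in which nodes that are close in the graph tend to co-occur, which is what lets an embedding model place related concepts near each other.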

PS:
Because people often ask about the role of ontologies and taxonomies in Weaviate, I've written this blog post.