Having a combination of pre-trained and supervised embeddings in the Rasa NLU pipeline


I am new to Rasa and have started creating a very domain-specific chatbot. As part of this, I understand it's better to use supervised embeddings in the NLU pipeline, since my use case is domain-specific.

I have an example intent in my nlu.md:

```md
## intent:create_system_and_config
- create a [VM](system) of [12 GB](config)
```

If I use a supervised featurizer, it should work fine with my domain-specific entities. My concern is that by using only supervised learning, won't we lose the advantage of pre-trained models? For example, in a query such as `add a [some_system](system) of [some_config](config)`, "add" and "create" are very closely related, and pre-trained models can pick up such verbs easily. Is it possible to have a combination of a pre-trained model and then do some supervised learning on top of it in the NLU pipeline, something like transfer learning?
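For reference, my current config just uses the supervised preset (the `supervised_embeddings` template name from Rasa 1.x, which is what I'm on):

```yaml
# config.yml — what I'm using now (Rasa 1.x preset)
language: en
pipeline: supervised_embeddings
```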

1 Answer

If you're creating a domain-specific chatbot, it's usually better to use supervised embeddings instead of pre-trained ones.

For example, in general English, the word “balance” is closely related to “symmetry”, but very different to the word “cash”. In a banking domain, “balance” and “cash” are closely related and you’d like your model to capture that.

In your case, too, your model needs to capture that the words "VM" and "Virtual Machine" mean the same thing. Pre-trained featurizers are not trained to capture this; they are more generic.
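One way to teach a supervised featurizer this is simply to include both phrasings in your training data, and to map them to one value with a synonym entry. A sketch in the question's nlu.md format (the extra training example and the synonym block are my additions, not from the question):

```md
## intent:create_system_and_config
- create a [VM](system) of [12 GB](config)
- create a [Virtual Machine](system) of [8 GB](config)

## synonym:VM
- Virtual Machine
```

With `EntitySynonymMapper` in the pipeline, extracted values of "Virtual Machine" are then normalized to "VM".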

The advantage of using pre-trained word embeddings in your pipeline is that if you have a training example like "I want to buy apples", and Rasa is asked to predict the intent for "get pears", your model already knows that the words "apples" and "pears" are very similar. This is especially useful if you don't have enough training data.
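And to answer the combination question directly: you don't have to choose one preset or the other — you can list both kinds of featurizers in a single pipeline so the classifier sees pre-trained and supervised features side by side. A minimal sketch, assuming Rasa 1.x component names (check the component list for your version):

```yaml
# config.yml — hypothetical pipeline mixing pre-trained and supervised featurizers
language: en
pipeline:
  - name: SpacyNLP                   # loads the pre-trained spaCy model
  - name: SpacyTokenizer
  - name: SpacyFeaturizer            # dense features from pre-trained word vectors
  - name: CountVectorsFeaturizer     # sparse features learned from your nlu.md
  - name: CRFEntityExtractor         # custom entities like (system), (config)
  - name: EntitySynonymMapper
  - name: EmbeddingIntentClassifier  # supervised classifier trained on the combined features
```

This is close in spirit to the transfer learning you describe: the spaCy vectors bring general English knowledge, while the count-vector features and the classifier are trained on your domain data.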

For more details, you can refer to the Rasa documentation.