Why is CountVectorsFeaturizer used twice in the config.yml produced by rasa init?


Below is part of the config.yml generated by rasa init.

- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier

CountVectorsFeaturizer is specified twice. Why is that?

There is 1 best solution below


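The first CountVectorsFeaturizer uses the default word analyzer, so each token becomes a feature in the bag-of-words (BoW). The second one, with analyzer: char_wb, first creates character n-grams (here of length 1 to 4) within each token's boundaries and then adds them to the BoW feature set. The two pair well together: the word features capture whole tokens, while the character n-grams still overlap for misspelled or unseen words. See the Rasa and scikit-learn documentation.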
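
To see the difference outside of Rasa, here is a minimal sketch that uses scikit-learn's CountVectorizer directly. It assumes that the analyzer, min_ngram and max_ngram options in config.yml correspond to CountVectorizer's analyzer and ngram_range parameters; the example words are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical example: one correctly spelled word and a misspelling of it.
train = ["flight"]
typo = ["flihgt"]

# Word-level BoW: each token is one feature.
word_vec = CountVectorizer(analyzer="word")
word_vec.fit(train)

# Character n-grams (length 1 to 4) taken inside word boundaries,
# mirroring analyzer: char_wb, min_ngram: 1, max_ngram: 4.
char_vec = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4))
char_vec.fit(train)

# Count how many known features the misspelled word activates.
word_overlap = int(word_vec.transform(typo).sum())
char_overlap = int(char_vec.transform(typo).sum())

print("word features:", sorted(word_vec.vocabulary_))
print("char features:", sorted(char_vec.vocabulary_))
print("word-level overlap with typo:", word_overlap)  # no shared token
print("char-level overlap with typo:", char_overlap)
```

The word-level vectorizer has never seen "flihgt", so it produces an all-zero vector for it, while the char_wb vectorizer still shares n-grams such as "fl" and "fli" with "flight". That is roughly why the generated pipeline feeds both feature sets to DIETClassifier.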