Below is part of the config.yml generated by rasa init:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
CountVectorsFeaturizer is specified twice. Why?
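The two entries build different features. The first one uses the default word analyzer, so each whole token becomes a feature in the bag of words (BoW). The second one sets analyzer to 'char_wb', which creates character n-grams (here 1 to 4 characters long) from text inside each token's boundaries and adds those to the BoW feature set. The two pair well together: word features capture exact matches, while character n-grams still overlap when a word is misspelled or was not seen during training. See the Rasa and sklearn docs.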
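To see the difference concretely, here is a minimal sketch using scikit-learn's CountVectorizer directly rather than Rasa itself. It assumes Rasa's min_ngram and max_ngram correspond to sklearn's ngram_range, and the sample texts are made up for illustration. Rasa feeds its own tokens to the featurizer, so the exact vocabularies in a real pipeline may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training text, for illustration only.
training_texts = ["book flight"]

# Counterpart of the first entry: one feature per whole word.
word_vectorizer = CountVectorizer(analyzer="word")
word_vectorizer.fit(training_texts)

# Counterpart of the second entry: character n-grams of length 1 to 4,
# built only from characters inside word boundaries (each word is
# padded with a space on both sides).
char_vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4))
char_vectorizer.fit(training_texts)

word_features = sorted(word_vectorizer.vocabulary_)
char_features = sorted(char_vectorizer.vocabulary_)

# A misspelled message: no whole word matches the training vocabulary,
# but many character n-grams still do.
typo = ["bok flihgt"]
word_hits = int(word_vectorizer.transform(typo).sum())
char_hits = int(char_vectorizer.transform(typo).sum())

print("word features:", word_features)
print("number of char_wb features:", len(char_features))
print("matches for the typo (word, char_wb):", word_hits, char_hits)
```

The misspelled input produces no matches with the word-level vectorizer but still matches several character n-grams, which is the kind of overlap the second featurizer contributes.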