I trained fastText embeddings and saved them as a .vec file.
I want to use these for my spacy NER model. Is there a difference between
python -m spacy train en [new_model] [train_data] [dev_data] --pipeline ner --base-model embeddings.vec
and
python -m spacy train en [new_model] [train_data] [dev_data] --pipeline ner --vectors embeddings.vec
?
Both methods produce nearly identical training loss, F score, etc.
If you need to initialize a spacy model with vectors, use spacy init-model like this, where lg is the language code:
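python -m spacy init-model lg [vectors_model] --vectors-loc embeddings.vec

(here [vectors_model] is the output directory for the new model containing the vectors)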
Once you have the vectors saved as part of a spacy model:
--vectors loads the vectors from the provided model, so the initial model is spacy.blank("lg") + vectors

--base-model loads everything (tokenizer, pipeline components, vectors) from the provided model, so the initial model is spacy.load(model)
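Roughly, the two starting points correspond to something like this (just a sketch of the equivalent, not the exact internals of spacy train; vectors_model stands for the directory created by init-model above):

import spacy

# --vectors: blank pipeline for the language plus the pretrained vectors
vectors_nlp = spacy.load("vectors_model")
nlp = spacy.blank("en")  # "en" since that's the language used in the question
nlp.vocab.vectors = vectors_nlp.vocab.vectors

# --base-model: the provided model as-is (tokenizer, vocab, vectors, any components)
nlp = spacy.load("vectors_model")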
If the provided model doesn't have any pipeline components in it, the only potential difference is the tokenizer settings resulting from spacy.blank("lg"), which can vary a little between individual spacy versions.
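If you want to see whether that tokenizer difference matters for your data, you can compare the two directly (a quick sanity check, again assuming the vectors were saved to vectors_model):

import spacy

blank = spacy.blank("en")
base = spacy.load("vectors_model")

text = "A sentence that's representative of your training data."
print([t.text for t in blank.tokenizer(text)])
print([t.text for t in base.tokenizer(text)])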