Google Generative AI text embedding: What `task_type` is most appropriate for an outlier detection use case?


I am using the new Google Generative AI API. I have a corpus of documents and I want to find the outliers among them. When creating the embeddings, which task_type is most appropriate? My code resembles the following:

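from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko-multilingual@001")
# text: one document from the corpus
text_input = TextEmbeddingInput(text=text, task_type='CLUSTERING')
embeddings = model.get_embeddings([text_input])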
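
Because the goal is to embed a whole corpus rather than a single `text`, the snippet above has to be repeated over batches of documents, all with the same task_type. A minimal sketch, assuming a hypothetical `embed_batch` callable that wraps `model.get_embeddings` and returns one list of floats per input. The default `batch_size=5` is an assumed per-request limit, not a documented value; check the quota for your model.

```python
from typing import Callable, List, Sequence

# Hypothetical type: embeds a batch of texts and returns one vector per text.
EmbedBatch = Callable[[Sequence[str]], List[List[float]]]


def embed_corpus(documents: Sequence[str], embed_batch: EmbedBatch,
                 batch_size: int = 5) -> List[List[float]]:
    """Embed every document, sending at most `batch_size` texts per call.

    batch_size=5 is an assumed per-request limit; adjust it for your model.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        batch_vectors = embed_batch(batch)
        # Guard against a backend that silently drops or merges inputs.
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input text")
        vectors.extend(batch_vectors)
    return vectors


# With the Vertex AI SDK, embed_batch might wrap the question's snippet, e.g.:
#
#   def embed_batch(texts):
#       inputs = [TextEmbeddingInput(text=t, task_type="CLUSTERING") for t in texts]
#       return [e.values for e in model.get_embeddings(inputs)]
```

Keeping the batching separate from the SDK call makes it easy to re-embed the same corpus with different task_type values and compare the results.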

The available task types are:

  • RETRIEVAL_QUERY: Specifies the given text is a query in a search or retrieval setting.
  • RETRIEVAL_DOCUMENT: Specifies the given text is a document in a search or retrieval setting.
  • SEMANTIC_SIMILARITY: Specifies the given text will be used for Semantic Textual Similarity (STS).
  • CLASSIFICATION: Specifies that the embeddings will be used for classification.
  • CLUSTERING: Specifies that the embeddings will be used for clustering.

I'm not sure which of these is most appropriate for an anomaly (outlier) detection use case.
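
Whichever task_type is chosen, the outlier step itself works on plain vectors. A minimal sketch, assuming the embeddings have been extracted as lists of floats (for example `[e.values for e in embeddings]`): each document is scored by its cosine distance to the centroid of the corpus, and the highest scores are flagged. The centroid-distance rule and the helper names are illustrative choices, not something the API prescribes; density-based methods such as Local Outlier Factor are a common alternative.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def centroid_outlier_scores(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Cosine distance from each normalized vector to the normalized centroid.

    A higher score means the document is further from the bulk of the corpus.
    Raises ValueError if the centroid is a zero vector (vectors cancel out).
    """
    if not vectors:
        return []
    unit = [_normalize(v) for v in vectors]
    dim = len(unit[0])
    centroid = _normalize([sum(u[i] for u in unit) / len(unit) for i in range(dim)])
    return [1.0 - sum(a * b for a, b in zip(u, centroid)) for u in unit]


def top_outliers(vectors: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k documents with the highest outlier scores."""
    scores = centroid_outlier_scores(vectors)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Running this on embeddings produced with each candidate task_type and checking which ranking surfaces the documents you already know to be unusual is one practical way to choose between them.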
