Cosine similarity search returns empty

Question

134 Views Asked by Danilo Toro At 20 August 2023 at 00:07

I am trying to retrieve the most similar vectors, but the search returns an empty result and I don't understand why.

I am following this documentation: https://redis-py.readthedocs.io/en/stable/examples/search_vector_similarity_examples.html

And this is my schema:

schema = (
    TagField("ticket_url"),
    NumericField("ticket_id"),
    NumericField("entity_id"),
    VectorField(
        "embedding",
        "HNSW",
        {
            "TYPE": "FLOAT32",
            "DIM": self.vector_dim,
            "DISTANCE_METRIC": "COSINE",
        },
    ),
)
definition = IndexDefinition(
    prefix=[self.doc_prefix], index_type=IndexType.HASH)
self.r.ft(self.index_name).create_index(
    fields=schema, definition=definition)

The function that searches for similar vectors:

def search_similar_documents(self, entity_id, vector, topK=5, ticket_id=None):
    query = (
        Query("*=>[KNN 2 @embedding $vec as score]")
        .sort_by("score")
        .return_fields("score")
        .paging(0, 2)
        .dialect(2)
    )

    query_params = {"vec": vector}
    return self.r.ft(self.index_name).search(query, query_params).docs
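
The function above accepts `topK` and `entity_id` but hard-codes `KNN 2` and never filters by entity. As a minimal sketch (not a confirmed fix for the empty result), the query string could be built from those arguments. The helper name `build_knn_query` and its parameters are hypothetical. The filter assumes `entity_id` is the NUMERIC field from the schema above, and the resulting string still needs `.dialect(2)` when wrapped in `Query(...)`.

```python
from typing import Optional


def build_knn_query(top_k: int = 5,
                    entity_id: Optional[int] = None,
                    vector_field: str = "embedding",
                    score_alias: str = "score") -> str:
    """Return a RediSearch KNN query string (hypothetical helper)."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # "*" matches every document; a numeric range with equal bounds
    # restricts the KNN search to a single entity_id.
    if entity_id is None:
        prefilter = "*"
    else:
        prefilter = f"(@entity_id:[{entity_id} {entity_id}])"
    return f"{prefilter}=>[KNN {top_k} @{vector_field} $vec AS {score_alias}]"


print(build_knn_query())
# *=>[KNN 5 @embedding $vec AS score]
print(build_knn_query(top_k=3, entity_id=42))
# (@entity_id:[42 42])=>[KNN 3 @embedding $vec AS score]
```

If this is used, the page size passed to `.paging(0, ...)` would also need to be at least `top_k`, otherwise fewer results come back than the KNN clause requests.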

The vectors are generated from an OpenAI response and converted to bytes:

def embedding_openai(self, text):
    try:
        response = openai.Embedding.create(
            input=text,
            model="text-embedding-ada-002"
        )
        embedding = response['data'][0]['embedding']
        array_embedding = np.array(embedding, dtype=np.float32)
        return array_embedding.tobytes()
    except Exception as ex:
        print(ex)
        return None
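
Because `embedding_openai` returns `None` when the API call fails, and a FLOAT32 vector of dimension `DIM` must be exactly `DIM * 4` bytes long, one thing worth checking before searching is the size of the query blob. The sketch below uses only the standard library. `to_float32_blob` and `blob_matches_dim` are hypothetical helpers, and `struct` stands in for the NumPy conversion above on the assumption of a little-endian machine. For `text-embedding-ada-002`, which returns 1536-dimensional vectors, the blob should be 6144 bytes.

```python
import struct
from typing import Optional, Sequence

FLOAT32_SIZE = 4  # bytes per FLOAT32 component


def to_float32_blob(values: Sequence[float]) -> bytes:
    """Pack floats as little-endian FLOAT32 values (stand-in for tobytes())."""
    return struct.pack(f"<{len(values)}f", *values)


def blob_matches_dim(blob: Optional[bytes], dim: int) -> bool:
    """True if blob is present and holds exactly `dim` FLOAT32 values."""
    return blob is not None and len(blob) == dim * FLOAT32_SIZE


blob = to_float32_blob([0.1, 0.2, 0.3])
print(len(blob))                     # 12
print(blob_matches_dim(blob, 3))     # True
print(blob_matches_dim(blob, 1536))  # False
print(blob_matches_dim(None, 1536))  # False
```

A check like this, called with `self.vector_dim` just before `search(...)`, would rule out a missing or mis-sized query vector.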

And redis.ft(index).info() returns this:

{'index_name': 'conversations', 'index_options': [], 'index_definition': [b'key_type', b'HASH', b'prefixes', [b'tickets:'], b'default_score', b'1'], 'attributes': [[b'identifier', b'ticket_url', b'attribute', b'ticket_url', b'type', b'TAG', b'SEPARATOR', b','], [b'identifier', b'ticket_id', b'attribute', b'ticket_id', b'type', b'NUMERIC'], [b'identifier', b'entity_id', b'attribute', b'entity_id', b'type', b'NUMERIC'], [b'identifier', b'embedding', b'attribute', b'embedding', b'type', b'VECTOR']], 'num_docs': '973', 'max_doc_id': '973', 'num_terms': '0', 'num_records': '3892', 'inverted_sz_mb': '0.00634765625', 'vector_index_sz_mb': '6.00555419921875', 'total_inverted_index_blocks': '2999', 'offset_vectors_sz_mb': '0', 'doc_table_size_mb': '0.086483001708984375', 'sortable_values_size_mb': '0', 'key_table_size_mb': '0.030145645141601562', 'records_per_doc_avg': '4', 'bytes_per_record_avg': '1.7101746797561646', 'offsets_per_term_avg': '0', 'offset_bits_per_record_avg': '-nan', 'hash_indexing_failures': '0', 'total_indexing_time': '347.62900000000002', 'indexing': '0', 'percent_indexed': '1', 'number_of_uses': 1, 'gc_stats': [b'bytes_collected', b'0', b'total_ms_run', b'0', b'total_cycles', b'0', b'average_cycle_time_ms', b'-nan', b'last_run_time_ms', b'0', b'gc_numeric_trees_missed', b'0', b'gc_blocks_denied', b'0'], 'cursor_stats': [b'global_idle', 0, b'global_total', 0, b'index_capacity', 128, b'index_total', 0], 'dialect_stats': [b'dialect_1', 0, b'dialect_2', 0, b'dialect_3', 0]}

The vectors are stored as bytes. I don't know whether the problem is the algorithm or my own code :/
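
The `info()` output above can be condensed to the few fields that show whether documents actually reached the index. The sketch below is a hypothetical helper, not part of redis-py. It assumes the dictionary layout shown in the question, where values arrive as strings or bytes.

```python
def _text(value) -> str:
    """Decode bytes values, which redis-py may return, into str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def summarize_index_info(info: dict) -> dict:
    """Pick out the fields that indicate indexing health."""
    return {
        "num_docs": int(_text(info["num_docs"])),
        "hash_indexing_failures": int(_text(info["hash_indexing_failures"])),
        "percent_indexed": float(_text(info["percent_indexed"])),
        "still_indexing": _text(info["indexing"]) != "0",
    }


# Values copied from the info() output in the question.
sample = {
    "num_docs": "973",
    "hash_indexing_failures": "0",
    "percent_indexed": "1",
    "indexing": "0",
}
print(summarize_index_info(sample))
# {'num_docs': 973, 'hash_indexing_failures': 0, 'percent_indexed': 1.0, 'still_indexing': False}
```

With 973 documents indexed, no indexing failures, and indexing complete, the stored data looks healthy, which suggests (but does not prove) that the problem lies on the query side.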

Original Q&A

**Spartee** · Answer 1 · 2023-11-09T22:55:34.360000

Was this resolved? There are a few things it could be.

If you have a lot of documents and you're using the FLAT index, the search may simply not be returning within the allotted 500 ms timeout. The timeout can be configured on startup, or you can just use HNSW.

https://redisvl.com has a few examples of this in the user guide.
