How to do hybrid search on Redis using Langchain

I'm trying to pass filters to the Redis retriever to do hybrid search on my embeddings (vector search plus metadata filtering). The following doesn't work: the filter is never passed through, so it is always None:

retriever = redis.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs="{'include_metadata': True,'distance_threshold': 0.8,'k': 5}",
    filter="(@launch:{false} @menu_text:(%%chicken%%))"
)

I found another example, and apparently the filter expression should be passed inside search_kwargs, but I can't figure out the correct syntax. If I do it as follows:

retriever = redis.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={
        'include_metadata': True,
        'distance_threshold': 0.8,
        'k': 5,
        'filter': '@menu_text:(%%chicken%%) @lunch:{true}',
    },
)

It generates this search query: similarity_search_by_vector > redis_query : (@content_vector:[VECTOR_RANGE $distance_threshold $vector] @menu_text:(%%chicken%%) @lunch:{true})=>{$yield_distance_as: distance}

It then fails with the following error: redis.exceptions.ResponseError: Invalid attribute yield_distance_as

Any idea how to fix it?

System info: langchain 0.0.346, langchain-core 0.0.10, Python 3.9.18

Best answer

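It was a bug in LangChain! I found that '_prepare_range_query()' in LangChain generates a Redis query with the wrong syntax. When a filter is present, it wraps the vector range clause and the filter in parentheses and attaches "=>{$yield_distance_as: distance}" to the whole group, but as far as I can tell Redis only accepts that attribute directly on the vector range clause. So I made the following small change, which puts the filter first and leaves the range clause last, and it fixed the error for us: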

def _prepare_range_query(
    self,
    k: int,
    filter: Optional[RedisFilterExpression] = None,
    return_fields: Optional[List[str]] = None,
) -> "Query":
    try:
        from redis.commands.search.query import Query
    except ImportError as e:
        raise ImportError(
            "Could not import redis python package. "
            "Please install it with `pip install redis`."
        ) from e

    return_fields = return_fields or []
    vector_key = self._schema.content_vector_key
    base_query = f"@{vector_key}:[VECTOR_RANGE $distance_threshold $vector]"

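    if filter:
        # Original code wrapped both clauses in parentheses, so the attribute
        # block appended below was attached to the whole group:
        # base_query = "(" + base_query + " " + str(filter) + ")"
        # Fix: put the filter first so the vector range clause comes last.
        base_query = str(filter) + " " + base_query

    # The attribute block now directly follows the vector range clause.
    query_string = base_query + "=>{$yield_distance_as: distance}"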

    return (
        Query(query_string)
        .return_fields(*return_fields)
        .sort_by("distance")
        .paging(0, k)
        .dialect(2)
    )
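
To see the difference without a running Redis server, here is a minimal sketch that only builds the two query strings. The helper name build_range_query_string and its fixed flag are made up for illustration and are not part of LangChain; the strings themselves follow the patched method above and the query shown in the question.

```python
from typing import Optional


def build_range_query_string(
    vector_key: str,
    filter_expression: Optional[str] = None,
    fixed: bool = True,
) -> str:
    """Hypothetical helper: build the range-query string with or without the fix."""
    range_clause = f"@{vector_key}:[VECTOR_RANGE $distance_threshold $vector]"
    attributes = "=>{$yield_distance_as: distance}"

    if not filter_expression:
        # No metadata filter: the attribute block follows the range clause.
        return range_clause + attributes

    if fixed:
        # Patched order: filter first, range clause last, so the attribute
        # block sits directly after the vector range clause.
        return f"{filter_expression} {range_clause}{attributes}"

    # Original order: both clauses wrapped in parentheses, so the attribute
    # block is attached to the whole group (the form that raised the error).
    return f"({range_clause} {filter_expression}){attributes}"


metadata_filter = "@menu_text:(%%chicken%%) @lunch:{true}"

print("original:", build_range_query_string("content_vector", metadata_filter, fixed=False))
print("patched: ", build_range_query_string("content_vector", metadata_filter))
```

The "original" line reproduces the query from the question, and the "patched" line is what the modified '_prepare_range_query()' would send for the same filter.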