How to configure Qdrant data persistence and reload


I'm trying to build an app with Streamlit that uses the Qdrant Python client.

To run Qdrant, I'm just using:

docker run -p 6333:6333 qdrant/qdrant

I have wrapped the client in something like this:

class Vector_DB:
    def __init__(self) -> None:
        self.collection_name = "__TEST__"
        self.client = QdrantClient("localhost", port=6333,path = "/home/Desktop/qdrant/qdrant.db")

But I'm getting this error:

Storage folder /home/Desktop/qdrant/qdrant.db is already accessed by another instance of Qdrant client. If you require concurrent access, use Qdrant server instead.

I suspect that Streamlit is creating multiple instances of this class. But if I try to load the DB from a snapshot, like:

    class Vector_DB:
        def __init__(self) -> None:
             self.client = QdrantClient("localhost", port=6333)
             self.client.recover_snapshot(collection_name = "__TEST__",location = "http://localhost:6333/collections/__TEST__/snapshots/__TEST__-8742423504815750-2023-10-30-12-04-14.snapshot")

it works. It seems I'm missing something important about how to configure it. What is the proper way to set up Qdrant so that I can store some embeddings, turn off the machine, and reload them?


There are 3 best solutions below

0
On BEST ANSWER

You mention using the Qdrant server, to which you'd like to connect with the Python client.

There are two problems in your question. Let me go over both:

1. Persist data in Qdrant server:
A Qdrant server running in Docker stores its data inside the container's own filesystem. That filesystem is ephemeral: it is discarded when the container is removed, and every docker run starts a fresh container, so the data doesn't carry over. To persist data you must specify a mount. Qdrant will then write its data to the mount instead of the container's filesystem. You can configure a mount using the -v flag like this:

docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

With the mount in place, data is automatically persisted and reloaded when you stop or restart the Qdrant container. You don't have to take extra measures for this.
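To confirm that the data really came back after a restart, you could ask the server which collections it holds. Below is a minimal sketch using only the standard library against the REST endpoint GET /collections. The helper names and the sample payload values are made up for illustration, and the base URL assumes the port mapping from the docker run command above.

```python
import json
from urllib.request import urlopen


def parse_collection_names(payload: dict) -> list[str]:
    """Extract collection names from a GET /collections response body."""
    return [c["name"] for c in payload["result"]["collections"]]


def list_collections(base_url: str = "http://localhost:6333") -> list[str]:
    """Ask a running Qdrant server which collections it currently holds."""
    with urlopen(f"{base_url}/collections", timeout=5) as resp:
        return parse_collection_names(json.load(resp))


# Same shape as the server's response; the values here are illustrative only.
sample = {"result": {"collections": [{"name": "__TEST__"}]}, "status": "ok", "time": 0.0}
print(parse_collection_names(sample))  # ['__TEST__']
```

With the container running, calling list_collections() before and after a restart should return the same names if the mount is set up correctly.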

2. Qdrant server versus local mode:
Qdrant supports two operating modes: the Qdrant server and local mode. You're running the Qdrant server through Docker. The Python client also supports local mode, which runs inside your Python process without any server and is intended for testing.

To use a Qdrant server you must specify its location (URL). You've already specified "localhost", which is right when the Qdrant server runs on your local machine.

To use local mode you can either specify ":memory:" or provide a path to persist data.

Right now you've specified parameters for both. Instead you must stick with one. You can update your client initialization to this:

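from qdrant_client import QdrantClient

class Vector_DB:
    def __init__(self) -> None:
        self.collection_name = "__TEST__"
        # Server mode only: connect to the Dockerized Qdrant, no local path.
        self.client = QdrantClient("localhost", port=6333)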
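If you want to guard against mixing the two modes again, one option is a small helper that accepts either a server URL or a local path and rejects anything else. This is only a sketch: client_kwargs and make_client are hypothetical names, and it relies on QdrantClient accepting url and path as keyword arguments.

```python
from typing import Optional


def client_kwargs(url: Optional[str] = None, path: Optional[str] = None) -> dict:
    """Return keyword arguments for exactly one QdrantClient mode."""
    if (url is None) == (path is None):
        raise ValueError("pass either url (server) or path (local mode), not both or neither")
    return {"url": url} if url is not None else {"path": path}


def make_client(url: Optional[str] = None, path: Optional[str] = None):
    """Build a client for one mode only (requires the qdrant-client package)."""
    # Imported lazily so client_kwargs can be used without qdrant-client installed.
    from qdrant_client import QdrantClient

    return QdrantClient(**client_kwargs(url=url, path=path))


print(client_kwargs(url="http://localhost:6333"))  # {'url': 'http://localhost:6333'}
```

For the Docker setup in the question, make_client(url="http://localhost:6333") would be the call to use.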
1
On

When you specify the path value, the qdrant_client library provisions a local instance of Qdrant that doesn't support concurrent access and is for testing only. The path refers to the path where the files of the local instance will be saved.

In the second case, you didn't specify the path value, so the client connects to the Docker instance, which supports concurrent access and persistence as you'd expect. The options are documented here (still a work in progress): https://python-client.qdrant.tech/qdrant_client.qdrant_client
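On the suspicion that Streamlit builds the wrapper class several times: with the server this is harmless, but you can also construct the wrapper once per process and reuse it. The sketch below shows the idea with functools.lru_cache and a stand-in class that holds no real client; get_vector_db is a hypothetical name. In a Streamlit app, the st.cache_resource decorator plays the same role.

```python
from functools import lru_cache


class VectorDB:
    """Stand-in for the question's wrapper; a real one would hold a QdrantClient."""

    instances = 0  # counts how many times the wrapper was constructed

    def __init__(self, url: str) -> None:
        VectorDB.instances += 1
        self.url = url


@lru_cache(maxsize=None)
def get_vector_db(url: str = "http://localhost:6333") -> VectorDB:
    """Create the wrapper on the first call and return the same object afterwards."""
    return VectorDB(url)


first = get_vector_db()
second = get_vector_db()
print(first is second, VectorDB.instances)  # True 1
```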

0
On

You can use the following Docker Compose file to run a Qdrant instance on your local machine. Note the volume used for persistent storage. Typing docker run commands with volume options every time can be cumbersome, so use Docker Compose instead.

services:
  my-app:
    build: 
      context: .
    depends_on:
      - qdrant
    networks:
      - q_network
  qdrant:
    image: qdrant/qdrant:latest
    restart: always
    container_name: qdrant
    ports:
      - 6333:6333
      - 6334:6334
    expose:
      - 6333
      - 6334
      - 6335
    configs:
      - source: qdrant_config
        target: /qdrant/config/production.yaml
    volumes:
      - qdrant_storage:/qdrant/storage  # Qdrant's default storage directory in the container
    networks:
      - q_network

configs:
  qdrant_config:
    content: |
      log_level: INFO  

volumes:
  qdrant_storage:

networks:
  q_network:
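One thing to watch with this setup: inside the my-app container, "localhost" refers to the app container itself, so the app should reach Qdrant through the Compose service name qdrant on the shared network. A minimal sketch of picking the URL follows. QDRANT_URL is a hypothetical environment variable name, not something Qdrant or the compose file above defines.

```python
import os
from typing import Mapping


def qdrant_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the Qdrant URL, defaulting to the Compose service name."""
    # QDRANT_URL is an assumed variable; set it to http://localhost:6333
    # when running the app on the host instead of inside Compose.
    return env.get("QDRANT_URL", "http://qdrant:6333")


print(qdrant_url({}))  # http://qdrant:6333
```

The returned string can be passed to the client as QdrantClient(url=qdrant_url()).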