Qdrant (vector DB): Which shard is on which node? It seems like all shards are on two of the 4 replicas

I was trying to understand how this stateful application stores data in its PVCs. In general, Kubernetes StatefulSets (e.g. a MySQL stateful application) store the same data in all PVCs/replicas in a duplicated manner, meaning all the PVCs should hold the same data, and if any pod goes down the client gets the requested data from another pod that holds the same data (from its PVC).

I've installed Qdrant with Helm in AKS (4 nodes, 4 replicas with 4 PVCs: 4 Azure SSD disks with the default storage class, 500G each). We have pushed 1M collection records with shards configured, and I can see that only two PVCs got the data:

k exec q181-qdrant-0 -- df -ah
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde        503G   14G  490G   3% /qdrant/storage
k exec q181-qdrant-1 -- df -ah
/dev/sdc        503G   24G  480G   5% /qdrant/storage
k exec q181-qdrant-2 -- df -ah
/dev/sdc        503G  5.4M  503G   1% /qdrant/storage
k exec q181-qdrant-3 -- df -ah
/dev/sdd        503G  9.1M  503G   1% /qdrant/storage

Pod list:

$ kgp -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP           NODE                                 NOMINATED NODE   READINESS GATES
q181-qdrant-0   1/1     Running   0          4h44m   10.3.6.155   aks-qdrant64gb-12987685-vmss000000   <none>           <none>
q181-qdrant-1   1/1     Running   0          4h49m   10.3.6.139   aks-qdrantnoaz-19721947-vmss000005   <none>           <none>
q181-qdrant-2   1/1     Running   0          4h48m   10.3.6.11    aks-qdrant64gb-12987685-vmss000004   <none>           <none>
q181-qdrant-3   1/1     Running   0          4h50m   10.3.6.20    aks-qdrant64gb-12987685-vmss000003   <none>           <none>
  1. If each pod/PV has different data, how is high availability handled when pod-3 goes down and a client requests the data on pod-3 at the same time?
  2. Is a shard logical, or is it a physical partition in the PV? I saw shards getting created along with the collection through the Qdrant client (REST API calls or Python modules), but how does a shard get associated with a pod (out of the 4 pods)?
  3. If we set up HPA, our pod count would not be fixed, so will the shards created on a pod be lost when that pod is scaled down during non-traffic hours? Since the data is on a PVC it may not be deleted, but won't it be inaccessible from the other pods?
  4. We need to push 300M collection records; what is the best practice for high availability? Do we need to replicate the same data to the other PVCs/pods (that's how general StatefulSets/stateful applications work)?

GET /cluster #Result

{
  "result": {
    "status": "enabled",
    "peer_id": 3922056191064933,
    "peers": {
      "3922056191064933": {
        "uri": "http://q181-qdrant-0.q181-qdrant-headless:6335/"
      },
      "612800828958104": {
        "uri": "http://q181-qdrant-2.q181-qdrant-headless:6335/"
      },
      "5183492229046375": {
        "uri": "http://q181-qdrant-3.q181-qdrant-headless:6335/"
      },
      "1755630610601120": {
        "uri": "http://q181-qdrant-1.q181-qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 402,
      "commit": 871,
      "pending_operations": 0,
      "leader": 1755630610601120,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-03-20T18:10:53.895767796Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.0000052
}

I'm trying to create shards in Qdrant and understand how it stores data. StatefulSets in Kubernetes store multiple copies of the same data across all replicas, but that is not happening here.
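For reference, the collection was created roughly like this (a minimal sketch, not the exact call we used; the collection name, vector size and distance below are placeholders):

import requests

# Assumption: the Qdrant REST port 6333 is reachable, e.g. via kubectl port-forward.
QDRANT_URL = "http://localhost:6333"

# Create a collection split into 4 shards; Qdrant's consensus decides
# which peer (pod/PVC) each shard lands on.
resp = requests.put(
    f"{QDRANT_URL}/collections/my-collection",
    json={
        "vectors": {"size": 768, "distance": "Cosine"},
        "shard_number": 4,
    },
)
print(resp.json())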

1 Answer

which shard is at which node?

You need to call GET /collections/my-collection/cluster for this info. You'll get something like:

{
  "result": {
    "peer_id": 6534799014422152,
    "shard_count": 3,
    "local_shards": [
      {
        "shard_id": 0,
        "points_count": 62223,
        "state": "Active"
      },
      {
        "shard_id": 1,
        "points_count": 71779,
        "state": "Active"
      }
    ],
    "remote_shards": [
      {
        "shard_id": 0,
        "peer_id": 2455022185625782,
        "state": "Active"
      },
      {
        "shard_id": 1,
        "peer_id": 1073561671703112,
        "state": "Active"
      }
    ],
    "shard_transfers": []
  },
  "status": "ok",
  "time": 2.1128e-05
}

You can correlate this with the results of GET /cluster to find the shards on each node.
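For instance, a small script along these lines maps shards to peers (a sketch; it assumes the REST port 6333 is reachable, e.g. via kubectl port-forward, and uses a placeholder collection name):

import requests

QDRANT_URL = "http://localhost:6333"   # assumption: any reachable node of the cluster
COLLECTION = "my-collection"           # placeholder collection name

# Peer id -> node URI, from GET /cluster (peer ids come back as string keys).
cluster = requests.get(f"{QDRANT_URL}/cluster").json()["result"]
peers = {int(pid): info["uri"] for pid, info in cluster["peers"].items()}

# Shard placement for the collection, from GET /collections/{name}/cluster.
coll = requests.get(f"{QDRANT_URL}/collections/{COLLECTION}/cluster").json()["result"]

# Shards stored on the node that answered the request.
for shard in coll["local_shards"]:
    print(f"shard {shard['shard_id']} -> {peers[coll['peer_id']]} (local)")

# Shards stored on the other peers.
for shard in coll["remote_shards"]:
    print(f"shard {shard['shard_id']} -> {peers[shard['peer_id']]}")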

For the rest of your questions:

  1. You can configure replication_factor for your collection so that your shards are replicated across at least 2 nodes (see the sketch after this list). Otherwise, search requests can be rejected by the cluster if a shard has no active nodes.
  2. Shards are physical partitions in Qdrant. You can associate them with pods/PVs using the approach described above.
  3. Not sure what your question is here, but if you're downscaling, be careful to move the shards off the node first.
  4. You don't necessarily need more replicas just because your dataset has 300M points, but if you have enough traffic, it makes sense to have replicas. Qdrant takes care of the replication when you set replication_factor, as long as the nodes have the same bootstrap URI.
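As a sketch of point 1: replication is configured when the collection is created (again a minimal example with placeholder collection name and vector parameters, not a drop-in for your setup):

import requests

QDRANT_URL = "http://localhost:6333"   # assumption: any node of the cluster

# 4 shards, each kept on 2 different peers, so every shard still has
# an active copy when a single pod/PVC goes down.
resp = requests.put(
    f"{QDRANT_URL}/collections/my-collection",
    json={
        "vectors": {"size": 768, "distance": "Cosine"},
        "shard_number": 4,
        "replication_factor": 2,
    },
)
print(resp.json())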

Read more at https://qdrant.tech/documentation/guides/distributed_deployment.