I was trying to understand how this stateful application is storing data in pvc, in general k8s statefulsets or myql stateful application store the same data in all the PVC/replicas in duplicate manner, meaning all the PVC's should have same data and if any pod goes down client get the requested data from other pod which is having the same data(from pvc).
i've installed qdrant with helm in AKS(4nodes, 4Replicas with 4PVC's(4 Azure SSD Disk's with default storage class:500G each)): We have pushed 1M collections with shard values and i can see only two pvc's got the data below:
k exec q181-qdrant-0 -- df -ah
Filesystem      Size  Used Avail Use% MountedON
/dev/sde        503G   14G  490G   3% /qdrant/storage
k exec q181-qdrant-1 -- df -ah
/dev/sdc        503G   24G  480G   5% /qdrant/storage
k exec q181-qdrant-2 -- df -ah
/dev/sdc        503G  5.4M  503G   1% /qdrant/storage
k exec q181-qdrant-3 -- df -ah
/dev/sdd        503G  9.1M  503G   1% /qdrant/storage
{~/ws} greetings, earthling [166.140Mb]$ ☞ kgp -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP           NODE                                 NOMINATED NODE   READINESS GATES
q181-qdrant-0   1/1     Running   0          4h44m   10.3.6.155   aks-qdrant64gb-12987685-vmss000000   <none>           <none>
q181-qdrant-1   1/1     Running   0          4h49m   10.3.6.139   aks-qdrantnoaz-19721947-vmss000005   <none>           <none>
q181-qdrant-2   1/1     Running   0          4h48m   10.3.6.11    aks-qdrant64gb-12987685-vmss000004   <none>           <none>
q181-qdrant-3   1/1     Running   0          4h50m   10.3.6.20    aks-qdrant64gb-12987685-vmss000003   <none>           <none>
- If each pod/PV having different data, how it is going to handles the high availability when pod-3 goes down, and the client request for data in pod-3 in same time ?
- Is Shard is logical ? or it is physical partition in PV? i saw shards getting created through/along with collection with qdrant client through the rest-api calls or any python modules, but how it is going to associate to the pod(out of 4 pods)?
- if we setup HPA, our pods count would not be fixed, so shards created on a pod will be deleted when pod went down in non traffic hours, since it is pvc if not deleted but won't be accessed from other pods?
- We need to push 300M collection data, what is the best practice for high availability? Do we need to replicate the same data to other pvc/pods(that's how general statefulsets/stateful applications works) ?
GET /cluster #Result
{
  "result": {
    "status": "enabled",
    "peer_id": 3922056191064933,
    "peers": {
      "3922056191064933": {
        "uri": "http://q181-qdrant-0.q181-qdrant-headless:6335/"
      },
      "612800828958104": {
        "uri": "http://q181-qdrant-2.q181-qdrant-headless:6335/"
      },
      "5183492229046375": {
        "uri": "http://q181-qdrant-3.q181-qdrant-headless:6335/"
      },
      "1755630610601120": {
        "uri": "http://q181-qdrant-1.q181-qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 402,
      "commit": 871,
      "pending_operations": 0,
      "leader": 1755630610601120,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-03-20T18:10:53.895767796Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.0000052
}
trying to create shards in qdrant and understand how it is storing Statefulsets in kubernetes stores the multiple copy of same data in all the replicas but here it is not happening.
 
                        
You need to call
GET /collections/my-collection/clusterfor this info. You'll get something like:You can co-relate this with the results of
GET /clusterto find the shards for each node.For rest of your questions:
replication_factoras long as they have the same bootstrap URI.Read more at https://qdrant.tech/documentation/guides/distributed_deployment.