I've recently been learning about Kubernetes operators. When deploying an etcd cluster, the etcd-operator uses the following creation method:
- Bootstrap phase: start a seed member. In its startup parameters, the `--initial-cluster-state` option is set to `new`.
- Scale-out phase: gradually create new etcd nodes and join them one by one to the cluster containing the seed member, until the cluster's replica count meets the size requirement. During this process, the `--initial-cluster-state` option is set to `existing`, and the corresponding `--initial-cluster` option is configured accordingly.
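To make the two phases concrete, here is a minimal runnable sketch of the flag sets each phase would produce. This is my own Go illustration, not etcd-operator code; the member names and URLs are placeholders (the etcd flags themselves are real):

```go
package main

import (
	"fmt"
	"strings"
)

// seedArgs sketches the bootstrap phase: the seed member starts a
// brand-new single-member cluster with --initial-cluster-state=new.
// "etcd-0" and its peer URL are illustrative placeholders.
func seedArgs() []string {
	return []string{
		"--name=etcd-0",
		"--initial-advertise-peer-urls=http://etcd-0:2380",
		"--initial-cluster=etcd-0=http://etcd-0:2380",
		"--initial-cluster-state=new",
	}
}

// scaleOutArgs sketches the scale-out phase: each new member is handed
// the current membership (existing peers plus itself) and starts with
// --initial-cluster-state=existing.
func scaleOutArgs(name, peerURL string, existing map[string]string) []string {
	members := []string{fmt.Sprintf("%s=%s", name, peerURL)}
	for n, u := range existing {
		members = append(members, fmt.Sprintf("%s=%s", n, u))
	}
	return []string{
		fmt.Sprintf("--name=%s", name),
		fmt.Sprintf("--initial-advertise-peer-urls=%s", peerURL),
		fmt.Sprintf("--initial-cluster=%s", strings.Join(members, ",")),
		"--initial-cluster-state=existing",
	}
}

func main() {
	fmt.Println(seedArgs())
	fmt.Println(scaleOutArgs("etcd-1", "http://etcd-1:2380",
		map[string]string{"etcd-0": "http://etcd-0:2380"}))
}
```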
I am also very interested in the tidb-operator, so I looked at its deployment code. I am curious about how the tidb-operator deploys a PD cluster.
```go
func (pmm *pdMemberManager) getNewPDSetForTidbCluster(tc *v1alpha1.TidbCluster) (*apps.StatefulSet, error) {
	…
	vols := []corev1.Volume{
		annVolume,
		{
			Name: "config",
			VolumeSource: corev1.VolumeSource{
				ConfigMap: &corev1.ConfigMapVolumeSource{
					LocalObjectReference: corev1.LocalObjectReference{
						Name: pdConfigMap,
					},
					Items: []corev1.KeyToPath{{Key: "config-file", Path: "pd.toml"}},
				},
			},
		},
		{
			Name: "startup-script",
			VolumeSource: corev1.VolumeSource{
				ConfigMap: &corev1.ConfigMapVolumeSource{
					LocalObjectReference: corev1.LocalObjectReference{
						Name: pdConfigMap,
					},
					Items: []corev1.KeyToPath{{Key: "startup-script", Path: "pd_start_script.sh"}},
				},
			},
		},
	}
	…
}
```
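For context, those two volumes are consumed by the PD container through volume mounts. The snippet below is my sketch of what those mounts could look like; the mount paths are my assumptions for illustration, not values copied from the operator source:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// volMounts sketches how the "config" and "startup-script" volumes above
// would be consumed by the PD container. The mount paths here are
// assumptions; see getNewPDSetForTidbCluster in pd_member_manager.go
// for the operator's real values.
var volMounts = []corev1.VolumeMount{
	// pd.toml rendered from the ConfigMap would land under /etc/pd.
	{Name: "config", ReadOnly: true, MountPath: "/etc/pd"},
	// pd_start_script.sh is mounted where the container command can run it.
	{Name: "startup-script", ReadOnly: true, MountPath: "/usr/local/bin"},
}
```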
I noticed that when deploying a PD cluster, the tidb-operator saves the configuration items in the config file mounted under "/etc/pd" (as pd.toml), including the `initial-cluster-state` option. FYI: https://docs-archive.pingcap.com/zh/tidb/v7.0/pd-configuration-file#initial-cluster-state
Therefore, I have the following questions:
- Does the tidb-operator use the same deployment method as the etcd-operator to deploy a cluster?
- If so, why is it designed this way? I see the following disadvantages with this method:
  a. This deployment method essentially grows from a single node to the specified number of nodes through membership changes. However, in the early stage of deployment, while the cluster has fewer than three nodes, consensus itself is not reliable (for example, a two-member cluster loses quorum if either member fails).
  b. When starting multiple nodes, this deployment method takes more time, because each new member has to wait for the previous membership change to complete.
- If the tidb-operator also uses this method to deploy, for example, a PD cluster, what are the design trade-offs?
Below is what I know:
1. The operator uses a start script to launch the PD process; see the code at https://github.com/pingcap/tidb-operator/blob/v1.5.1/pkg/manager/member/pd_member_manager.go#L900
2. In `pdStartScriptTplText`, every PD contacts the Discovery service to compete for leadership (only in the bootstrap stage); see the Discovery code at https://github.com/pingcap/tidb-operator/blob/v1.5.1/pkg/discovery/discovery.go
   - 2.1 The PD that wins the leader campaign uses `--initial-cluster` to initialize the PD cluster.
   - 2.2 The PDs that lose the leader campaign use `--join` to join the PD cluster (see the sketch after this list).
3. PD embeds etcd internally to maintain leadership; there is an illustration in a blog post from the Chinese community, see https://tidb.net/blog/66b475c0
   - 3.1 After bootstrap, leadership is maintained within the PD cluster itself.
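Putting 2.1 and 2.2 together, the decision each PD makes looks roughly like the following. This is a simplified sketch of the logic, not the actual discovery implementation; all names, parameters, and URLs are placeholders (`--initial-cluster` and `--join` are real PD flags):

```go
package main

import (
	"fmt"
	"strings"
)

// startupArgs sketches the decision each PD makes after contacting the
// Discovery service during bootstrap. Exactly one PD "wins" and
// initializes the cluster; every other PD joins it. This simplifies
// pkg/discovery/discovery.go and is not the real code.
func startupArgs(wonCampaign bool, myName, myPeerURL string, peerClientURLs []string) []string {
	if wonCampaign {
		// Winner (2.1): bootstrap a brand-new cluster that initially
		// contains only this member.
		return []string{
			fmt.Sprintf("--name=%s", myName),
			fmt.Sprintf("--initial-cluster=%s=%s", myName, myPeerURL),
		}
	}
	// Losers (2.2): skip --initial-cluster and join the existing
	// cluster through the client URLs of members already running.
	return []string{
		fmt.Sprintf("--name=%s", myName),
		fmt.Sprintf("--join=%s", strings.Join(peerClientURLs, ",")),
	}
}

func main() {
	fmt.Println(startupArgs(true, "pd-0", "http://pd-0:2380", nil))
	fmt.Println(startupArgs(false, "pd-1", "http://pd-1:2380",
		[]string{"http://pd-0:2379"}))
}
```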