Kubernetes v1.10 [stable]
kubeadm init and kubeadm join together provide a nice user experience for creating a
bare Kubernetes cluster from scratch that aligns with best practices.
However, it might not be obvious how kubeadm does that.
This document provides additional details on what happens under the hood, with the aim of sharing knowledge on the best practices for a Kubernetes cluster.
The cluster that kubeadm init and kubeadm join set up should, among other things, be user-friendly,
so that the user does not have to run anything more than a couple of commands:
kubeadm init
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f <network-plugin-of-choice.yaml>
kubeadm join --token <token> <endpoint>:<port>
In order to reduce complexity and to simplify the development of higher level tools that build on top of kubeadm, kubeadm uses a limited set of constant values for well-known paths and file names.
The Kubernetes directory /etc/kubernetes is a constant in the application, since it is the given path
in the majority of cases and the most intuitive location; other constant paths and file names are:
/etc/kubernetes/manifests as the path where the kubelet should look for static Pod manifests.
Names of static Pod manifests are:
etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
/etc/kubernetes/ as the path where kubeconfig files with identities for control plane components are stored. Names of kubeconfig files are:
kubelet.conf (bootstrap-kubelet.conf during TLS bootstrap)
controller-manager.conf
scheduler.conf
admin.conf for the cluster admin and kubeadm itself
super-admin.conf for the cluster super-admin that can bypass RBAC
Names of certificates and key files:
ca.crt, ca.key for the Kubernetes certificate authority
apiserver.crt, apiserver.key for the API server certificate
apiserver-kubelet-client.crt, apiserver-kubelet-client.key for the client certificate used by the API server to connect to the kubelets securely
sa.pub, sa.key for the key used by the controller manager when signing ServiceAccount tokens
front-proxy-ca.crt, front-proxy-ca.key for the front proxy certificate authority
front-proxy-client.crt, front-proxy-client.key for the front proxy client
Most kubeadm commands support a --config flag which allows passing a configuration file from
disk. The configuration file format follows the common Kubernetes API apiVersion / kind scheme,
but is considered a component configuration format. Several Kubernetes components, such as the kubelet,
also support file-based configuration.
Different kubeadm subcommands require a different kind of configuration file.
For example, InitConfiguration for kubeadm init, JoinConfiguration for kubeadm join, UpgradeConfiguration for kubeadm upgrade and ResetConfiguration
for kubeadm reset.
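As a rough illustration of the apiVersion / kind scheme, the following Python sketch maps each subcommand named above to the configuration kind it expects and produces the header of a minimal configuration document. The apiVersion value kubeadm.k8s.io/v1beta4 is an assumption about the configuration API in use (check the kubeadm configuration reference for your release), and the helper names are hypothetical.

```python
# Configuration kinds per kubeadm subcommand, as listed in the text above.
CONFIG_KIND = {
    "init": "InitConfiguration",
    "join": "JoinConfiguration",
    "upgrade": "UpgradeConfiguration",
    "reset": "ResetConfiguration",
}

# Assumption: kubeadm.k8s.io/v1beta4 is the configuration API version in use.
API_VERSION = "kubeadm.k8s.io/v1beta4"


def config_header(subcommand: str) -> str:
    """Return the apiVersion/kind header of a config file for `kubeadm <subcommand>`."""
    kind = CONFIG_KIND.get(subcommand)
    if kind is None:
        raise ValueError(f"no configuration kind known for 'kubeadm {subcommand}'")
    return f"apiVersion: {API_VERSION}\nkind: {kind}\n"


if __name__ == "__main__":
    print(config_header("join"), end="")
```

A real file adds further fields below this header; kubeadm config print init-defaults prints a complete default InitConfiguration that can serve as a starting point.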
The command kubeadm config migrate can be used to migrate an older format configuration
file to a newer (current) configuration format. The kubeadm tool only supports migrating from
deprecated configuration formats to the current format.
See the kubeadm configuration reference page for more details.
The kubeadm init command consists of a sequence of atomic work tasks to perform,
as described in the kubeadm init internal workflow.
The kubeadm init phase command allows
users to invoke each task individually, and ultimately offers a reusable and composable
API/toolbox that can be used by other Kubernetes bootstrap tools, by any IT automation tool or by
an advanced user for creating custom clusters.
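The phase design can be pictured as an ordered list of named steps: a full run executes all of them, while a caller can skip some or run a single one. The Python sketch below models that idea. The phase names are the ones this page uses with kubeadm init phase, but their order here follows this document rather than kubeadm's source, and run_workflow is a hypothetical helper.

```python
from typing import Callable, Iterable, Mapping

# Phase names as used on this page with `kubeadm init phase <name>`.
# The order mirrors this document and is an assumption of the sketch.
INIT_PHASES = [
    "preflight", "certs", "kubeconfig", "control-plane", "etcd",
    "upload-config", "mark-control-plane", "bootstrap-token", "addon",
]


def run_workflow(
    phases: list[str],
    actions: Mapping[str, Callable[[], None]],
    skip: Iterable[str] = (),
) -> list[str]:
    """Run each phase in order unless it is skipped; return the phases that ran."""
    skip = set(skip)
    unknown = skip - set(phases)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    ran = []
    for name in phases:
        if name in skip:
            continue
        actions[name]()  # each phase is an atomic, self-contained task
        ran.append(name)
    return ran
```

Passing a one-element list corresponds to invoking a single phase, and the skip argument plays the role of the kubeadm init --skip-phases flag.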
Kubeadm executes a set of preflight checks before starting the init, with the aim of verifying
preconditions and avoiding common cluster startup problems.
The user can skip specific preflight checks or all of them with the --ignore-preflight-errors option.
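One way to picture --ignore-preflight-errors is a filter that downgrades selected failures from errors to warnings. The Python sketch below assumes that each check has a name and returns either an error message or None, that names are matched case-insensitively, and that the special value all ignores every check. The check names in the example are hypothetical.

```python
from typing import Callable, Iterable, Optional

# A named check returns an error message, or None when the precondition holds.
Check = tuple[str, Callable[[], Optional[str]]]


def run_preflight(checks: Iterable[Check], ignored: Iterable[str] = ()) -> tuple[list[str], list[str]]:
    """Return (errors, warnings); ignored failures are reported as warnings."""
    ignored = {name.lower() for name in ignored}
    ignore_all = "all" in ignored
    errors, warnings = [], []
    for name, check in checks:
        message = check()
        if message is None:
            continue
        if ignore_all or name.lower() in ignored:
            warnings.append(f"[WARNING {name}]: {message}")
        else:
            errors.append(f"[ERROR {name}]: {message}")
    return errors, warnings


# Hypothetical checks: the first passes, the second fails.
CHECKS = [
    ("CommandsInPath", lambda: None),
    ("ManifestsDirEmpty", lambda: "/etc/kubernetes/manifests is not empty"),
]
```

In this sketch the init would stop whenever the returned errors list is non-empty, while warnings are only reported.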
Among other conditions, preflight checks report when:
the Kubernetes version to use (specified with the --kubernetes-version flag) is at least one minor version higher than the kubeadm CLI version
the /etc/kubernetes/manifests folder already exists and it is not empty
the ip, iptables, mount, nsenter commands are not present in the command path
the ethtool, tc, touch commands are not present in the command path
Preflight checks can be invoked individually with the kubeadm init phase preflight command.
Kubeadm generates certificate and private key pairs for different purposes:
A self-signed certificate authority for the Kubernetes cluster saved into the ca.crt file and the
ca.key private key file
A serving certificate for the API server, generated using ca.crt as the CA, and saved into
apiserver.crt file with its private key apiserver.key. This certificate should contain
the following alternative names:
The Kubernetes service's internal clusterIP (the first address in the services CIDR, e.g. 10.96.0.1 if the service subnet is 10.96.0.0/12)
Kubernetes DNS names, e.g. kubernetes.default.svc.cluster.local if the --service-dns-domain flag value is cluster.local, plus the default DNS names kubernetes.default.svc, kubernetes.default and kubernetes
The --apiserver-advertise-address
A client certificate for the API server to connect to the kubelets securely, generated using
ca.crt as the CA and saved into apiserver-kubelet-client.crt file with its private key
apiserver-kubelet-client.key.
This certificate should be in the system:masters organization
A private key for signing ServiceAccount Tokens saved into sa.key file along with its public key sa.pub
A certificate authority for the front proxy saved into front-proxy-ca.crt file with its key
front-proxy-ca.key
A client certificate for the front proxy client, generated using front-proxy-ca.crt as the CA and
saved into front-proxy-client.crt file with its private key front-proxy-client.key
Certificates are stored by default in /etc/kubernetes/pki, but this directory is configurable
using the --cert-dir flag.
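The alternative names listed above for apiserver.crt can be derived from a few inputs. The following Python sketch, using only the standard ipaddress module, computes the names this page mentions. The function name and the advertise address in the example are hypothetical, and a real certificate may carry further names, such as additional SANs requested by the user.

```python
import ipaddress


def apiserver_sans(service_subnet: str, dns_domain: str, advertise_address: str) -> list[str]:
    """Return the API server certificate alternative names described in the text."""
    network = ipaddress.ip_network(service_subnet)
    # The first address after the network address is the ClusterIP of the
    # "kubernetes" Service, e.g. 10.96.0.1 for 10.96.0.0/12.
    service_ip = network[1]
    dns_names = [
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        f"kubernetes.default.svc.{dns_domain}",
    ]
    return dns_names + [str(service_ip), advertise_address]


print(apiserver_sans("10.96.0.0/12", "cluster.local", "192.168.0.10"))
```

Indexing the network object rather than iterating hosts() keeps the "network address plus one" rule explicit and works the same way for IPv6 service subnets.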
Please note that:
The user can, for example, copy an existing CA to /etc/kubernetes/pki/ca.{crt,key}, and then kubeadm will use those files for signing the rest of the certs. See also using custom certificates.
It is possible to provide the ca.crt file but not the ca.key file. If all other certificates and kubeconfig files are already in place, kubeadm recognizes this condition and activates the ExternalCA, which also implies the csrsigner controller in controller-manager won't be started.
In --dry-run mode, certificate files are written in a temporary folder.
Certificate generation can be invoked individually with the kubeadm init phase certs all command.
Kubeadm generates kubeconfig files with identities for control plane components:
A kubeconfig file for the kubelet to use during TLS bootstrap,
/etc/kubernetes/bootstrap-kubelet.conf. Inside this file, there is a bootstrap token or embedded
client certificates for authenticating this node with the cluster.
This client certificate should:
be in the system:nodes organization, as required by the Node Authorization module
have the Common Name (CN) system:node:<hostname-lowercased>
A kubeconfig file for controller-manager, /etc/kubernetes/controller-manager.conf; inside this
file is embedded a client certificate with controller-manager identity. This client certificate should
have the CN system:kube-controller-manager, as defined by default
RBAC core components roles
A kubeconfig file for scheduler, /etc/kubernetes/scheduler.conf; inside this file is embedded
a client certificate with scheduler identity.
This client certificate should have the CN system:kube-scheduler, as defined by default
RBAC core components roles
Additionally, a kubeconfig file for kubeadm as an administrative entity is generated and stored
in /etc/kubernetes/admin.conf. This file includes a certificate with
Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin. kubeadm:cluster-admins
is a group managed by kubeadm. During kubeadm init, kubeadm binds this group to the cluster-admin
ClusterRole by using the super-admin.conf file, whose credentials are not subject to RBAC.
This admin.conf file must remain on control plane nodes and should not be shared with additional users.
During kubeadm init another kubeconfig file is generated and stored in /etc/kubernetes/super-admin.conf.
This file includes a certificate with Subject: O = system:masters, CN = kubernetes-super-admin.
system:masters is a superuser group that bypasses RBAC and makes super-admin.conf useful in case
of an emergency where a cluster is locked due to RBAC misconfiguration.
The super-admin.conf file must be stored in a safe location and should not be shared with additional users.
See RBAC user facing role bindings for additional information on RBAC and built-in ClusterRoles and groups.
You can run kubeadm kubeconfig user
to generate kubeconfig files for additional users.
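To summarize the identities described above, the following Python sketch maps each kubeconfig file to the subject of its client certificate, in the same O = ..., CN = ... notation. Organizations that this page does not state are left out, the hostname in the example is hypothetical, and the sketch only formats strings; it does not create certificates.

```python
def subject(common_name: str, organization: str = "") -> str:
    """Format a certificate subject in the 'O = ..., CN = ...' notation."""
    return f"O = {organization}, CN = {common_name}" if organization else f"CN = {common_name}"


def kubeconfig_subjects(hostname: str) -> dict[str, str]:
    """Client certificate subject per kubeconfig file under /etc/kubernetes."""
    return {
        # The Node authorizer expects the system:nodes group and a per-node CN.
        "kubelet.conf": subject(f"system:node:{hostname.lower()}", "system:nodes"),
        "controller-manager.conf": subject("system:kube-controller-manager"),
        "scheduler.conf": subject("system:kube-scheduler"),
        # Bound to the cluster-admin ClusterRole through RBAC during kubeadm init.
        "admin.conf": subject("kubernetes-admin", "kubeadm:cluster-admins"),
        # system:masters bypasses RBAC; keep this file in a safe location.
        "super-admin.conf": subject("kubernetes-super-admin", "system:masters"),
    }


for name, subj in kubeconfig_subjects("Worker-01").items():
    print(f"{name}: {subj}")
```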
Also note that:
The ca.crt certificate is embedded in all the kubeconfig files.
In --dry-run mode, kubeconfig files are written in a temporary folder.
Kubeconfig file generation can be invoked individually with the kubeadm init phase kubeconfig all command.
Kubeadm writes static Pod manifest files for control plane components to
/etc/kubernetes/manifests. The kubelet watches this directory for Pods to be created on startup.
Static Pod manifests share a set of common properties:
All static Pods are deployed in the kube-system namespace
All static Pods get tier:control-plane and component:{component-name} labels
All static Pods use the system-node-critical priority class
hostNetwork: true is set on all static Pods to allow control plane startup before a network is
configured; as a consequence:
The address that the controller-manager and the scheduler use to refer to the API server is 127.0.0.1
When a local etcd is used, the etcd-servers address is set to 127.0.0.1:2379
Leader election is enabled for both the controller-manager and the scheduler
Controller-manager and the scheduler will reference kubeconfig files with their respective, unique identities
All static Pods get any extra flags or patches that you specify, as described in passing custom arguments to control plane components
All static Pods get any extra Volumes specified by the user (hostPath)
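The common properties above can be collected in a manifest skeleton. The Python sketch below builds one as a plain dictionary; it is not kubeadm's generator, it leaves out volumes, probes, resources and any extra flags or patches, and the function name and the image tag in the example are hypothetical.

```python
def static_pod_skeleton(component: str, image: str, command: list[str]) -> dict:
    """Return a control plane static Pod manifest with the shared properties."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": component,
            "namespace": "kube-system",
            "labels": {"tier": "control-plane", "component": component},
        },
        "spec": {
            # Host networking lets the component start before any CNI plugin.
            "hostNetwork": True,
            "priorityClassName": "system-node-critical",
            "containers": [{"name": component, "image": image, "command": command}],
        },
    }


pod = static_pod_skeleton(
    "kube-scheduler",
    "registry.k8s.io/kube-scheduler:v0.0.0",  # hypothetical tag
    ["kube-scheduler", "--kubeconfig=/etc/kubernetes/scheduler.conf", "--leader-elect=true"],
)
```

Serialized to YAML under /etc/kubernetes/manifests, a manifest of this shape is what the kubelet watches for.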
Please note that:
In --dry-run mode, static Pod files are written in a temporary folder.
Static Pod manifest generation for control plane components can be invoked individually with the kubeadm init phase control-plane all command.
The static Pod manifest for the API server is affected by the following parameters provided by the users:
The apiserver-advertise-address and apiserver-bind-port to bind to; if not provided, those values default to the IP address of the default network interface on the machine and port 6443
The service-cluster-ip-range to use for services
If an external etcd server is specified, the etcd-servers address and related TLS settings (etcd-cafile, etcd-certfile, etcd-keyfile); if an external etcd server is not provided, a local etcd will be used (via host network)
If a cloud provider is specified, the corresponding --cloud-provider parameter is configured together with the --cloud-config path if such file exists (this is experimental, alpha and will be removed in a future version)
Other API server flags that are set unconditionally are:
--insecure-port=0 to avoid insecure connections to the API server
--enable-bootstrap-token-auth=true to enable the BootstrapTokenAuthenticator authentication module.
See TLS Bootstrapping for more details
--allow-privileged to true (required, for example, by kube-proxy)
--requestheader-client-ca-file to front-proxy-ca.crt
--enable-admission-plugins to:
NamespaceLifecycle, e.g. to avoid deletion of system reserved namespaces
LimitRanger and ResourceQuota to enforce limits on namespaces
ServiceAccount to enforce service account automation
PersistentVolumeLabel, which attaches region or zone labels to PersistentVolumes as defined by the cloud provider (this admission controller is deprecated and will be removed in a future version; it is not deployed by kubeadm by default with v1.9 onwards when not explicitly opting into using gce or aws as cloud providers)
DefaultStorageClass to enforce default storage class on PersistentVolumeClaim objects
DefaultTolerationSeconds
NodeRestriction to limit what a kubelet can modify (e.g. only pods on this node)
--kubelet-preferred-address-types to InternalIP,ExternalIP,Hostname; this makes kubectl logs and other API server-kubelet communication work in environments where the hostnames of the nodes aren't resolvable
Flags for using certificates generated in previous steps:
--client-ca-file to ca.crt
--tls-cert-file to apiserver.crt
--tls-private-key-file to apiserver.key
--kubelet-client-certificate to apiserver-kubelet-client.crt
--kubelet-client-key to apiserver-kubelet-client.key
--service-account-key-file to sa.pub
--requestheader-client-ca-file to front-proxy-ca.crt
--proxy-client-cert-file to front-proxy-client.crt
--proxy-client-key-file to front-proxy-client.key
Other flags for securing the front proxy (API Aggregation) communications:
--requestheader-username-headers=X-Remote-User
--requestheader-group-headers=X-Remote-Group
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-allowed-names=front-proxy-client
The static Pod manifest for the controller manager is affected by the following parameters provided by the users:
If kubeadm is invoked specifying a --pod-network-cidr, the subnet manager feature required for
some CNI network plugins is enabled by setting:
--allocate-node-cidrs=true
--cluster-cidr and --node-cidr-mask-size flags according to the given CIDR
Other flags that are set unconditionally are:
--controllers enabling all the default controllers plus BootstrapSigner and TokenCleaner
controllers for TLS bootstrap. See TLS Bootstrapping
for more details.
--use-service-account-credentials to true
Flags for using certificates generated in previous steps:
--root-ca-file to ca.crt
--cluster-signing-cert-file to ca.crt, if External CA mode is disabled, otherwise to ""
--cluster-signing-key-file to ca.key, if External CA mode is disabled, otherwise to ""
--service-account-private-key-file to sa.key
The static Pod manifest for the scheduler is not affected by parameters provided by the user.
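The controller manager flags described above depend on two inputs: whether a --pod-network-cidr was given and whether External CA mode is active. The Python sketch below assembles them. It assumes the default /etc/kubernetes/pki certificate directory, leaves out --node-cidr-mask-size (whose value is derived from the CIDR), and uses *,bootstrapsigner,tokencleaner to express "all default controllers plus BootstrapSigner and TokenCleaner"; treat it as an illustration, not kubeadm's exact flag set.

```python
from typing import Optional

PKI_DIR = "/etc/kubernetes/pki"  # default certificate directory (see --cert-dir)


def controller_manager_flags(pod_network_cidr: Optional[str], external_ca: bool) -> list[str]:
    """Assemble the controller manager flags discussed in the text."""
    flags = {
        # "*" enables the default controllers; the other two are opt-in.
        "controllers": "*,bootstrapsigner,tokencleaner",
        "use-service-account-credentials": "true",
        "root-ca-file": f"{PKI_DIR}/ca.crt",
        "service-account-private-key-file": f"{PKI_DIR}/sa.key",
        # Without ca.key (External CA mode) the CSR signer gets no key material.
        "cluster-signing-cert-file": "" if external_ca else f"{PKI_DIR}/ca.crt",
        "cluster-signing-key-file": "" if external_ca else f"{PKI_DIR}/ca.key",
    }
    if pod_network_cidr:
        # Subnet manager for CNI plugins that need per-node Pod CIDRs.
        flags["allocate-node-cidrs"] = "true"
        flags["cluster-cidr"] = pod_network_cidr
    return [f"--{name}={value}" for name, value in sorted(flags.items())]
```

Returning a sorted list keeps the output deterministic, which makes it easy to diff against the command in an existing kube-controller-manager.yaml manifest.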
If you specified an external etcd, this step is skipped; otherwise kubeadm generates a static Pod manifest file for creating a local etcd instance running in a Pod with the following attributes:
listen on localhost:2379 and use HostNetwork=true
make a hostPath mount out from the dataDir to the host's filesystem
Please note that:
The etcd container image is pulled from registry.k8s.io by default. See
using custom images
for customizing the image repository.
In --dry-run mode, the etcd static Pod manifest is written into a temporary folder.
Static Pod manifest generation for local etcd can be invoked individually with the kubeadm init phase etcd local command.
On control plane nodes, kubeadm waits up to 4 minutes for the control plane components
and the kubelet to be available. It does that by performing a health check on the respective
component /healthz or /livez endpoints.
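The wait described above amounts to polling a health endpoint until it succeeds or a deadline passes. The Python sketch below captures that loop with an injectable probe, clock and sleep function so it can be exercised without a cluster. The 240-second default mirrors the 4 minutes mentioned above; the polling interval and the function name are assumptions.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 240.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice the probe would issue an HTTPS GET against an endpoint such as https://127.0.0.1:6443/livez, trusting the cluster CA, and report whether it answered with HTTP 200.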
After the control plane is up, kubeadm completes the tasks described in the following paragraphs.
kubeadm saves the configuration passed to kubeadm init in a ConfigMap named kubeadm-config
under kube-system namespace.
This ensures that kubeadm actions executed in the future (e.g. kubeadm upgrade) are able to
determine the actual/current cluster state and make new decisions based on that data.
Please note that:
Uploading the kubeadm configuration can be invoked individually with the kubeadm init phase upload-config command.
As soon as the control plane is available, kubeadm executes the following actions:
Labels the node as control-plane with node-role.kubernetes.io/control-plane=""
Taints the node with node-role.kubernetes.io/control-plane:NoSchedule
Please note that the phase to mark the control-plane node can be invoked
individually with the kubeadm init phase mark-control-plane command.
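The two actions above only touch the Node object. The Python sketch below applies them to a Node represented as a plain dictionary, idempotently, so that re-running it does not duplicate the taint; it is an illustration, not the code kubeadm runs against the API server.

```python
import copy

CONTROL_PLANE = "node-role.kubernetes.io/control-plane"


def mark_control_plane(node: dict) -> dict:
    """Return a copy of `node` with the control-plane label and taint applied."""
    node = copy.deepcopy(node)
    labels = node.setdefault("metadata", {}).setdefault("labels", {})
    labels[CONTROL_PLANE] = ""
    taints = node.setdefault("spec", {}).setdefault("taints", [])
    taint = {"key": CONTROL_PLANE, "effect": "NoSchedule"}
    if taint not in taints:  # idempotent: never add the same taint twice
        taints.append(taint)
    return node
```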
Kubeadm uses the mechanism described in Authenticating with Bootstrap Tokens for joining new nodes to an existing cluster; for more details, see also the design proposal.
kubeadm init ensures that everything is properly configured for this process, and this includes
the following steps, as well as setting API server and controller-manager flags as already described in
previous paragraphs.
The phase to configure bootstrap tokens can be invoked individually with the kubeadm init phase bootstrap-token command,
executing all the configuration steps described in the following paragraphs;
alternatively, each step can be invoked individually.
kubeadm init creates a first bootstrap token, either generated automatically or provided by the
user with the --token flag; as documented in the bootstrap token specification, the token is
saved as a Secret named bootstrap-token-<token-id> in the kube-system namespace.
Please note that:
The default token created by kubeadm init is used to validate temporary users during the TLS bootstrap process; those users are members of the system:bootstrappers:kubeadm:default-node-token group
The token has a limited validity, 24 hours by default (the interval may be changed with the --token-ttl flag)
Additional tokens can be created with the kubeadm token command, which also provides other useful functions for token management.
Kubeadm ensures that users in the system:bootstrappers:kubeadm:default-node-token group are able to
access the certificate signing API.
This is implemented by creating a ClusterRoleBinding named kubeadm:kubelet-bootstrap between the
group above and the default RBAC role system:node-bootstrapper.
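A bootstrap token has the form <token-id>.<token-secret>, with a 6-character ID and a 16-character secret made of lowercase letters and digits, and it is stored in a Secret in kube-system. The Python sketch below generates such a token and builds that Secret. The data keys follow the bootstrap token Secret format; the 24-hour TTL reflects kubeadm's default, the auth-extra-groups value is the group named on this page, and the function names are hypothetical.

```python
import re
import secrets
import string
from datetime import datetime, timedelta, timezone
from typing import Optional

TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")
ALPHABET = string.ascii_lowercase + string.digits


def generate_token() -> str:
    """Generate a random token of the form <6 chars>.<16 chars>."""
    def pick(n: int) -> str:
        return "".join(secrets.choice(ALPHABET) for _ in range(n))
    return f"{pick(6)}.{pick(16)}"


def bootstrap_token_secret(token: str, ttl: timedelta = timedelta(hours=24),
                           now: Optional[datetime] = None) -> dict:
    """Build the kube-system Secret for a token; `now` must be timezone-aware."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError("token must match [a-z0-9]{6}.[a-z0-9]{16}")
    token_id, token_secret = match.groups()
    expires = ((now or datetime.now(timezone.utc)) + ttl).astimezone(timezone.utc)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "bootstrap.kubernetes.io/token",
        "metadata": {"name": f"bootstrap-token-{token_id}", "namespace": "kube-system"},
        "stringData": {
            "token-id": token_id,
            "token-secret": token_secret,
            "expiration": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),  # RFC 3339, UTC
            "usage-bootstrap-authentication": "true",
            "usage-bootstrap-signing": "true",
            "auth-extra-groups": "system:bootstrappers:kubeadm:default-node-token",
        },
    }
```

On the command line, kubeadm token generate and kubeadm token create cover the same ground.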
Kubeadm ensures that the Bootstrap Token will get its CSR request automatically approved by the csrapprover controller.
This is implemented by creating a ClusterRoleBinding named kubeadm:node-autoapprove-bootstrap
between the system:bootstrappers:kubeadm:default-node-token group and the default role
system:certificates.k8s.io:certificatesigningrequests:nodeclient.
The role system:certificates.k8s.io:certificatesigningrequests:nodeclient should be created as
well, granting POST permission to
/apis/certificates.k8s.io/certificatesigningrequests/nodeclient.
Kubeadm ensures that certificate rotation is enabled for nodes, and that a new certificate request for nodes will get its CSR request automatically approved by the csrapprover controller.
This is implemented by creating a ClusterRoleBinding named
kubeadm:node-autoapprove-certificate-rotation between the system:nodes group and the default
role system:certificates.k8s.io:certificatesigningrequests:selfnodeclient.
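The three ClusterRoleBindings described above share one shape: a named binding from a group to a ClusterRole. The Python sketch below writes them out as rbac.authorization.k8s.io/v1 objects so that the names, roles and groups from the text can be compared side by side; it only builds dictionaries and does not talk to a cluster.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"
BOOTSTRAPPERS = "system:bootstrappers:kubeadm:default-node-token"


def group_binding(name: str, cluster_role: str, group: str) -> dict:
    """A ClusterRoleBinding that grants `cluster_role` to every member of `group`."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "ClusterRole", "name": cluster_role},
        "subjects": [{"apiGroup": RBAC_GROUP, "kind": "Group", "name": group}],
    }


BINDINGS = [
    # Lets bootstrap-token users create CSRs.
    group_binding("kubeadm:kubelet-bootstrap", "system:node-bootstrapper", BOOTSTRAPPERS),
    # Lets csrapprover auto-approve the first client certificate of a joining node.
    group_binding("kubeadm:node-autoapprove-bootstrap",
                  "system:certificates.k8s.io:certificatesigningrequests:nodeclient",
                  BOOTSTRAPPERS),
    # Lets csrapprover auto-approve renewals requested by existing nodes.
    group_binding("kubeadm:node-autoapprove-certificate-rotation",
                  "system:certificates.k8s.io:certificatesigningrequests:selfnodeclient",
                  "system:nodes"),
]
```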
This phase creates the cluster-info ConfigMap in the kube-public namespace.
Additionally, it creates a Role and a RoleBinding granting access to the ConfigMap for
unauthenticated users (i.e. users in RBAC group system:unauthenticated).
Access to the cluster-info ConfigMap is not rate-limited. This may or may not be a
problem if you expose your cluster's API server to the internet; the worst-case scenario here is a
DoS attack where an attacker uses all the in-flight requests the kube-apiserver can handle to
serve the cluster-info ConfigMap.
Kubeadm installs the internal DNS server and the kube-proxy addon components via the API server.
This phase can be invoked individually with the kubeadm init phase addon all command.
A ServiceAccount for kube-proxy is created in the kube-system namespace; then kube-proxy is
deployed as a DaemonSet:
The credentials (ca.crt and token) to the control plane come from the ServiceAccount
The kube-proxy ServiceAccount is bound to the privileges in the system:node-proxier ClusterRole
The CoreDNS service is named kube-dns for compatibility reasons with the legacy kube-dns
addon.
A ServiceAccount for CoreDNS is created in the kube-system namespace.
The coredns ServiceAccount is bound to the privileges in the system:coredns ClusterRole
In Kubernetes version 1.21, support for using kube-dns with kubeadm was removed.
You can use CoreDNS with kubeadm even when the related Service is named kube-dns.
Similarly to kubeadm init, the kubeadm join internal workflow also consists of a sequence of
atomic work tasks to perform.
This is split into discovery (having the Node trust the Kubernetes API Server) and TLS bootstrap (having the Kubernetes API Server trust the Node).
For more details, see Authenticating with Bootstrap Tokens or the corresponding design proposal.
kubeadm executes a set of preflight checks before starting the join, with the aim of verifying
preconditions and avoiding common cluster startup problems.
Also note that:
kubeadm join preflight checks are basically a subset of kubeadm init preflight checks
The user can skip specific preflight checks (or all of them) with the --ignore-preflight-errors option.
There are two main schemes for discovery. The first is to use a shared token along with the IP address of the API server. The second is to provide a file (that is a subset of the standard kubeconfig file).
If kubeadm join is invoked with --discovery-token, token discovery is used; in this case the
node basically retrieves the cluster CA certificates from the cluster-info ConfigMap in the
kube-public namespace.
In order to prevent "man in the middle" attacks, several steps are taken:
First, the CA certificate is retrieved via an insecure connection (this is possible because
kubeadm init grants access to cluster-info for users in the system:unauthenticated group)
Then the CA certificate goes through the following validation steps:
Public key validation, using the provided --discovery-token-ca-cert-hash. This value is available
in the output of kubeadm init or can be calculated using standard tools (the hash is
calculated over the bytes of the Subject Public Key Info (SPKI) object as in RFC7469). The
--discovery-token-ca-cert-hash flag may be repeated multiple times to allow more than one public key.
Public key validation can be skipped by passing the --discovery-token-unsafe-skip-ca-verification flag on the command line.
This weakens the kubeadm security model since others can potentially impersonate the Kubernetes API server.
If kubeadm join is invoked with --discovery-file, file discovery is used; this file can be a
local file or downloaded via an HTTPS URL; in case of HTTPS, the host installed CA bundle is used
to verify the connection.
With file discovery, the cluster CA certificate is provided in the file itself; in fact, the
discovery file is a kubeconfig file with only server and certificate-authority-data attributes
set, as described in the kubeadm join
reference doc; when the connection with the cluster is established, kubeadm tries to access the
cluster-info ConfigMap, and if available, uses it.
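Both pieces of trust material used during discovery can be sketched with the Python standard library. The pin passed to --discovery-token-ca-cert-hash has the form sha256:<hex digest>, computed over the DER-encoded Subject Public Key Info of the CA certificate; extracting that SPKI from ca.crt requires an X.509 parser (for example openssl or a third-party library) and is assumed to have happened already. The second helper builds a discovery file with only server and certificate-authority-data set. Function names and the example inputs are hypothetical.

```python
import base64
import hashlib
from typing import Iterable


def ca_cert_hash(spki_der: bytes) -> str:
    """Format a CA public key pin as used by --discovery-token-ca-cert-hash."""
    return "sha256:" + hashlib.sha256(spki_der).hexdigest()


def pin_matches(spki_der: bytes, pins: Iterable[str]) -> bool:
    """True if the CA key matches any of the pins (the flag may be repeated)."""
    return ca_cert_hash(spki_der) in {pin.strip().lower() for pin in pins}


def discovery_kubeconfig(server: str, ca_pem: bytes) -> dict:
    """A kubeconfig with only `server` and `certificate-authority-data` set."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": "",  # left empty in this sketch
            "cluster": {
                "server": server,
                "certificate-authority-data": base64.b64encode(ca_pem).decode("ascii"),
            },
        }],
    }
```

The kubeadm join reference documentation shows how to compute the same hash from ca.crt with an openssl pipeline.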
Once the cluster info is known, the file bootstrap-kubelet.conf is written, thus allowing
kubelet to do TLS Bootstrapping.
The TLS bootstrap mechanism uses the shared token to temporarily authenticate with the Kubernetes API server to submit a certificate signing request (CSR) for a locally created key pair.
The request is then automatically approved and the operation completes saving ca.crt file and
kubelet.conf file to be used by the kubelet for joining the cluster, while bootstrap-kubelet.conf
is deleted.
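Please note that:
The temporary authentication is validated against the token saved during the kubeadm init process (or with additional tokens created with the kubeadm token command)
The temporary authentication resolves to a user member of the system:bootstrappers:kubeadm:default-node-token group, which was granted access to the CSR API during the kubeadm init process
The automatic CSR approval is managed by the csrapprover controller, according to the configuration done during the kubeadm init process
kubeadm upgrade has sub-commands for handling the upgrade of the Kubernetes cluster created by kubeadm.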
You must run kubeadm upgrade apply on a control plane node (you can choose which one);
this starts the upgrade process. You then run kubeadm upgrade node on all remaining
nodes (both worker nodes and control plane nodes).
Both kubeadm upgrade apply and kubeadm upgrade node have a phase subcommand which provides access
to the internal phases of the upgrade process.
See kubeadm upgrade phase for more details.
Additional utility upgrade commands are kubeadm upgrade plan and kubeadm upgrade diff.
All upgrade sub-commands support passing a configuration file.
You can optionally run kubeadm upgrade plan before you run kubeadm upgrade apply.
The plan subcommand checks which versions are available to upgrade
to and validates whether your current cluster is upgradeable.
The diff subcommand shows what differences would be applied to existing static Pod manifests for control plane nodes.
A more verbose way to do the same thing is running kubeadm upgrade apply --dry-run or
kubeadm upgrade node --dry-run.
kubeadm upgrade apply prepares the cluster for the upgrade of all nodes, and also
upgrades the control plane node where it's run. The steps it performs are:
Runs preflight checks similarly to kubeadm init and kubeadm join, ensuring container images are downloaded and the cluster is in a good state to be upgraded.
Upgrades the control plane manifest files on disk in /etc/kubernetes/manifests and waits for the kubelet to restart the components if the files have changed.
Uploads the updated kubeadm and kubelet configurations to the cluster in the kubeadm-config and the kubelet-config ConfigMaps (both in the kube-system namespace).
Writes the updated kubelet configuration for this node in /var/lib/kubelet/config.yaml, reads the node's /var/lib/kubelet/instance-config.yaml file and patches fields like containerRuntimeEndpoint from this instance configuration into /var/lib/kubelet/config.yaml.
Configures the bootstrap token and the cluster-info ConfigMap for RBAC rules. This is the same as in the kubeadm init stage and ensures that the cluster continues to support nodes joining with bootstrap tokens.
kubeadm upgrade node upgrades a single control plane or worker node after the cluster upgrade has
started (by running kubeadm upgrade apply). The command detects if the node is a control plane node by checking
if the file /etc/kubernetes/manifests/kube-apiserver.yaml exists. On finding that file, the kubeadm tool
infers that there is a running kube-apiserver Pod on this node.
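The control plane detection described above is a plain file-existence test. A Python sketch, with the manifests directory as a parameter so it can be pointed elsewhere for testing (the function name is hypothetical):

```python
from pathlib import Path

MANIFESTS_DIR = Path("/etc/kubernetes/manifests")


def is_control_plane_node(manifests_dir: Path = MANIFESTS_DIR) -> bool:
    """A kube-apiserver static Pod manifest on disk marks a control plane node."""
    return (manifests_dir / "kube-apiserver.yaml").exists()


role = "control plane" if is_control_plane_node() else "worker"
print(f"kubeadm upgrade node would treat this host as a {role} node")
```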
The steps it performs are:
Runs preflight checks similarly to kubeadm upgrade apply.
For control plane nodes, upgrades the control plane manifest files on disk in /etc/kubernetes/manifests and waits for the kubelet to restart the components if the files have changed.
Writes the updated kubelet configuration for this node in /var/lib/kubelet/config.yaml, reads the node's /var/lib/kubelet/instance-config.yaml file and patches fields like containerRuntimeEndpoint from this instance configuration into /var/lib/kubelet/config.yaml.
You can use the kubeadm reset subcommand on a node where kubeadm commands were previously executed.
This subcommand performs a best-effort cleanup of the node.
If certain actions fail you must intervene and perform manual cleanup.
The command supports phases.
See kubeadm reset phase for more details.
The command supports a configuration file.
Additionally:
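.kube/ in the user's home directory is not cleaned up.
The command has the following stages, among others:
Unmounts any mounted directories in /var/lib/kubelet.
Deletes any files and directories managed by kubeadm in /var/lib/kubelet and /etc/kubernetes.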