Installing Kubernetes on CentOS Atomic Host with kubeadm

Version 1.4 of Kubernetes, the open-source system for automating deployment, scaling, and management of containerized applications, included an awesome new tool for bootstrapping clusters: kubeadm.

Using kubeadm is as simple as installing the tool on a set of servers, running kubeadm init to initialize a master for the cluster, and running kubeadm join on some nodes to join them to the cluster. With kubeadm, the kubelet is installed as a regular software package, and the rest of the components run as docker containers.

The tool is available in packaged form for CentOS and for Ubuntu hosts, so I figured I’d avail myself of the package-layering capabilities of CentOS Atomic Host Continuous to install the kubeadm rpm on a few of these hosts to get up and running with an up-to-date and mostly-containerized kubernetes cluster.

However, I hit an issue trying to install one of the dependencies for kubeadm, kubernetes-cni:

# rpm-ostree pkg-add kubelet kubeadm kubectl kubernetes-cni
Checking out tree 060b08b... done

Downloading metadata: [=================================================] 100%
Resolving dependencies... done
Will download: 5 packages (41.1 MB)

Downloading from base: [==============================================] 100%

Downloading from kubernetes: [========================================] 100%

Importing: [============================================================] 100%
Overlaying... error: Unpacking kubernetes-cni-0.3.0.1-0.07a8a2.x86_64: openat: No such file or directory

It turns out that kubernetes-cni installs files to /opt, and rpm-ostree, the hybrid image/package system that underpins atomic hosts, doesn’t allow for this. I managed to work around the issue by rolling my own copy of the kubeadm packages that included a kubernetes-cni package that installed its binaries to /usr/lib/opt, but I found that not only do kubernetes’ network plugins expect to find the cni binaries in /opt, but they place their own binaries in there as needed, too. In an rpm-ostree system, /usr is read-only, so even if I modified the plugins to use /usr/lib/opt, they wouldn’t be able to write to that location.

I worked around this second issue by further modding my kubernetes-cni package to use tmpfiles.d, a service for managing temporary files and runtime directories for daemons, to create symlinks from each of the cni binaries stored in /usr/lib/opt/cni/bin to locations in in /opt/cni/bin. You can see the changes I made to the spec file here and find a package based on these changes in this copr.

I’m not positive that this is the best way to work around the problem, but it allowed me to get up and running with kubeadm on CentOS Atomic Host Continuous. Here’s how to do that:

This first step may or may not be crucial, I added it to my mix on the suggestion of this kubeadm doc page while I was puzzling over why the weave network plugin wasn’t working.

# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

This set of steps adds my copr repo, overlays the needed packages, and kicks off a reboot for the overlay to take effect.

# cat <<EOF > /etc/yum.repos.d/jasonbrooks-kube-release-epel-7.repo
[jasonbrooks-kube-release]
name=Copr repo for kube-release owned by jasonbrooks
baseurl=https://copr-be.cloud.fedoraproject.org/results/jasonbrooks/kube-release/epel-7-x86_64/
type=rpm-md
skip_if_unavailable=True
gpgcheck=1
gpgkey=https://copr-be.cloud.fedoraproject.org/results/jasonbrooks/kube-release/pubkey.gpg
repo_gpgcheck=0
enabled=1
enabled_metadata=1
EOF

# rpm-ostree pkg-add --reboot kubelet kubeadm kubectl kubernetes-cni

These steps start the kubelet service, put selinux into permissive mode, which, according to this doc page should soon not be necessary, and initializes the cluster.

# systemctl enable kubelet.service --now

# setenforce 0

# kubeadm init --use-kubernetes-version "v1.4.3"

This step assigns the master node to also serve as a worker, and then deploys the weave network plugin on the cluster. To add additional workers, use the kubeadm join command provided when the cluster init operation completed.

# kubectl taint nodes --all dedicated-

# kubectl apply -f https://git.io/weave-kube

When the command kubectl get pods --all-namespaces shows that all of your pods are up and running, the cluster is ready for action.

The kubeadm tool is considered an “alpha” right now, but moving forward, this looks like it could be a great way to come up with an up-to-date kube cluster on atomic hosts. I’ll need to figure out whether my workaround to get kubernetes-cni working is sane enough before building a more official centos or fedora package for this, and I want to figure out how to swap out the project-built, debian-based kubernetes component containers with containers provided by centos or fedora, a topic I’ve written a bit about recently.

update: While the hyperkube containers that I’ve written about in the past were based on debian, the containers that kubeadm downloads appear to be built on busybox.

New CentOS Atomic Host with Package Layering Support

Last week, the CentOS Atomic SIG released an updated version of CentOS Atomic Host (tree version 7.20160818), featuring support for rpm-ostree package layering.

CentOS Atomic Host is available as a VirtualBox or libvirt-formatted Vagrant box, or as an installable ISO, qcow2 or Amazon Machine image. Check out the CentOS wiki for download links and installation instructions, or read on to learn more about what’s new in this release.

http://www.projectatomic.io/blog/2016/08/new-centos-atomic-host-with-package-layering-support/

running kubernetes in containers on atomic

The atomic hosts from CentOS and Fedora earn their “atomic” namesake by providing for atomic, image-based system updates via rpm-ostree, and atomic, image-based application updates via docker containers.

This “system” vs “application” division isn’t set in stone, however. There’s room for system components to move across from the somewhat rigid world of ostree commits to the freer-flowing container side.

In particular, the key atomic host components involved in orchestrating containers across multiple hosts, such as flannel, etcd and kubernetes, could run instead in containers, making life simpler for those looking to test out newer or different versions of these components, or to swap them out for alternatives.

Suraj Deshmukh wrote a post recently about running kubernetes in containers. He wanted to test kubernetes 1.3, for which Fedora packages aren’t yet available, so he turned to the upstream kubernetes-on-docker.

Suraj ran into trouble with flannel and etcd, so he ran those from installed rpms. Flannel can be tricky to run as a docker container, because docker’s own configs must be modified to use flannel, so there’s a bit of a chicken-and-egg situation.

One solution is system containers for atomic, which can be run independently from the docker daemon. Giuseppe Scrivano has built example containers for flannel and for etcd, and in this post, I’m describing how to use these system containers alongside a containerized kubernetes on an atomic host.

setting up flannel and etcd

You need a very recent version of the atomic command. I used a pair of CentOS Atomic Hosts running the “continuous” stream.

The master host needs etcd and flannel:

# atomic pull gscrivano/etcd

# atomic pull gscrivano/flannel

# atomic install --system gscrivano/etcd

With etcd running, we can use it to configure flannel:

# export MASTER_IP=YOUR-MASTER-IP

# runc exec gscrivano-etcd etcdctl set /atomic.io/network/config '{"Network":"172.17.0.0/16"}'

# atomic install --name=flannel --set FLANNELD_ETCD_ENDPOINTS=http://$MASTER_IP:2379 --system gscrivano/flannel

The worker node needs flannel as well:

# export MASTER_IP=YOUR-MASTER-IP

# atomic pull gscrivano/flannel

# atomic install --name=flannel --set ETCD_ENDPOINTS=http://$MASTER_IP:2379 --system gscrivano/flannel

On both the master and the worker, we need to make docker use flannel:

# echo "/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker" | runc exec flannel bash

Also on both hosts, we need this docker tweak (because of this):

# cp /usr/lib/systemd/system/docker.service /etc/systemd/system/

# sed -i s/MountFlags=slave/MountFlags=/g /etc/systemd/system/docker.service

# systemctl daemon-reload

# systemctl restart docker

On both hosts, some context tweaks to make SELinux happy:

# mkdir -p /var/lib/kubelet/

# chcon -R -t svirt_sandbox_file_t /var/lib/kubelet/

# chcon -R -t svirt_sandbox_file_t /var/lib/docker/

setting up kube

With flannel and etcd running in system containers, and with docker configured properly, we can start up kubernetes in containers. I’ve pulled the following docker run commands from the docker-multinode scripts in the kubernetes project’s kube-deploy repository.

On the master:

# docker run -d \
--net=host \
--pid=host \
--privileged \
--restart="unless-stopped" \
--name kube_kubelet_$(date | md5sum | cut -c-5) \
-v /sys:/sys:rw \
-v /var/run:/var/run:rw \
-v /run:/run:rw \
-v /var/lib/docker:/var/lib/docker:rw \
-v /var/lib/kubelet:/var/lib/kubelet:shared \
-v /var/log/containers:/var/log/containers:rw \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube kubelet \
--allow-privileged \
--api-servers=http://localhost:8080 \
--config=/etc/kubernetes/manifests-multi \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--hostname-override=${MASTER_IP} \
--v=2

On the worker:

# export WORKER_IP=YOUR-WORKER-IP

# docker run -d \
--net=host \
--pid=host \
--privileged \
--restart="unless-stopped" \
--name kube_kubelet_$(date | md5sum | cut -c-5) \
-v /sys:/sys:rw \
-v /var/run:/var/run:rw \
-v /run:/run:rw \
-v /var/lib/docker:/var/lib/docker:rw \
-v /var/lib/kubelet:/var/lib/kubelet:shared \
-v /var/log/containers:/var/log/containers:rw \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube kubelet \
--allow-privileged \
--api-servers=http://${MASTER_IP}:8080 \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--hostname-override=${WORKER_IP} \
--v=2

# docker run -d \
--net=host \
--privileged \
--name kube_proxy_$(date | md5sum | cut -c-5) \
--restart="unless-stopped" \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube proxy \
--master=http://${MASTER_IP}:8080 \
--v=2

get current kubectl

I usually test things out from the master node, so I’ll download the newest stable kubectl binary to there:

# curl -sSL https://storage.googleapis.com/kubernetes-release/release/$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt")/bin/linux/amd64/kubectl > /usr/local/bin/kubectl

# chmod +x /usr/local/bin/kubectl

test it

It takes a few minutes for all the containers to get up and running. Once they are, you can start running kubernetes apps. I typically test with the guestbookgo atomicapp:

# atomic run projectatomic/guestbookgo-atomicapp

Wait a few minutes, until kubectl get pods tells you that your guestbook and redis pods are running, and then:

# kubectl describe service guestbook | grep NodePort

Visiting the NodePort returned above at either my master or worker IP (these kube scripts configure both to serve as workers) gives me this:

Too Fast, Too Slow

Yesterday I removed Fedora 17 from the server I use for oVirt testing, mainly, because I’ve been experiencing random reboots on the server, and I haven’t been able to figure out why. I’m pretty sure I wasn’t having these issues on Fedora 16, but I can’t go back to that release because the official packages for oVirt are built only for F17. There are, however, oVirt packages built for Enterprise Linux (aka RHEL and its children), and I know that some in the oVirt community have been running with these packages with success.

So, I figured I’d install CentOS 6 on my machine and either escape my random reboots or, if the reboots continued, learn that there’s probably something wrong with my hardware. Plus, I’d escape a second bug I’ve been experiencing with Fedora 17, the one in which a recent rebase to the Linux 3.5 kernel (F17 shipped originally with a 3.3 kernel) seems to have broken oVirt’s ability to access NFS shares, thereby breaking oVirt.

Installing oVirt 3.1 on CentOS 6 went very smoothly — the steps involved were pretty much the same as those for Fedora. Before I knew it, I was back up and running with a CentOS-based oVirt 3.1 rig just like my F17 one, complete with my F17 test server template and my F17 VMs for my in-progress gluster/ovirt integration writeup, all repatriated from my oVirt export domain.

However… all is not well.

My Fedora 17 VMs aren’t running normally on my new CentOS 6 host, and what I’m seeing reminds me of a bug I encountered several weeks ago when I first upgraded from oVirt 3.0 to the oVirt 3.1 beta. The solution came in the form of a bugfix from the qemu project upstream — that’s a real benefit of running a leading edge distro like Fedora — when issues are fixed upstream, you don’t have to wait forever for them to float along to you.

Also, the closer you are to upstream, the faster you get access to new features. Not long after updating my qemu to address the F16/F17 VM booting issue, I took to running qemu packages even closer to upstream, from the Fedora Virtualization Preview repository. The oVirt 3.1 management engine supports live snapshots, but requires at least qemu 1.1, which is slated for Fedora 18.

Of course, the downside of tracking the leading edge is that with frequent changes come frequent opportunities for breakage. The changes that don’t directly address your pain points are pure downside, like the NFS-disabling kernel rebase I mentioned earlier. Too fast versus too slow.

So what now?

I wasn’t experiencing these random reboots on my other F17 system — my Thinkpad X220, which I’ve pressed into service as a second oVirt node. I have this F17-based node hooked up to my el6 oVirt engine, and if I set my Fedora VMs to launch only on this node, they run just fine. This machine has only 8GB of RAM, though, and that limits how many VMs I can run on it. Also, since my F17 and el6 nodes are running different versions of qemu, live migration between them doesn’t work.

  • I could shift my in-progress ovirt/gluster testing to el6, VMs of which run just fine with the older qemu, but I’d prefer to keep testing with Fedora, and the newest code.
  • I could, instead of hitting the brakes and running el6 on my test server, hit the gas and throw F18 on there. Maybe that’d solve my random reboot issue, though I’m not sure if my disabled NFS travails would follow me forward.
  • I could figure out how to rebuild the new qemu packages on el6. I’ve started down this path already, but rpmbuild is voicing some complaints that seem related to systemd, which F17 uses and el6 does not.
  • I could find out that my random reboot problems weren’t the fault of F17 after all, which would send me poring over my hardware and possibly returning to F17.

For now, I’m going to play some more with updating my qemu on el6, while squeezing my F17 VMs into my smaller F17-based node to get this ovirt/gluster howto finished.

Then maybe I’ll take a long walk on the beach and meditate on the merits of too slow versus too fast in Linux distros, and ponder whether the Giants will sweep the Astros tonight.

Update: I was able to rebuild Fedora qemu 1.1 packages for CentOS. I commented out some systemd-dependent stuff from the spec file. I had to rebuild a couple of other packages, too, which I found in Fedora’s buildsystem. Now, my Fedora 17 VMs run well on my CentOS 6 oVirt host (which hasn’t randomly rebooted yet), and I can migrate VMs between it and my F17-based node.

And the Giants won, too.