Trying Out a New Path to Kubernetes: Kubespray

I just came across this Little Guide to Kubernetes Install Options, which covers a few options I’ve heard of, and a few options I haven’t heard of. It doesn’t mention the main way that I deploy Kubernetes, which is through the Ansible scripts from the kubernetes/contrib repository. The post does point to another Ansible-based option, though, and I wondered whether this one, called Kubespray (nee Kargo) would work with Atomic Hosts.

I installed kubespray:

$ sudo pip2 install kubespray

I generated an inventory for a baremetal (actually VMs) cluster with one etcd host / kube master and two nodes:

$ kubespray prepare --nodes node2[ansible_ssh_host=cah-2.osas.lab] node3[ansible_ssh_host=cah-3.osas.lab] --etcds node1[ansible_ssh_host=cah-1.osas.lab] --masters node1[ansible_ssh_host=cah-1.osas.lab]

I deployed the cluster, providing the argument -u root because my ansible host was already set up to access my test VMs as root via ssh key:

$ kubespray deploy -u root

The ansible zoomed by, eventually ending with:

PLAY RECAP *********************************************************************
localhost : ok=3 changed=1 unreachable=0 failed=0
node1 : ok=393 changed=95 unreachable=0 failed=0
node2 : ok=333 changed=76 unreachable=0 failed=0
node3 : ok=303 changed=65 unreachable=0 failed=0

Kubernetes deployed successfuly

I tested the cluster by deploying the guestbook go sample app, as is my custom, and sure enough, everything seemed to be working.

The biggest difference between this installation route and the one I usually take is the source of the containers. Where I typically run CentOS Atomic with Kubernetes rpms from the CentOS project or with containers based on those rpms, and the same with Fedora Atomic and Fedora-based content, the Kubespray installer set me up with container images mostly from CoreOS:

[root@cah-1 ~]# atomic containers list
CONTAINER ID IMAGE COMMAND CREATED STATE BACKEND RUNTIME
19d6514ceb1a quay.io/coreos/hyper /hyperkube controlle 2017-08-11 18:54 running docker docker
47bb6f63af38 gcr.io/google_contai /pause 2017-08-11 18:54 running docker docker
2102af0a5915 quay.io/coreos/hyper /hyperkube scheduler 2017-08-11 18:54 running docker docker
8af0c87bcfbd gcr.io/google_contai /pause 2017-08-11 18:54 running docker docker
c91bf4d9c687 quay.io/coreos/hyper /hyperkube apiserver 2017-08-11 18:54 running docker docker
96bc198022ac gcr.io/google_contai /pause 2017-08-11 18:54 running docker docker
e5cedfe5145e calico/node:v1.1.3 start_runit 2017-08-11 18:53 running docker docker
a31b6a04be23 quay.io/coreos/hyper /hyperkube proxy --v 2017-08-11 18:52 running docker docker
877aa10ab6a4 gcr.io/google_contai /pause 2017-08-11 18:52 running docker docker
b9f64835b7e5 quay.io/coreos/hyper ./hyperkube kubelet 2017-08-11 18:52 running docker docker
1bab52292b2d quay.io/coreos/etcd: /usr/local/bin/etcd 2017-08-11 18:48 running docker docker

It’s not a big deal swapping out one container source for another, however. Fedora and CentOS aren’t providing a hyperkube container, which is what kubespray (and kubeadm, for that matter) look to use, but we could create one for Fedora and CentOS based on the upstream Dockerfile.

getting stuff done with a local openshift origin instance

A few of the projects I work with use static websites based on middleman, which you can run locally to see how your edits, or those of others, will look on the live site when they’re merged.

Each of these sites defaults to port 4567 when running locally, so if I’m running more than one of them at a time, they complain that their favored port is already taken. It’s easy enough to fire up middleman on a different port, but I thought I’d try and run a couple of these in containers, using a local instance of OpenShift Origin, a Kubernetes-based container application platform.

It’s pretty easy to get up and running with an OpenShift Origin instance using the command oc cluster up. The oc client is available for Linux, Windows and Mac OS. Since containers (pretty much) are Linux, you’ll need a Linux VM on Mac or Windows, but the oc client can use docker machine to take care of that for you. I haven’t tested that, though, because I use Linux already.

On Fedora, I followed these instructions, with the exception of installing the oc client from the Fedora repos (dnf install -y origin-clients), rather than downloading the binary from GitHub.

I wanted my origin install to persist across restarts, so I created a folder in my home directory to store persistent data, and started up my instance with:

$ sudo oc cluster up --host-data-dir=/home/jbrooks/origin-data --use-existing-config

sudo was necessary because I haven’t set up my regular user account to run docker without it — not a big deal, but some config files for logging in to my origin instance as admin ended up in my /root directory instead of my home directory, so I copied those over:

$ sudo cp -r /root/.kube ~/.
$ sudo chown -R jbrooks:jbrooks ~/.kube

I logged into the OpenShift web console using the URL and the developer:developer user name and password output by the oc cluster up command, clicked “Add to Project”, and then, under the “Languages” heading, chose “Ruby,” and then “Ruby 2.3”, because middleman is a ruby affair.

I filled in a name, pasted in the git repository URL for the ovirt middleman site, and hit “Create.”

I headed to the “Overview” page, saw that my build was running, clicked “View Log,” and saw that a familiar-looking build process was chugging along.

When the build finished, OpenShift kicked off a deployment of my image, which I could see from the deployment log linked from the overview page, was erroring out.

After some poking around, I fixed the issue by heading to the deployments section of the web console and, after first pausing the deployment, hitting the edit YAML button. I used the YAML editor to add a command right in between the image and ports sections of the configuration.

I also changed the containerPort from a default of 8080 to the middleman default of 4567. I expected this change to filter down to the service and route that were automatically created for me, but they didn’t — it wasn’t tough to edit those via the web console, however.

I added GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment variables to my deployment, from an “Environment” tab in the deployments area of the console. As I eventually learned, git got grumpy about running as a random UID (as is OpenShift’s security-conscious custom) rather than as a “real” user with an entry in /etc/passwd, but adding those ENV variables calmed git down.

Once I had a pod up and running, I was able to view the development site in my web browser via the URL provided in the routes section of the console.

Next, I headed to my terminal to log into my running pod with OpenShift’s oc rsh command, and fetch and check out a pending pull request on the ovirt site:

$ oc rsh ovirt-site-2-4-50eao

$ git fetch origin pull/877/head:pr-ovirt-gluster-411

$ git checkout pr-ovirt-gluster-411

The middleman development server handles live reloading, so once I checked out the new branch, it refreshed, and I could see my awaiting-merge blog post:

This works, but I’ll probably hone the process some more from here. I experimented a bit with using kompose to put together a simple docker compose-formatted manifest for my app that could either pull from an openshift-built or a built-elsewhere docker container. Like this:

version: "2"

services:  
  ovirt-site:
    image: 172.30.24.24:5000/myproject/ovirt-site
    ports:
      - "4567"
    environment:
      - GIT_COMMITTER_NAME="Jason Brooks"
      - GIT_COMMITTER_EMAIL="jbrooks@redhat.com"
    entrypoint:
      - scl
      - enable
      - rh-ruby23
      - /opt/app-root/src/run-server.sh
    labels:
      kompose.service.type: NodePort

I think that that approach would then work for a regular kube cluster or, with some tweaking, probably, docker or docker swarm as well.

Installing Kubernetes on CentOS Atomic Host with kubeadm

Version 1.4 of Kubernetes, the open-source system for automating deployment, scaling, and management of containerized applications, included an awesome new tool for bootstrapping clusters: kubeadm.

Using kubeadm is as simple as installing the tool on a set of servers, running kubeadm init to initialize a master for the cluster, and running kubeadm join on some nodes to join them to the cluster. With kubeadm, the kubelet is installed as a regular software package, and the rest of the components run as docker containers.

The tool is available in packaged form for CentOS and for Ubuntu hosts, so I figured I’d avail myself of the package-layering capabilities of CentOS Atomic Host Continuous to install the kubeadm rpm on a few of these hosts to get up and running with an up-to-date and mostly-containerized kubernetes cluster.

However, I hit an issue trying to install one of the dependencies for kubeadm, kubernetes-cni:

# rpm-ostree pkg-add kubelet kubeadm kubectl kubernetes-cni
Checking out tree 060b08b... done

Downloading metadata: [=================================================] 100%
Resolving dependencies... done
Will download: 5 packages (41.1 MB)

Downloading from base: [==============================================] 100%

Downloading from kubernetes: [========================================] 100%

Importing: [============================================================] 100%
Overlaying... error: Unpacking kubernetes-cni-0.3.0.1-0.07a8a2.x86_64: openat: No such file or directory

It turns out that kubernetes-cni installs files to /opt, and rpm-ostree, the hybrid image/package system that underpins atomic hosts, doesn’t allow for this. I managed to work around the issue by rolling my own copy of the kubeadm packages that included a kubernetes-cni package that installed its binaries to /usr/lib/opt, but I found that not only do kubernetes’ network plugins expect to find the cni binaries in /opt, but they place their own binaries in there as needed, too. In an rpm-ostree system, /usr is read-only, so even if I modified the plugins to use /usr/lib/opt, they wouldn’t be able to write to that location.

I worked around this second issue by further modding my kubernetes-cni package to use tmpfiles.d, a service for managing temporary files and runtime directories for daemons, to create symlinks from each of the cni binaries stored in /usr/lib/opt/cni/bin to locations in in /opt/cni/bin. You can see the changes I made to the spec file here and find a package based on these changes in this copr.

I’m not positive that this is the best way to work around the problem, but it allowed me to get up and running with kubeadm on CentOS Atomic Host Continuous. Here’s how to do that:

This first step may or may not be crucial, I added it to my mix on the suggestion of this kubeadm doc page while I was puzzling over why the weave network plugin wasn’t working.

# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

This set of steps adds my copr repo, overlays the needed packages, and kicks off a reboot for the overlay to take effect.

# cat <<EOF > /etc/yum.repos.d/jasonbrooks-kube-release-epel-7.repo
[jasonbrooks-kube-release]
name=Copr repo for kube-release owned by jasonbrooks
baseurl=https://copr-be.cloud.fedoraproject.org/results/jasonbrooks/kube-release/epel-7-x86_64/
type=rpm-md
skip_if_unavailable=True
gpgcheck=1
gpgkey=https://copr-be.cloud.fedoraproject.org/results/jasonbrooks/kube-release/pubkey.gpg
repo_gpgcheck=0
enabled=1
enabled_metadata=1
EOF

# rpm-ostree pkg-add --reboot kubelet kubeadm kubectl kubernetes-cni

These steps start the kubelet service, put selinux into permissive mode, which, according to this doc page should soon not be necessary, and initializes the cluster.

# systemctl enable kubelet.service --now

# setenforce 0

# kubeadm init --use-kubernetes-version "v1.4.3"

This step assigns the master node to also serve as a worker, and then deploys the weave network plugin on the cluster. To add additional workers, use the kubeadm join command provided when the cluster init operation completed.

# kubectl taint nodes --all dedicated-

# kubectl apply -f https://git.io/weave-kube

When the command kubectl get pods --all-namespaces shows that all of your pods are up and running, the cluster is ready for action.

The kubeadm tool is considered an “alpha” right now, but moving forward, this looks like it could be a great way to come up with an up-to-date kube cluster on atomic hosts. I’ll need to figure out whether my workaround to get kubernetes-cni working is sane enough before building a more official centos or fedora package for this, and I want to figure out how to swap out the project-built, debian-based kubernetes component containers with containers provided by centos or fedora, a topic I’ve written a bit about recently.

update: While the hyperkube containers that I’ve written about in the past were based on debian, the containers that kubeadm downloads appear to be built on busybox.

running kubernetes in containers on atomic

The atomic hosts from CentOS and Fedora earn their “atomic” namesake by providing for atomic, image-based system updates via rpm-ostree, and atomic, image-based application updates via docker containers.

This “system” vs “application” division isn’t set in stone, however. There’s room for system components to move across from the somewhat rigid world of ostree commits to the freer-flowing container side.

In particular, the key atomic host components involved in orchestrating containers across multiple hosts, such as flannel, etcd and kubernetes, could run instead in containers, making life simpler for those looking to test out newer or different versions of these components, or to swap them out for alternatives.

Suraj Deshmukh wrote a post recently about running kubernetes in containers. He wanted to test kubernetes 1.3, for which Fedora packages aren’t yet available, so he turned to the upstream kubernetes-on-docker.

Suraj ran into trouble with flannel and etcd, so he ran those from installed rpms. Flannel can be tricky to run as a docker container, because docker’s own configs must be modified to use flannel, so there’s a bit of a chicken-and-egg situation.

One solution is system containers for atomic, which can be run independently from the docker daemon. Giuseppe Scrivano has built example containers for flannel and for etcd, and in this post, I’m describing how to use these system containers alongside a containerized kubernetes on an atomic host.

setting up flannel and etcd

You need a very recent version of the atomic command. I used a pair of CentOS Atomic Hosts running the “continuous” stream.

The master host needs etcd and flannel:

# atomic pull gscrivano/etcd

# atomic pull gscrivano/flannel

# atomic install --system gscrivano/etcd

With etcd running, we can use it to configure flannel:

# export MASTER_IP=YOUR-MASTER-IP

# runc exec gscrivano-etcd etcdctl set /atomic.io/network/config '{"Network":"172.17.0.0/16"}'

# atomic install --name=flannel --set FLANNELD_ETCD_ENDPOINTS=http://$MASTER_IP:2379 --system gscrivano/flannel

The worker node needs flannel as well:

# export MASTER_IP=YOUR-MASTER-IP

# atomic pull gscrivano/flannel

# atomic install --name=flannel --set ETCD_ENDPOINTS=http://$MASTER_IP:2379 --system gscrivano/flannel

On both the master and the worker, we need to make docker use flannel:

# echo "/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker" | runc exec flannel bash

Also on both hosts, we need this docker tweak (because of this):

# cp /usr/lib/systemd/system/docker.service /etc/systemd/system/

# sed -i s/MountFlags=slave/MountFlags=/g /etc/systemd/system/docker.service

# systemctl daemon-reload

# systemctl restart docker

On both hosts, some context tweaks to make SELinux happy:

# mkdir -p /var/lib/kubelet/

# chcon -R -t svirt_sandbox_file_t /var/lib/kubelet/

# chcon -R -t svirt_sandbox_file_t /var/lib/docker/

setting up kube

With flannel and etcd running in system containers, and with docker configured properly, we can start up kubernetes in containers. I’ve pulled the following docker run commands from the docker-multinode scripts in the kubernetes project’s kube-deploy repository.

On the master:

# docker run -d \
--net=host \
--pid=host \
--privileged \
--restart="unless-stopped" \
--name kube_kubelet_$(date | md5sum | cut -c-5) \
-v /sys:/sys:rw \
-v /var/run:/var/run:rw \
-v /run:/run:rw \
-v /var/lib/docker:/var/lib/docker:rw \
-v /var/lib/kubelet:/var/lib/kubelet:shared \
-v /var/log/containers:/var/log/containers:rw \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube kubelet \
--allow-privileged \
--api-servers=http://localhost:8080 \
--config=/etc/kubernetes/manifests-multi \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--hostname-override=${MASTER_IP} \
--v=2

On the worker:

# export WORKER_IP=YOUR-WORKER-IP

# docker run -d \
--net=host \
--pid=host \
--privileged \
--restart="unless-stopped" \
--name kube_kubelet_$(date | md5sum | cut -c-5) \
-v /sys:/sys:rw \
-v /var/run:/var/run:rw \
-v /run:/run:rw \
-v /var/lib/docker:/var/lib/docker:rw \
-v /var/lib/kubelet:/var/lib/kubelet:shared \
-v /var/log/containers:/var/log/containers:rw \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube kubelet \
--allow-privileged \
--api-servers=http://${MASTER_IP}:8080 \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--hostname-override=${WORKER_IP} \
--v=2

# docker run -d \
--net=host \
--privileged \
--name kube_proxy_$(date | md5sum | cut -c-5) \
--restart="unless-stopped" \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube proxy \
--master=http://${MASTER_IP}:8080 \
--v=2

get current kubectl

I usually test things out from the master node, so I’ll download the newest stable kubectl binary to there:

# curl -sSL https://storage.googleapis.com/kubernetes-release/release/$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt")/bin/linux/amd64/kubectl > /usr/local/bin/kubectl

# chmod +x /usr/local/bin/kubectl

test it

It takes a few minutes for all the containers to get up and running. Once they are, you can start running kubernetes apps. I typically test with the guestbookgo atomicapp:

# atomic run projectatomic/guestbookgo-atomicapp

Wait a few minutes, until kubectl get pods tells you that your guestbook and redis pods are running, and then:

# kubectl describe service guestbook | grep NodePort

Visiting the NodePort returned above at either my master or worker IP (these kube scripts configure both to serve as workers) gives me this:

fedora and docker storage

While (pretty much) everyone who’s using docker is running it on Linux, and while lots of people run docker on their laptops and desktops, most aren’t running it directly on Linux desktops and laptops. Instead, most individual docker users are relying on some sort of purpose-built Linux distribution running as a virtual machine on their Mac or Windows machine.

However, if you are (like me) running Linux on your desktop, you can run docker containers right on your bare metal, with no virtualization overhead in between. Yay, Desktop Linux!

But wait. If you are (like me) running Fedora Linux on your desktop, and if you (also like me) weren’t thinking about docker and its particular storage needs when you installed Fedora on your machine, you could be in for some perplexing issues or at least crap performance, because of the way that docker storage works on Fedora.

I’ve written about the general issue elsewhere:

…the AUFS backend that started out as Docker’s default storage option, but never made its way into the mainlain Linux kernel, posed a problem for Red Hat and our upstream first, no out-of-tree bits ways.

The settled-upon solution was device mapper thin provisioning, which takes a block storage device to create a pool of space that can be used to create other block devices for Docker containers and images. The device mapper backend can be configured to use direct LVM volumes or you can let Docker create a pair of loopback mounted sparse files to serve as the block devices.

from: Friends Don’t Let Friends Run Docker on Loopback in Production

When you install Fedora on your desktop or laptop, the installer divvies your entire disk up into a small boot partition and a big LVM partition, and then divides that LVM space up into a swap volume that varies in size based on how much RAM you have installed, a root volume of 50GB, and a home volume that takes over whatever’s left.

With no room left for the pool of space that the docker device mapper storage driver needs for containers and images, the storage driver will turn instead to crappily-performing loopback mounted files. Boo!

A Fix

You can cut back the size of the home volume in Fedora without too much trouble. I like to use system-storage-manager to work with my disks:

NOTE: I guess I should add that whenever you’re mucking with your disks, you should make sure you have backups, and so on, but I have resized my own laptop partitions in just this way on more than one occasion, and I’ve tested the steps written here with a VM as I wrote this, so, yeah.

$ sudo dnf install -y system-storage-manager

Next, reboot your machine, and when you get back to the login screen, hit CTRL-ALT-F2 to get to a virtual terminal, and then log in as root. We need to do this in order to unmount the home directory before we shrink it. As root, you can use system-storage-manager to shrink down the home volume. Below I’m shrinking the home volume to 20G, because I’m testing these instructions on a VM with a 100GB drive. Substitute a value that makes sense for your rig.

# umount /home
# ssm resize -s 20G /dev/fedora/home
# reboot

If you’ve already installed and run docker, you’ll need to delete /var/lib/docker, where all of your containers and volumes live, so be prepared to rebuild those.

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker
$ sudo systemctl start docker

When docker starts up again, a script that comes bundled with Fedora’s distribution of docker will check to see that there’s space available in your volume group and will set up your storage correctly. If you want to grow your home volume later, it’s easy to do and doesn’t require unmounting anything. You’ll run the same ssm resize command from above, and swap in your desired volume size.

NOTE: If you’re using docker-engine from docker.com, check out these docs for setting up devicemapper driver correctly by hand.

Starting out right

If you haven’t yet installed Fedora, you can configure your system to accommodate this and other LVM storage scenarios moving forward by making your home volume smaller and modifying your “fedora” volume group from its default “Automatic” size policy to the “As large as possible” policy. This way, all your spare disk space will be ready for new volumes (such as the docker thin pool) or for growing your home or root volumes if you decide that you need the space later on. This is probably how Fedora partitioning should be configured by default, anyway, but it isn’t.

Looking ahead

Finally, there’s another option on the horizon for docker storage on Fedora, an option that doesn’t require partition changes or planning: OverlayFS. I wrote about this in the post I linked above, too, but the TLDR is that OverlayFS and SELinux don’t work together yet, although that’s set to change. Stay tuned.