Install Docker CE on Fedora Atomic Host (if that’s what you’re into)

Fedora Atomic Host comes bundled with a version of Docker based on this project atomic repo that moves no faster than the upstream Kubernetes project can abide (currently docker-1.13.1). This means that Fedora Atomic pretty much always ships with an older version of docker than what’s available from Docker Inc.

However, through the magic of rpm-ostree package layering, you can replace that older, baked-in docker with the very latest docker-ce. Here’s how:

First, grab the repo file for docker-ce.

# cd /etc/yum.repos.d/
# curl -O https://download.docker.com/linux/fedora/docker-ce.repo

Then create a config file to tell docker-ce to use overlay2 storage.

# vi /etc/docker/daemon.json

{
  "storage-driver": "overlay2"
}
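If you’d rather not open an editor, the same file can be written straight from the shell. This is only a minimal sketch: the write_daemon_json helper and its directory argument are made up for illustration, and only the /etc/docker/daemon.json path and the overlay2 setting come from the step above.

```shell
# Hypothetical helper (not part of docker or atomic): write the overlay2
# storage config into the given directory. On a real host you would pass
# /etc/docker and run this as root.
write_daemon_json() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/daemon.json" <<'EOF'
{
  "storage-driver": "overlay2"
}
EOF
}

# Usage on the host (as root):
#   write_daemon_json /etc/docker
```

Taking the directory as an argument makes it easy to try the helper against a scratch directory before touching /etc.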

Then, use rpm-ostree ex override to remove docker and kubernetes from the image, and use rpm-ostree install to layer on docker-ce from the configured repo.

# rpm-ostree ex override remove docker docker-common kubernetes kubernetes-node cockpit-docker
# rpm-ostree install docker-ce -r

After the reboot, you’ll have the latest docker-ce installed. Knock yourself out with any number of bleeding-edge features!

# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.09.0-ce
Storage Driver: overlay2
...
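To confirm the storage driver without eyeballing the whole listing, you can pull a single field out of that output. A small sketch, assuming the "Key: value" layout shown above; docker_info_field is a made-up name, not a docker command.

```shell
# Hypothetical helper: print the value of one "Key: value" field from
# `docker info` text read on stdin. Leading indentation on the key is ignored.
docker_info_field() {
  # $1 is the key exactly as docker info prints it, e.g. "Storage Driver"
  awk -F': ' -v key="$1" '
    { k = $1; sub(/^ +/, "", k) }
    k == key { print $2; exit }
  '
}

# Usage on the host:
#   docker info | docker_info_field "Storage Driver"
```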

Trying Out a New Path to Kubernetes: Kubespray

I just came across this Little Guide to Kubernetes Install Options, which covers a few options I’ve heard of, and a few options I haven’t heard of. It doesn’t mention the main way that I deploy Kubernetes, which is through the Ansible scripts from the kubernetes/contrib repository. The post does point to another Ansible-based option, though, and I wondered whether this one, called Kubespray (nee Kargo) would work with Atomic Hosts.

I installed kubespray:

$ sudo pip2 install kubespray

I generated an inventory for a baremetal (actually VMs) cluster with one etcd host / kube master and two nodes:

$ kubespray prepare --nodes node2[ansible_ssh_host=cah-2.osas.lab] node3[ansible_ssh_host=cah-3.osas.lab] --etcds node1[ansible_ssh_host=cah-1.osas.lab] --masters node1[ansible_ssh_host=cah-1.osas.lab]
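One thing to watch with those name[ansible_ssh_host=...] arguments: the square brackets are glob characters to the shell, so quoting them is the safer habit. Here is a sketch of a tiny helper that builds each argument; node_arg is an invented name, and the kubespray invocation in the comment simply repeats the one above with quoting added.

```shell
# Hypothetical helper: build one "name[ansible_ssh_host=host]" argument.
node_arg() {
  printf '%s[ansible_ssh_host=%s]' "$1" "$2"
}

# Usage (quoted so the shell never treats the brackets as a glob pattern):
#   kubespray prepare \
#     --nodes "$(node_arg node2 cah-2.osas.lab)" "$(node_arg node3 cah-3.osas.lab)" \
#     --etcds "$(node_arg node1 cah-1.osas.lab)" \
#     --masters "$(node_arg node1 cah-1.osas.lab)"
```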

I deployed the cluster, providing the argument -u root because my ansible host was already set up to access my test VMs as root via ssh key:

$ kubespray deploy -u root

The ansible zoomed by, eventually ending with:

PLAY RECAP *********************************************************************
localhost : ok=3 changed=1 unreachable=0 failed=0
node1 : ok=393 changed=95 unreachable=0 failed=0
node2 : ok=333 changed=76 unreachable=0 failed=0
node3 : ok=303 changed=65 unreachable=0 failed=0

Kubernetes deployed successfuly
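If you capture the ansible output to a file, a recap like that one is easy to check mechanically. A sketch, assuming the "host : ok=N changed=N unreachable=N failed=N" layout shown above; recap_ok is a made-up helper name.

```shell
# Hypothetical helper: succeed only if the PLAY RECAP read on stdin has at
# least one host line and none of them reports unreachable or failed tasks.
recap_ok() {
  awk '
    / unreachable=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(unreachable|failed)=/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) bad = 1
        }
      }
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Usage, with deploy.log being wherever you saved the output:
#   recap_ok < deploy.log && echo "no failed or unreachable hosts"
```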

I tested the cluster by deploying the guestbook go sample app, as is my custom, and sure enough, everything seemed to be working.

The biggest difference between this installation route and the one I usually take is the source of the containers. Where I typically run CentOS Atomic with Kubernetes rpms from the CentOS project or with containers based on those rpms, and the same with Fedora Atomic and Fedora-based content, the Kubespray installer set me up with container images mostly from CoreOS:

[root@cah-1 ~]# atomic containers list
CONTAINER ID IMAGE COMMAND CREATED STATE BACKEND RUNTIME
19d6514ceb1a quay.io/coreos/hyper /hyperkube controlle 2017-08-11 18:54 running docker docker
47bb6f63af38 gcr.io/google_contai /pause 2017-08-11 18:54 running docker docker
2102af0a5915 quay.io/coreos/hyper /hyperkube scheduler 2017-08-11 18:54 running docker docker
8af0c87bcfbd gcr.io/google_contai /pause 2017-08-11 18:54 running docker docker
c91bf4d9c687 quay.io/coreos/hyper /hyperkube apiserver 2017-08-11 18:54 running docker docker
96bc198022ac gcr.io/google_contai /pause 2017-08-11 18:54 running docker docker
e5cedfe5145e calico/node:v1.1.3 start_runit 2017-08-11 18:53 running docker docker
a31b6a04be23 quay.io/coreos/hyper /hyperkube proxy --v 2017-08-11 18:52 running docker docker
877aa10ab6a4 gcr.io/google_contai /pause 2017-08-11 18:52 running docker docker
b9f64835b7e5 quay.io/coreos/hyper ./hyperkube kubelet 2017-08-11 18:52 running docker docker
1bab52292b2d quay.io/coreos/etcd: /usr/local/bin/etcd 2017-08-11 18:48 running docker docker

It’s not a big deal swapping out one container source for another, however. Fedora and CentOS aren’t providing a hyperkube container, which is what kubespray (and kubeadm, for that matter) look to use, but we could create one for Fedora and CentOS based on the upstream Dockerfile.

Recent Adventures in oVirt and Gluster

At the end of last week, I spied an exciting tweet about oVirt:

libgfapi-ready

Not long after I started using oVirt and Gluster together, the projects started talking about a way to improve Gluster performance by enabling virtualization hosts to access Gluster volumes directly, using Gluster’s libgfapi, rather than through a FUSE-mounted location on the virtualization host. There was a little bit of fit and finish work to be done, and then we’d all be basking in the glow of ~30% better Gluster storage performance.

That was about four years ago. There ended up being kind of a lot of different little things that needed fixing to make this feature work in oVirt. You can follow many of the twists and turns in bugzilla.

All along, I was eagerly awaiting the feature both as a cool new oVirt+Gluster development and as a welcome option for speeding up my own lab. Disk has always been the weakest part of my hardware setup. My servers each have a single pair of 1TB drives in mirrored RAID, shared between Gluster and the OS, and my VM’s virtual drives had been stored in triplicate in replica 3 Gluster volumes. More recently, with the advent of Gluster arbiter bricks, I’ve been able to get the split-brain protection of replica 3 volumes with only two copies of the data, and that sped things up a bit, but did nothing to dampen my appetite for libgfapi.

Since I need my oVirt setup to get things done, I usually don’t test RC versions of new oVirt components there, but I couldn’t wait any longer and took the plunge. I installed the RC2 updates on each of my virt hosts, and on my engine, I installed a slightly newer version of the code, from the experimental repo, which contained a few last bits that hadn’t made RC2. Then, on my engine, I ran:

# engine-config -s LibgfApiSupported=true
# systemctl restart ovirt-engine

Any VMs that were already running before the upgrade continued running without libgfapi, and if I migrated them to another host, they’d turn up on that host still using the old access method. When I restarted my VMs, they returned using libgfapi. I could tell which was which by grepping through the qemu processes on a particular VM host.

# ps ax | grep qemu | grep 'file=gluster\|file=/rhev'

-drive file=/rhev/data-center/00000001-0001-0001-0001-00000000025e/616be2b6-71db-4f54-befd-be6a444775d7/images/3f7877e7-e532-44a0-8735-c7b2ca06de3b/48ee34fc-ae12-494c-892f-4229fe1fef9d

-drive file=gluster://10.0.20.1/data/616be2b6-71db-4f54-befd-be6a444775d7/images/6597f45a-51cd-4da5-b078-a2652baf78e4/cc3a575e-27b8-4176-b922-9466273153be

The qemu command lines are super long, so I cut them down to just the part specifying the virtual drives. In the first example, the drive is being accessed through a FUSE mount, and in the second, there’s a direct connection to the Gluster volume.
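To tally the two access methods instead of reading the command lines by eye, something like the following could work. It is a sketch that assumes the file=/rhev/... and file=gluster://... prefixes shown above are the only two cases you care about; classify_drives is an invented name.

```shell
# Hypothetical helper: label each drive argument read on stdin (one per line)
# as "libgfapi" or "fuse"; lines that match neither are skipped.
classify_drives() {
  while IFS= read -r line; do
    case "$line" in
      *file=gluster://*) echo "libgfapi" ;;
      *file=/rhev/*)     echo "fuse" ;;
    esac
  done
}

# Usage on a virt host: split the qemu command lines into one argument per
# line, then count each access method.
#   ps ax | grep '[q]emu' | tr ' ' '\n' | classify_drives | sort | uniq -c
```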

So, how was performance?

I tried a few different tests, starting with running dd on one of my VMs:

# dd bs=1M count=1024 if=/dev/zero of=test conv=fdatasync && rm test

I ran this a bunch of times on a VM in both storage configurations and the libgfapi configuration came out about 44% faster on average.
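For anyone repeating this kind of comparison, the arithmetic is just an average per configuration and a percentage difference between the two averages. A sketch with invented helper names; the numbers in the usage comments are made up for illustration and are not my measurements.

```shell
# Average the numbers given one per line on stdin (e.g. MB/s from each dd run).
mean() {
  awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Percentage by which throughput $2 exceeds the baseline throughput $1.
pct_faster() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) * 100 / base }'
}

# Usage with made-up throughputs:
#   printf '90\n100\n110\n' | mean      # average of three runs
#   pct_faster 100 144                  # baseline average vs. new average
```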

For a more “real world” test, I figured I’d measure the time it takes to complete a common task of mine: configuring a test Kubernetes cluster from three Fedora Atomic Host VMs using the upstream ansible scripts. I recorded and averaged the time it took to complete this task across multiple runs on VMs running in each storage configuration, and found that libgfapi was 11% faster.

zram madness

Not too bad, but like I said earlier, my oVirt setup can use all the storage speed help it can get. My servers don’t have a lot of disk but they do have quite a bit of RAM, 256GB apiece, so I’ve long wondered how I could use that RAM to wring more speed out of my setup. For a few months I’ve been experimenting with using Gluster volumes backed by RAM-disks, using zram devices.

This actually works pretty well, and I was seeing speeds similar to what I get running on the SSD in my laptop. Of course, RAM-disks mean losing everything on the disk in the event of a reboot (expected or otherwise), but using replica 3 Gluster volumes, I could reboot one host at a time without losing everything else. Upon bringing back the rebooted host, I’d run a little script to recreate the zram device and the mount points, and then follow the Gluster instructions for replacing a failed brick.

# cat fast.sh
ZRAMSIZE=$((1024 * 1024 * 1024 * 50))
modprobe zram
echo ${ZRAMSIZE} > /sys/class/block/zram0/disksize
mkfs -t xfs /dev/zram0
mkdir -p /gluster-bricks/fast
mount /dev/zram0 /gluster-bricks/fast
mkdir /gluster-bricks/fast/brick
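
Since every step in that script is destructive or needs root, a dry-run variant that only prints the commands can be handy for review. This is a sketch built from the same steps; zram_brick_commands and its two parameters are invented for illustration.

```shell
# Hypothetical helper: print (not run) the brick setup commands for a zram
# device of the given size in GiB, mounted at the given path.
zram_brick_commands() {
  size_bytes=$(( $1 * 1024 * 1024 * 1024 ))
  mount_point="$2"
  cat <<EOF
modprobe zram
echo ${size_bytes} > /sys/class/block/zram0/disksize
mkfs -t xfs /dev/zram0
mkdir -p ${mount_point}
mount /dev/zram0 ${mount_point}
mkdir ${mount_point}/brick
EOF
}

# Usage: review the output, then feed it to a root shell if it looks right.
#   zram_brick_commands 50 /gluster-bricks/fast
```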

However, if all of my machines went down at once, due to a power failure in the lab or something like that, replication wouldn’t help me. I wondered if I could still get a significant boost out of a mixture of zram and regular disk backed volumes, with each of my servers hosting one zram-backed brick, one regular disk-backed brick, and one regular disk-backed arbiter brick, all combined into one distributed-replicated Gluster volume.

brick-house

I ran my same ansible-kubernetes setup tests with the VM drives hosted from my “fast” Gluster domain, and the tests ran 32% faster than with my regular disk-backed (and now libgfapi-enabled) “data” storage domain. Pretty nice, and, in this sort of setup, a power loss would mean that each of four replica groups would be missing one brick, with a remaining data brick and an arbiter brick still around to maintain the data and allow me to repair things.
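As a rough illustration of that kind of layout, here is a sketch that prints a gluster volume create command in which every replica group gets one zram brick, one disk brick and one arbiter brick, each on a different host. Treat all of it as assumption: the helper name, the host names, the volume name and the /gluster-bricks/disk and /gluster-bricks/arbiter paths are invented, and the rotation scheme is just one way to spread the roles around, not necessarily how my volume is arranged. It relies on Gluster’s "replica 3 arbiter 1" form, where every third brick listed is the arbiter.

```shell
# Hypothetical helper: given three or more host names, print a volume create
# command with one replica group per host. In group i the zram-backed brick
# lives on host i, the disk-backed brick on the next host, and the arbiter on
# the host after that, so each host ends up with one brick of each kind.
gluster_fast_volume_cmd() {
  printf '%s\n' "$@" | awk '
    { host[NR] = $1 }
    END {
      n = NR
      printf "gluster volume create fast replica 3 arbiter 1"
      for (i = 1; i <= n; i++) {
        disk = i % n + 1
        arb = (i + 1) % n + 1
        printf " \\\n  %s:/gluster-bricks/fast/brick", host[i]
        printf " %s:/gluster-bricks/disk/brick", host[disk]
        printf " %s:/gluster-bricks/arbiter/brick", host[arb]
      }
      printf "\n"
    }'
}

# Usage (prints the command for review; nothing is executed):
#   gluster_fast_volume_cmd host1 host2 host3 host4
```

With this arrangement, losing every zram brick at once still leaves a data brick and an arbiter in each group.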

I want to experiment a bit further with automated tiering in Gluster, where I’d connect a RAM-disk boosted volume like this to the volume for my main data domain, and frequently-accessed files would automatically migrate to the faster storage. As it is now, my fast domain has to be relatively small, so I have to budget my use of it.

testing system-containerized kube and friends

A month or so ago I jotted down some notes on using ansible to set up a kubernetes cluster on atomic hosts with kubernetes running in regular docker containers and flannel and etcd running in system containers.

I’ve been working on turning my kube containers into system containers. Three reasons jump to mind:

  • I want to run my kube containers via systemd, and system containers come with systemd unit files rolled in and deployed automatically when you run atomic install --system foo, as opposed to storing them somewhere separate from the containers, and copying them into place.
  • I’m using flannel and etcd system containers, in part because flannel needs to modify docker’s configs to do its thing, and etcd needs to be running for flannel to run, so there’s a bit of a chicken-and-egg situation that we avoid by running flannel and etcd outside of docker. I can save on a bit of storage by having flannel, etcd and kubernetes all share the same image in the ostree-based storage that system containers use.
  • I’ve been wanting to learn more about system containers for a little while now, and Yu Qi (Jerry) Zhang just wrote this system container howto.

I’ve been testing on a trio of fedora atomic hosts like this:

$ git clone https://github.com/jasonbrooks/contrib.git
$ cd contrib
$ git checkout system-containers
$ cd ansible
$ vi inventory/inventory

[masters]
kube-master-test.example.com

[etcd:children]
masters

[nodes]
kube-minion-test-[1:2].example.com

$ cd scripts
$ ./deploy-cluster.sh

Substitute those hostnames above with ones that match your own test machines. Alternatively, you should be able to use the Vagrantfile in the vagrant directory of that repo, though I haven’t tested that yet.
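
If you rebuild test clusters often, generating that inventory from the shell saves a trip through vi. A minimal sketch that writes the same three sections as above; write_inventory is an invented helper, and the host names are the placeholders from the example.

```shell
# Hypothetical helper: write an inventory with one master (also used for
# etcd) and a node pattern to the given file.
write_inventory() {
  master="$1"
  nodes="$2"
  file="$3"
  cat > "$file" <<EOF
[masters]
${master}

[etcd:children]
masters

[nodes]
${nodes}
EOF
}

# Usage from the ansible directory (quote the pattern so the shell leaves
# the brackets alone):
#   write_inventory kube-master-test.example.com 'kube-minion-test-[1:2].example.com' inventory/inventory
```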

This involves a bunch of changes to run commands like atomic install --system --name etcd {{ container_registry }}/{{ container_namespace }}/etcd:{{ container_label }} to install flannel, etcd and kubernetes master and node components if desired and specified in the inventory/group_vars/all.yml file.
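
To make that templated command concrete, here is a sketch that fills in the three variables and prints the result. The helper name is invented; the default values (docker.io, jasonbrooks, fc25) are the ones I mention elsewhere in this post, and the script only echoes the command rather than running atomic.

```shell
# Hypothetical helper: print the atomic install command for one system
# container, composing the image reference the same way the template does.
system_container_install_cmd() {
  name="$1"
  registry="${2:-docker.io}"
  namespace="${3:-jasonbrooks}"
  label="${4:-fc25}"
  echo "atomic install --system --name ${name} ${registry}/${namespace}/${name}:${label}"
}

# Usage:
#   system_container_install_cmd etcd
#   system_container_install_cmd flannel candidate-registry.fedoraproject.org f25 latest
```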

In that same config file, I’ve temporarily turned off some of the newish encrypted flannel stuff, because I need to tweak the flannel container to make it work.

If you run the script as laid out above, you’ll get etcd, flannel and kube containers from my namespace in the docker hub, because the current upstream fedora containers, in the case of etcd and flannel, need a couple of changes, and in the case of kube, the upstream fedora containers (that I maintain) aren’t yet modified to run as system containers.

Speaking of which, another cool thing about system containers is that they can be run as regular docker containers. To test whether my new system containers would run as regular docker containers, I ran through the steps I mentioned in my previous post, with a different branch of ansible modded to run kube in regular docker containers, but in the all.yml conf file, I set container_registry: docker.io and container_namespace: jasonbrooks and container_label: fc25 to grab the system container versions of everything that I’ve been talking about in this post. It worked.

So, yay. I have a couple items to work through still. There’s the flannel bit I mentioned above (I think I just need to mount another dir in the flannel system container’s config.json.template). Also, I’ve been needing to restart the kubelet service again in my nodes before the kubedns pod would work, so I need to track down where in the ansible that needs to happen to make it automatic.

getting stuff done with a local openshift origin instance

A few of the projects I work with use static websites based on middleman, which you can run locally to see how your edits, or those of others, will look on the live site when they’re merged.

Each of these sites defaults to port 4567 when running locally, so if I’m running more than one of them at a time, they complain that their favored port is already taken. It’s easy enough to fire up middleman on a different port, but I thought I’d try and run a couple of these in containers, using a local instance of OpenShift Origin, a Kubernetes-based container application platform.

It’s pretty easy to get up and running with an OpenShift Origin instance using the command oc cluster up. The oc client is available for Linux, Windows and Mac OS. Since containers (pretty much) are Linux, you’ll need a Linux VM on Mac or Windows, but the oc client can use docker machine to take care of that for you. I haven’t tested that, though, because I use Linux already.

On Fedora, I followed these instructions, with the exception of installing the oc client from the Fedora repos (dnf install -y origin-clients), rather than downloading the binary from GitHub.

I wanted my origin install to persist across restarts, so I created a folder in my home directory to store persistent data, and started up my instance with:

$ sudo oc cluster up --host-data-dir=/home/jbrooks/origin-data --use-existing-config

sudo was necessary because I haven’t set up my regular user account to run docker without it — not a big deal, but some config files for logging in to my origin instance as admin ended up in my /root directory instead of my home directory, so I copied those over:

$ sudo cp -r /root/.kube ~/.
$ sudo chown -R jbrooks:jbrooks ~/.kube

I logged into the OpenShift web console using the URL and the developer:developer user name and password output by the oc cluster up command, clicked “Add to Project”, and then, under the “Languages” heading, chose “Ruby,” and then “Ruby 2.3”, because middleman is a ruby affair.

I filled in a name, pasted in the git repository URL for the ovirt middleman site, and hit “Create.”

I headed to the “Overview” page, saw that my build was running, clicked “View Log,” and saw that a familiar-looking build process was chugging along.

When the build finished, OpenShift kicked off a deployment of my image, which, as I could see from the deployment log linked from the overview page, was erroring out.

After some poking around, I fixed the issue by heading to the deployments section of the web console and, after first pausing the deployment, hitting the edit YAML button. I used the YAML editor to add a command right in between the image and ports sections of the configuration.

I also changed the containerPort from a default of 8080 to the middleman default of 4567. I expected this change to filter down to the service and route that were automatically created for me, but it didn’t. It wasn’t tough to edit those via the web console, however.

I added GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment variables to my deployment, from an “Environment” tab in the deployments area of the console. As I eventually learned, git got grumpy about running as a random UID (as is OpenShift’s security-conscious custom) rather than as a “real” user with an entry in /etc/passwd, but adding those ENV variables calmed git down.

Once I had a pod up and running, I was able to view the development site in my web browser via the URL provided in the routes section of the console.

Next, I headed to my terminal to log into my running pod with OpenShift’s oc rsh command, and fetch and check out a pending pull request on the ovirt site:

$ oc rsh ovirt-site-2-4-50eao

$ git fetch origin pull/877/head:pr-ovirt-gluster-411

$ git checkout pr-ovirt-gluster-411
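
Those two git commands generalize to any GitHub pull request, since GitHub exposes each one as a pull/NUMBER/head ref. A small sketch wrapping them; the function names are invented, and it assumes the remote is called origin, as in the commands above.

```shell
# Build the refspec that maps a GitHub pull request onto a local branch.
pr_refspec() {
  printf 'pull/%s/head:%s' "$1" "$2"
}

# Hypothetical helper: fetch pull request $1 from origin into local branch $2
# and check it out. Run from inside a clone of the repository.
checkout_pr() {
  git fetch origin "$(pr_refspec "$1" "$2")" && git checkout "$2"
}

# Usage:
#   checkout_pr 877 pr-ovirt-gluster-411
```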

The middleman development server handles live reloading, so once I checked out the new branch, it refreshed, and I could see my awaiting-merge blog post:

This works, but I’ll probably hone the process some more from here. I experimented a bit with using kompose to put together a simple docker compose-formatted manifest for my app that could either pull from an openshift-built or a built-elsewhere docker container. Like this:

version: "2"

services:  
  ovirt-site:
    image: 172.30.24.24:5000/myproject/ovirt-site
    ports:
      - "4567"
    environment:
      - GIT_COMMITTER_NAME="Jason Brooks"
      - GIT_COMMITTER_EMAIL="jbrooks@redhat.com"
    entrypoint:
      - scl
      - enable
      - rh-ruby23
      - /opt/app-root/src/run-server.sh
    labels:
      kompose.service.type: NodePort

I think that that approach would then work for a regular kube cluster or, with some tweaking, probably, docker or docker swarm as well.

test containerized kube and system container-based flannel and etcd

$ git clone https://github.com/jasonbrooks/contrib.git
$ cd contrib
$ git checkout atomic-update
$ cd ansible
$ vi inventory/inventory

[masters]
kube-master-test.example.com

[etcd:children]
masters

[nodes]
kube-minion-test-[1:2].example.com

$ cd scripts
$ ./deploy-cluster.sh

This will fail (if you use hostnames) at: TASK [flannel : Load the flannel config file into etcd] because we need this PR in the Fedora etcd system container. You can work around this by sshing into your master, editing the resolv.conf inside of your etcd system container to match the host, exiting, and re-running the script.

$ ssh root@kube-master-test.example.com
# vi /var/lib/containers/atomic/etcd/rootfs/etc/resolv.conf
# exit
$ ./deploy-cluster.sh

That should work.
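
If hand-editing resolv.conf gets old, copying the host’s file over the container’s copy should have the same effect, since the goal is just to make the two match. A sketch to run on the master; sync_resolv_conf is an invented helper, and the default paths are the ones from the steps above.

```shell
# Hypothetical helper: make the etcd system container's resolv.conf match the
# host's. Prints what it did so the step is easy to spot in a log.
sync_resolv_conf() {
  src="${1:-/etc/resolv.conf}"
  dest="${2:-/var/lib/containers/atomic/etcd/rootfs/etc/resolv.conf}"
  if cmp -s "$src" "$dest"; then
    echo "already in sync"
  else
    cp "$src" "$dest" && echo "updated"
  fi
}

# Usage on the master (as root), before re-running deploy-cluster.sh:
#   sync_resolv_conf
```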

This involves a bunch of changes to use docker containers for kube and use system containers for flannel and etcd. You can specify the registry, namespace and tag to use, as well as whether or not to containerize the master bits, the node bits, the etcd or the flannel using these extra options I’ve added to inventory/group_vars/all.yml:

container_registry: candidate-registry.fedoraproject.org
container_namespace: f25
container_label: latest

containerized_master: true
containerized_node: true

etcd_spc: true
flannel_spc: true

Paying for the News

I’ve been paying extra attention to the news these days, because of the election, so I’ve been having lots of interactions with the Washington Post’s “You Have X Free Articles Left This Month” subscription nag screens, and the similar ones from the New York Times. Sometimes, I ridiculously pause before clicking on a link, wondering whether I have free articles left and whether I should click.

When I find myself clicking on links to my hometown San Francisco Chronicle, it’s usually for Giants or Warriors beat reporting, but the Chronicle doesn’t offer any free articles at all.

I agree with the idea of paying for the news, and I’ve considered subscribing to the Washington Post a few times during the election season, but I always ask myself, “why should I subscribe to some East Coast newspaper, when I want to support and consume local news?”

The trouble is, I subscribed to the digital edition of the San Francisco Chronicle for several months last year, and I didn’t like it. I found the local reporting thin, and the rest of it substandard. I liked the sports reporting well enough, but my overall takeaway was: I don’t like this product and I don’t want to pay for it anymore. So I stopped paying for it.

What I’d like is a way to subscribe to a service that’d give me access to multiple newspapers. The service could track which ones I read the most and divvy up the funds appropriately. That way, the pubs with more engaging content would end up with more of my dollars.

One problem with a service like this might be that there’s too little money to go around as it is, and each subscriber would probably end up sending less money to each publication. The key would be bringing in lots of new people, like me, who don’t already subscribe.

I just looked up the annual cost for subscriptions to these five newspapers in which I have some level of interest.

Newspaper         Annual Subscription
SF Chronicle      $99
Washington Post   $99
NY Times          $195
LA Times          $103
Mercury News      $130

These newspapers are each asking around $10 a month for their digital subscriptions. I imagine I’d be willing to pay around three times that for a meta-subscription — give or take, depending on the participating pubs.

Update: I ended up buying one-year subscriptions to the SF Chronicle and to the Washington Post.

WordPress is not delighting me, followup

Followup to my post yesterday about WordPress, me, and insufficient delight.

I mentioned that my editor fonts look crappy. I noticed that as of version 4.6, the dashboard is supposed to take “advantage of the fonts you already have, making it load faster and letting you feel more at home on whatever device you use.” It may be doing that for fonts outside of the HTML editor tab, but for that tab, it isn’t using my chosen monospace font. I mentioned that I could probably fix this with Stylish, and I just did, and life’s a lot better now.

I mentioned that my markdown text is getting converted to HTML, which I really dislike. The WordPress.com account on Twitter kindly replied to tell me that this is a feature, not a bug.

I couldn’t find a mention of this change in any of the past years’ changelogs, and the doc page for WordPress markdown disagrees, but maybe it’s just in need of an update.

I mentioned that the media manager wasn’t surfaced in the new UI, and that that’s how I’d been uploading images to include in my posts. WordPress.com pointed out on Twitter that I can use the Add Media button in the editor…

But, there’s no Add Media button in the HTML tab of the editor, which is where I edit my markdown, which, I guess, will automatically convert at some future point to HTML anyway, so…

However, I realized that I can use the Set Featured Image button in the sidebar to upload an image, copy its URL, uncheck the featured image checkbox, cancel out of the dialog, and then paste that URL into my post, and that works.

Anyway, I let my annual premium subscription auto-renew about a month and a half ago, so I’m out of the refund window, so I’ll probably stick around, although this markdown to HTML autoconvert misfeature is pretty distressing. Worst case scenario, I’m supporting open source software, so there’s that.

WordPress is not delighting me

I’ve switched blog engines from WordPress to Middleman (a static website engine) and back to WordPress, with various other static engine experiments in between.

I switched back to WordPress, on a premium subscription, because WordPress started supporting markdown, which I like, and because WordPress is open source software (with open source comments support), which I also like. What’s more, paying for hosting through Automattic means not having to mess with WordPress updates myself, and means helping to support a legit open source software company, and I’m into both of those, big time.

BUT. I’m not totally delighted with WordPress. It has to do, mostly, with editor issues.

First, fonts in the editor look crappy, and I can’t figure out how to change that. It’s this low-contrast bullshit that you see everywhere these days. I mean, I’m sure I could use something like Stylish to mod the way the fonts in the editor look so that I can actually use it comfortably, but… that shouldn’t be necessary, right?

Fonts are nicer-looking in the “visual” tab, but as I mentioned, I’m writing in markdown. I avoid writing in HTML unless I can’t avoid it. And even clicking into the visual tab tempts the specter of…

Markdown Reverts to HTML.

UGH! I freaking hate when this happens. Here and there, for reasons I don’t understand, and I’ve been too annoyed about the whole thing to patiently test it out, some of the posts I’ve written in markdown transform into HTML:

That’s a screenshot of the revisions feature, which I use to convince myself that I’m not crazy and that I really did write my post in markdown. I can use this feature to revert to before WordPress crappified my text into HTML, but the revisions feature is only surfaced in the “old” UI, and that old UI is full of nags about using the “new” UI instead.

The new UI is cleaner-looking, but is missing a bunch of stuff, like links to media management, which I need to upload pictures that I want to include in my posts, pictures that I can store in the storage space that I’m paying for as part of my premium subscription. Which brings to mind…

Upsell messages about the business-class service tier. The preview feature used to include little buttons for toggling between web, tablet and mobile site previews, but that screen now includes a fourth button, labelled SEO, and clicking it brings up an ad for the $25 per month business tier of WordPress service.

And of course, customization is a bit of a PITA. WordPress themes are legion, and there isn’t a nice way to filter searches for these by things like theme features supported, so there’s a ton of trial and error involved in finding a theme that’ll work for you.

My big issue has been finding themes with proper support for the “link” post type, and by proper I mean that if I include a link post, I expect the headline to link straight through to the final source, not to a stupid little stub page (I hate it when sites do this) and I want my rss entry to link straight through, too.

The theme I’m using now works this way, but I wish a couple of little things with the site were a bit different, and I know that much is doable via custom CSS, but:

And I’m disheartened to Google for WordPress solutions only to find these sad five year old posts asking similar questions, often unanswered, and when there are answers, they very often involve installing plugins, which you can’t do with hosted WordPress, and which are probably buggy and out of date, anyway.

BUT, whatever, if markdown worked well, and it really ought to, and if the editor got some more love, which… I don’t know, maybe it has gotten love, just not from anyone who actually uses the editor, at least along with markdown, if these little editing bits were working better, I’d probably be pretty happy, and I imagine I’ll someday beat my CSS demons to figure out the rest.

Installing Kubernetes on CentOS Atomic Host with kubeadm

Version 1.4 of Kubernetes, the open-source system for automating deployment, scaling, and management of containerized applications, included an awesome new tool for bootstrapping clusters: kubeadm.

Using kubeadm is as simple as installing the tool on a set of servers, running kubeadm init to initialize a master for the cluster, and running kubeadm join on some nodes to join them to the cluster. With kubeadm, the kubelet is installed as a regular software package, and the rest of the components run as docker containers.

The tool is available in packaged form for CentOS and for Ubuntu hosts, so I figured I’d avail myself of the package-layering capabilities of CentOS Atomic Host Continuous to install the kubeadm rpm on a few of these hosts to get up and running with an up-to-date and mostly-containerized kubernetes cluster.

However, I hit an issue trying to install one of the dependencies for kubeadm, kubernetes-cni:

# rpm-ostree pkg-add kubelet kubeadm kubectl kubernetes-cni
Checking out tree 060b08b... done

Downloading metadata: [=================================================] 100%
Resolving dependencies... done
Will download: 5 packages (41.1 MB)

Downloading from base: [==============================================] 100%

Downloading from kubernetes: [========================================] 100%

Importing: [============================================================] 100%
Overlaying... error: Unpacking kubernetes-cni-0.3.0.1-0.07a8a2.x86_64: openat: No such file or directory

It turns out that kubernetes-cni installs files to /opt, and rpm-ostree, the hybrid image/package system that underpins atomic hosts, doesn’t allow for this. I managed to work around the issue by rolling my own copy of the kubeadm packages that included a kubernetes-cni package that installed its binaries to /usr/lib/opt, but I found that not only do kubernetes’ network plugins expect to find the cni binaries in /opt, but they place their own binaries in there as needed, too. In an rpm-ostree system, /usr is read-only, so even if I modified the plugins to use /usr/lib/opt, they wouldn’t be able to write to that location.

I worked around this second issue by further modding my kubernetes-cni package to use tmpfiles.d, a service for managing temporary files and runtime directories for daemons, to create symlinks from each of the cni binaries stored in /usr/lib/opt/cni/bin to locations in /opt/cni/bin. You can see the changes I made to the spec file here and find a package based on these changes in this copr.
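
To give a feel for what such tmpfiles.d entries look like, here is a sketch that generates one symlink ("L") line per binary. It illustrates the idea and is not a copy of what my spec file does; the helper name is invented, and the default paths are the ones described above.

```shell
# Hypothetical helper: print one tmpfiles.d "L" (symlink) line for every file
# in the source directory, pointing a same-named link in the destination
# directory back at it. Fields: type, path, mode, user, group, age, argument.
cni_tmpfiles_entries() {
  src_dir="${1:-/usr/lib/opt/cni/bin}"
  dest_dir="${2:-/opt/cni/bin}"
  for f in "$src_dir"/*; do
    [ -e "$f" ] || continue
    echo "L ${dest_dir}/$(basename "$f") - - - - ${f}"
  done
}

# Usage: inspect the output, then save it as a file under /usr/lib/tmpfiles.d/
# in the package (the file name is up to you).
#   cni_tmpfiles_entries
```

Because the links are created at boot rather than shipped in /opt, the package itself only ever writes under /usr.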

I’m not positive that this is the best way to work around the problem, but it allowed me to get up and running with kubeadm on CentOS Atomic Host Continuous. Here’s how to do that:

This first step may or may not be crucial. I added it to my mix on the suggestion of this kubeadm doc page while I was puzzling over why the weave network plugin wasn’t working.

# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

This set of steps adds my copr repo, overlays the needed packages, and kicks off a reboot for the overlay to take effect.

# cat <<EOF > /etc/yum.repos.d/jasonbrooks-kube-release-epel-7.repo
[jasonbrooks-kube-release]
name=Copr repo for kube-release owned by jasonbrooks
baseurl=https://copr-be.cloud.fedoraproject.org/results/jasonbrooks/kube-release/epel-7-x86_64/
type=rpm-md
skip_if_unavailable=True
gpgcheck=1
gpgkey=https://copr-be.cloud.fedoraproject.org/results/jasonbrooks/kube-release/pubkey.gpg
repo_gpgcheck=0
enabled=1
enabled_metadata=1
EOF

# rpm-ostree pkg-add --reboot kubelet kubeadm kubectl kubernetes-cni

These steps start the kubelet service, put selinux into permissive mode (which, according to this doc page, should soon not be necessary), and initialize the cluster.

# systemctl enable kubelet.service --now

# setenforce 0

# kubeadm init --use-kubernetes-version "v1.4.3"

This step assigns the master node to also serve as a worker, and then deploys the weave network plugin on the cluster. To add additional workers, use the kubeadm join command provided when the cluster init operation completed.

# kubectl taint nodes --all dedicated-

# kubectl apply -f https://git.io/weave-kube

When the command kubectl get pods --all-namespaces shows that all of your pods are up and running, the cluster is ready for action.
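
Rather than re-running that command and scanning the list by hand, you can check the STATUS column mechanically. A sketch, assuming the usual header row with a STATUS column; all_pods_running is an invented name, and note that "Running" alone does not guarantee every container in a pod is ready.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on
# stdin and succeed only if there is at least one pod and every pod's STATUS
# column says Running.
all_pods_running() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
    { seen = 1; if ($col != "Running") bad = 1 }
    END { exit ((col && seen && !bad) ? 0 : 1) }
  '
}

# Usage:
#   kubectl get pods --all-namespaces | all_pods_running && echo "cluster is ready"
```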

The kubeadm tool is considered an “alpha” right now, but moving forward, this looks like it could be a great way to come up with an up-to-date kube cluster on atomic hosts. I’ll need to figure out whether my workaround to get kubernetes-cni working is sane enough before building a more official centos or fedora package for this, and I want to figure out how to swap out the project-built, debian-based kubernetes component containers with containers provided by centos or fedora, a topic I’ve written a bit about recently.

update: While the hyperkube containers that I’ve written about in the past were based on debian, the containers that kubeadm downloads appear to be built on busybox.