More Fun with Kubeadm & Fedora

I recently wrote about getting up and running with kubeadm and Fedora CoreOS, which I got working, but which sent me into a miniature funk of uncertainty over various little integration issues.

First, I was getting around the lack of support in rpm-ostree for rpms that place files in /opt, which isn’t a traditional spot for package managers to put things, but which is where kubeadm puts its cni binaries, for historical reasons. I got the Fedora package that provides the cni binaries, containernetworking-plugins (which doesn’t stick anything into /opt), modified to say it provides kubernetes-cni, which is what the upstream kubernetes rpm maintainers call it, but I had to transgress against rpmlint by leaving out the version number. The upstream packagers call explicitly for cni version 0.6.0, while Fedora is shipping version 0.7.4.
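
For the curious, the packaging change boils down to a single unversioned Provides line in the containernetworking-plugins spec file, roughly like this (a sketch, and the missing version is exactly what rpmlint grumbles about):

Provides: kubernetes-cni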

As far as I could tell, the later version worked just fine, but I wasn’t sure I’d get my package change merged while telling that lie of omission. That led me to wonder whether I should try to convince the upstream packagers to move the cni binaries — kubernetes is hard coded to look for them in /opt, but you can specify a different location when you’re setting things up, so getting the binaries moved to /usr/libexec/cni, where Fedora keeps them, could be an option. Or, since I’ve played with some symlink-type trickery in the past to make cni binaries appear under /opt while actually installed elsewhere, maybe I could convince the project to accept something like that.

However, Fedora’s cri-o package depends on at least cni 0.7.x, so achieving nicer (or, failing that, trickier) kubernetes-cni packaging to allow for installation on rpm-ostree hosts would mean incompatibility with the runtime I was interested in using, so what’s the point of anything, anyway, even?

Well, my change did get merged to the Fedora package (you can test the package and give it karma) and those newer cni binaries do appear to work just fine with the upstream kubelet, so I’m feeling somewhat better about that part. Also, I think I agreed to co-maintain the containernetworking-plugins package moving forward, so that’s fun.

Elsewhere on the problematic networking front, I had to insert this puzzled passage into my last post:

…I found during my tests that my Fedora CoreOS host was configuring a 10.88.0.1/16 address on the cni0 interface, which was conflicting with my flannel networking. I found that if I deleted that address from the device, the cni0 interface would get a new, 10.244.0.1/24 address that worked for my cluster:

sudo ip addr del 10.88.0.1/16 dev cni0

I wasn’t sure where that was coming from, though I had some vague sense that I once knew the answer. I relearned / remembered that this is part of the config for cri-o, and lives in the file /etc/cni/net.d/100-crio-bridge.conf, but then disappears, seemingly following the next reboot after I’ve configured a cluster. I thought that perhaps I should be passing 10.88.0.0/16 as the pod network cidr argument when running kubeadm init, instead of the conventional 10.244.0.0/16, but maybe not. I need to do more poking here.
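
For reference, this is the sort of invocation I mean, sketched with the conventional flannel network — which is exactly the value I’m second-guessing:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16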

Speaking of reboots, I ran into a problem in which cri-o wasn’t dealing very well with reboots — as expected, all the containers running under cri-o were going away during a reboot, but this should be no big deal, because kubernetes can restart its control plane containers from kubelet manifest files, and start up anything else from its records in etcd. I found, though, that following a reboot, the kubelet was complaining about how a sandbox for the pod it was trying to run already existed, even though cri-o wasn’t running any pods. I found a reported issue that looked similar and contained a workaround, but ended up solving the issue by figuring out how to update cri-o to a later version…

One of the qualities that sets cri-o apart from other kubernetes container runtime options is that each cri-o version is pegged to a particular kubernetes version — there’s a cri-o 1.12 for kubernetes 1.12, a 1.13 for 1.13, and so on. As I write this, the latest upstream version of kubernetes is 1.13.4, and that’s the version of the kubelet, kubectl and kubeadm I’ve been running on my Fedora test VMs. However, the current version of cri-o shipping with Fedora is 1.12.0. It seemed to be working (that reboot issue notwithstanding), but I wanted to be running 1.13.

Poking around in Fedora’s updates system, I found that cri-o 1.13 was available as a test package, but was now being packaged as a Fedora module. Fedora Modularity is a big topic, but the bottom line is that it allows for more flexibility in packaging and maintaining software in the Fedora family. In this case, it offers a way out of the one stable version per Fedora release that doesn’t fit well with packages like cri-o, which will have multiple stable versions out in the world at once. The trouble, potentially, was that I wasn’t sure whether module-based rpms would play nicely with rpm-ostree and package layering.

As it turned out, I simply had to enable the repo by changing enabled=0 to enabled=1 in /etc/yum.repos.d/fedora-updates-testing-modular.repo (moving forward, I’d like to see rpm-ostree grow support for enabling repos per-operation, a la yum and dnf), and then install the package using rpm-ostree install cri-o. On the test host I was already using, where I’d already installed cri-o, I had to take another step — I should have been able to run rpm-ostree upgrade -r to fetch any available image updates alongside any updates to my layered packages, but since Fedora CoreOS is still in experimental preview mode, its configured ostree remote doesn’t point anywhere, which leads rpm-ostree upgrade to error out before grabbing any layered package upgrades. Instead, I had to run rpm-ostree uninstall cri-o && rpm-ostree install cri-o to uninstall and then reinstall the newer version.
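
Condensed into commands, the whole dance looked something like this (the repo edit done by hand, since there’s no per-operation enable yet):

sudo sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/fedora-updates-testing-modular.repo

sudo rpm-ostree install cri-o

and, on the host that already had the older cri-o layered:

sudo rpm-ostree uninstall cri-o && sudo rpm-ostree install cri-o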

I tested out a kubeadm cluster with the updated cri-o, and it worked, so I left some karma for the package. When I started typing up my notes about the reboot issue, I figured I’d reboot again to see if the new version was behaving the same way, and the problem disappeared. I don’t know if something about the mismatch between kube 1.13 and cri-o 1.12 was to blame, but I was happy not to see the issue any more.

Next up, I want to play with the HA kubeadm docs — I’m thinking I’ll set up a three master/etcd node, three worker node cluster on one of my oVirt clusters, team it up with the oVirt volume provisioner and flexvolume driver for persistent volume support, and then install KubeVirt with nested kvm for some tests of that. I’m interested to see how this Fedora CoreOS / kubeadm cluster fares over time through some package and image upgrades while running some longish-lived workloads. Since the ostree repo isn’t up yet during the experimental preview, I suppose I’ll be sprucing up these BYOAtomic docs to compose and host my own repo.

Stay tuned.

testing system-containerized kube and friends

A month or so ago I jotted down some notes on using ansible to set up a kubernetes cluster on atomic hosts with kubernetes running in regular docker containers and flannel and etcd running in system containers.

I’ve been working on turning my kube containers into system containers. Three reasons jump to mind:

  • I want to run my kube containers via systemd, and system containers come with systemd unit files rolled in and deployed automatically when you run atomic install --system foo, as opposed to storing them somewhere separate from the containers, and copying them into place.
  • I’m using flannel and etcd system containers, in part because flannel needs to modify docker’s configs to do its thing, and etcd needs to be running for flannel to run, so there’s a bit of a chicken-and-egg situation that we avoid by running flannel and etcd outside of docker. I can save on a bit of storage by having flannel, etcd and kubernetes all share the same image in the ostree-based storage that system containers use.
  • I’ve been wanting to learn more about system containers for a little while now, and Yu Qi (Jerry) Zhang just wrote this system container howto.

I’ve been testing on a trio of fedora atomic hosts like this:

$ git clone https://github.com/jasonbrooks/contrib.git
$ cd contrib
$ git checkout system-containers
$ cd ansible
$ vi inventory/inventory

[masters]
kube-master-test.example.com

[etcd:children]
masters

[nodes]
kube-minion-test-[1:2].example.com

$ cd scripts
$ ./deploy-cluster.sh

Substitute those hostnames above with ones that match your own test machines. Alternatively, you should be able to use the Vagrantfile in the vagrant directory of that repo, though I haven’t tested that yet.

This involves a bunch of changes to run commands like atomic install --system --name etcd {{ container_registry }}/{{ container_namespace }}/etcd:{{ container_label }}, installing flannel, etcd, and the kubernetes master and node components as desired and as specified in the inventory/group_vars/all.yml file.
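
With the variables filled in from my setup (the docker hub registry, namespace and label I mention below), the rendered command for etcd would look something like this:

$ sudo atomic install --system --name etcd docker.io/jasonbrooks/etcd:fc25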

In that same config file, I’ve temporarily turned off some of the newish encrypted flannel stuff, because I need to tweak the flannel container to make it work.

If you run the script as laid out above, you’ll get etcd, flannel and kube containers from my namespace in the docker hub, because the current upstream fedora containers, in the case of etcd and flannel, need a couple of changes, and in the case of kube, the upstream fedora containers (that I maintain) aren’t yet modified to run as system containers.

Speaking of which, another cool thing about system containers is that they can also be run as regular docker containers. To test this with my new containers, I ran through the steps I mentioned in my previous post, with a different branch of the ansible modded to run kube in regular docker containers, but in the all.yml conf file, I set container_registry: docker.io, container_namespace: jasonbrooks, and container_label: fc25 to grab the system container versions of everything that I’ve been talking about in this post. It worked.
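
For reference, the relevant lines of inventory/group_vars/all.yml were just these (everything else stayed at its defaults):

container_registry: docker.io
container_namespace: jasonbrooks
container_label: fc25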

So, yay. I have a couple items to work through still. There’s the flannel bit I mentioned above (I think I just need to mount another dir in the flannel system container’s config.json.template). Also, I’ve found that I need to restart the kubelet service on my nodes before the kubedns pod will work, so I need to track down where in the ansible that restart needs to happen to make it automatic.

getting stuff done with a local openshift origin instance

A few of the projects I work with use static websites based on middleman, which you can run locally to see how your edits, or those of others, will look on the live site when they’re merged.

Each of these sites defaults to port 4567 when running locally, so if I’m running more than one of them at a time, they complain that their favored port is already taken. It’s easy enough to fire up middleman on a different port, but I thought I’d try and run a couple of these in containers, using a local instance of OpenShift Origin, a Kubernetes-based container application platform.

It’s pretty easy to get up and running with an OpenShift Origin instance using the command oc cluster up. The oc client is available for Linux, Windows and Mac OS. Since containers (pretty much) are Linux, you’ll need a Linux VM on Mac or Windows, but the oc client can use docker machine to take care of that for you. I haven’t tested that, though, because I use Linux already.

On Fedora, I followed these instructions, with the exception of installing the oc client from the Fedora repos (dnf install -y origin-clients), rather than downloading the binary from GitHub.

I wanted my origin install to persist across restarts, so I created a folder in my home directory to store persistent data, and started up my instance with:

$ sudo oc cluster up --host-data-dir=/home/jbrooks/origin-data --use-existing-config

sudo was necessary because I haven’t set up my regular user account to run docker without it — not a big deal, but some config files for logging in to my origin instance as admin ended up in my /root directory instead of my home directory, so I copied those over:

$ sudo cp -r /root/.kube ~/.
$ sudo chown -R jbrooks:jbrooks ~/.kube

I logged into the OpenShift web console using the URL and the developer:developer user name and password output by the oc cluster up command, clicked “Add to Project”, and then, under the “Languages” heading, chose “Ruby,” and then “Ruby 2.3”, because middleman is a ruby affair.

I filled in a name, pasted in the git repository URL for the ovirt middleman site, and hit “Create.”

I headed to the “Overview” page, saw that my build was running, clicked “View Log,” and saw that a familiar-looking build process was chugging along.

When the build finished, OpenShift kicked off a deployment of my image, which, as I could see from the deployment log linked from the overview page, was erroring out.

After some poking around, I fixed the issue by heading to the deployments section of the web console and, after first pausing the deployment, hitting the edit YAML button. I used the YAML editor to add a command right in between the image and ports sections of the configuration.

I also changed the containerPort from a default of 8080 to the middleman default of 4567. I expected this change to filter down to the service and route that had been automatically created for me, but it didn’t — it wasn’t tough to edit those via the web console, however.
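
Roughly, the relevant chunk of the deployment config ended up looking like this — a sketch, with the command mirroring the entrypoint from the kompose file further down, and the image being whatever your build pushed to the internal registry:

    containers:
      - image: 172.30.24.24:5000/myproject/ovirt-site
        command:
          - scl
          - enable
          - rh-ruby23
          - /opt/app-root/src/run-server.sh
        ports:
          - containerPort: 4567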

I added GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment variables to my deployment, from an “Environment” tab in the deployments area of the console. As I eventually learned, git got grumpy about running as a random UID (as is OpenShift’s security-conscious custom) rather than as a “real” user with an entry in /etc/passwd, but adding those ENV variables calmed git down.

Once I had a pod up and running, I was able to view the development site in my web browser via the URL provided in the routes section of the console.

Next, I headed to my terminal to log into my running pod with OpenShift’s oc rsh command, and fetch and check out a pending pull request on the ovirt site:

$ oc rsh ovirt-site-2-4-50eao

$ git fetch origin pull/877/head:pr-ovirt-gluster-411

$ git checkout pr-ovirt-gluster-411

The middleman development server handles live reloading, so once I checked out the new branch, the site refreshed, and I could see my awaiting-merge blog post.

This works, but I’ll probably hone the process some more from here. I experimented a bit with using kompose to put together a simple docker compose-formatted manifest for my app that could either pull from an openshift-built or a built-elsewhere docker container. Like this:

version: "2"

services:  
  ovirt-site:
    image: 172.30.24.24:5000/myproject/ovirt-site
    ports:
      - "4567"
    environment:
      - GIT_COMMITTER_NAME="Jason Brooks"
      - GIT_COMMITTER_EMAIL="jbrooks@redhat.com"
    entrypoint:
      - scl
      - enable
      - rh-ruby23
      - /opt/app-root/src/run-server.sh
    labels:
      kompose.service.type: NodePort

I think that approach would then work for a regular kube cluster or, with some tweaking, probably for docker or docker swarm as well.
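
I haven’t pinned down the exact workflow yet, but the idea would be to hand that file to kompose and let it generate the manifests — something like this, assuming the file above is saved as docker-compose.yml, with the provider flag for openshift-flavored output:

$ kompose convert -f docker-compose.yml

$ kompose convert -f docker-compose.yml --provider openshift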

running kubernetes in containers on atomic

The atomic hosts from CentOS and Fedora earn their “atomic” namesake by providing for atomic, image-based system updates via rpm-ostree, and atomic, image-based application updates via docker containers.

This “system” vs “application” division isn’t set in stone, however. There’s room for system components to move across from the somewhat rigid world of ostree commits to the freer-flowing container side.

In particular, the key atomic host components involved in orchestrating containers across multiple hosts, such as flannel, etcd and kubernetes, could run instead in containers, making life simpler for those looking to test out newer or different versions of these components, or to swap them out for alternatives.

Suraj Deshmukh wrote a post recently about running kubernetes in containers. He wanted to test kubernetes 1.3, for which Fedora packages aren’t yet available, so he turned to the upstream kubernetes-on-docker approach.

Suraj ran into trouble with flannel and etcd, so he ran those from installed rpms. Flannel can be tricky to run as a docker container, because docker’s own configs must be modified to use flannel, so there’s a bit of a chicken-and-egg situation.

One solution is system containers for atomic, which can be run independently from the docker daemon. Giuseppe Scrivano has built example containers for flannel and for etcd, and in this post, I’m describing how to use these system containers alongside a containerized kubernetes on an atomic host.

setting up flannel and etcd

You need a very recent version of the atomic command. I used a pair of CentOS Atomic Hosts running the “continuous” stream.

The master host needs etcd and flannel:

# atomic pull gscrivano/etcd

# atomic pull gscrivano/flannel

# atomic install --system gscrivano/etcd

With etcd running, we can use it to configure flannel:

# export MASTER_IP=YOUR-MASTER-IP

# runc exec gscrivano-etcd etcdctl set /atomic.io/network/config '{"Network":"172.17.0.0/16"}'

# atomic install --name=flannel --set FLANNELD_ETCD_ENDPOINTS=http://$MASTER_IP:2379 --system gscrivano/flannel

The worker node needs flannel as well:

# export MASTER_IP=YOUR-MASTER-IP

# atomic pull gscrivano/flannel

# atomic install --name=flannel --set FLANNELD_ETCD_ENDPOINTS=http://$MASTER_IP:2379 --system gscrivano/flannel

On both the master and the worker, we need to make docker use flannel:

# echo "/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker" | runc exec flannel bash

Also on both hosts, we need this docker tweak (because of this):

# cp /usr/lib/systemd/system/docker.service /etc/systemd/system/

# sed -i s/MountFlags=slave/MountFlags=/g /etc/systemd/system/docker.service

# systemctl daemon-reload

# systemctl restart docker

On both hosts, some context tweaks to make SELinux happy:

# mkdir -p /var/lib/kubelet/

# chcon -R -t svirt_sandbox_file_t /var/lib/kubelet/

# chcon -R -t svirt_sandbox_file_t /var/lib/docker/

setting up kube

With flannel and etcd running in system containers, and with docker configured properly, we can start up kubernetes in containers. I’ve pulled the following docker run commands from the docker-multinode scripts in the kubernetes project’s kube-deploy repository.

On the master:

# docker run -d \
--net=host \
--pid=host \
--privileged \
--restart="unless-stopped" \
--name kube_kubelet_$(date | md5sum | cut -c-5) \
-v /sys:/sys:rw \
-v /var/run:/var/run:rw \
-v /run:/run:rw \
-v /var/lib/docker:/var/lib/docker:rw \
-v /var/lib/kubelet:/var/lib/kubelet:shared \
-v /var/log/containers:/var/log/containers:rw \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube kubelet \
--allow-privileged \
--api-servers=http://localhost:8080 \
--config=/etc/kubernetes/manifests-multi \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--hostname-override=${MASTER_IP} \
--v=2

On the worker:

# export WORKER_IP=YOUR-WORKER-IP

# docker run -d \
--net=host \
--pid=host \
--privileged \
--restart="unless-stopped" \
--name kube_kubelet_$(date | md5sum | cut -c-5) \
-v /sys:/sys:rw \
-v /var/run:/var/run:rw \
-v /run:/run:rw \
-v /var/lib/docker:/var/lib/docker:rw \
-v /var/lib/kubelet:/var/lib/kubelet:shared \
-v /var/log/containers:/var/log/containers:rw \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube kubelet \
--allow-privileged \
--api-servers=http://${MASTER_IP}:8080 \
--cluster-dns=10.0.0.10 \
--cluster-domain=cluster.local \
--hostname-override=${WORKER_IP} \
--v=2

# docker run -d \
--net=host \
--privileged \
--name kube_proxy_$(date | md5sum | cut -c-5) \
--restart="unless-stopped" \
gcr.io/google_containers/hyperkube-amd64:$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt") \
/hyperkube proxy \
--master=http://${MASTER_IP}:8080 \
--v=2

get current kubectl

I usually test things out from the master node, so I’ll download the newest stable kubectl binary to there:

# curl -sSL https://storage.googleapis.com/kubernetes-release/release/$(curl -sSL "https://storage.googleapis.com/kubernetes-release/release/stable.txt")/bin/linux/amd64/kubectl > /usr/local/bin/kubectl

# chmod +x /usr/local/bin/kubectl

test it

It takes a few minutes for all the containers to get up and running. Once they are, you can start running kubernetes apps. I typically test with the guestbookgo atomicapp:

# atomic run projectatomic/guestbookgo-atomicapp

Wait a few minutes, until kubectl get pods tells you that your guestbook and redis pods are running, and then:

# kubectl describe service guestbook | grep NodePort

Visiting the NodePort returned above at either my master or worker IP (these kube scripts configure both to serve as workers) brings up the guestbook app.

fedora and docker storage

While (pretty much) everyone who’s using docker is running it on Linux, and while lots of people run docker on their laptops and desktops, most aren’t running it directly on Linux desktops and laptops. Instead, most individual docker users are relying on some sort of purpose-built Linux distribution running as a virtual machine on their Mac or Windows machine.

However, if you are (like me) running Linux on your desktop, you can run docker containers right on your bare metal, with no virtualization overhead in between. Yay, Desktop Linux!

But wait. If you are (like me) running Fedora Linux on your desktop, and if you (also like me) weren’t thinking about docker and its particular storage needs when you installed Fedora on your machine, you could be in for some perplexing issues or at least crap performance, because of the way that docker storage works on Fedora.

I’ve written about the general issue elsewhere:

…the AUFS backend that started out as Docker’s default storage option, but never made its way into the mainline Linux kernel, posed a problem for Red Hat and our upstream-first, no-out-of-tree-bits ways.

The settled-upon solution was device mapper thin provisioning, which takes a block storage device to create a pool of space that can be used to create other block devices for Docker containers and images. The device mapper backend can be configured to use direct LVM volumes or you can let Docker create a pair of loopback mounted sparse files to serve as the block devices.

from: Friends Don’t Let Friends Run Docker on Loopback in Production

When you install Fedora on your desktop or laptop, the installer divvies your entire disk up into a small boot partition and a big LVM partition, and then divides that LVM space up into a swap volume that varies in size based on how much RAM you have installed, a root volume of 50GB, and a home volume that takes over whatever’s left.

With no room left for the pool of space that the docker device mapper storage driver needs for containers and images, the storage driver will turn instead to crappily-performing loopback mounted files. Boo!

A Fix

You can cut back the size of the home volume in Fedora without too much trouble. I like to use system-storage-manager to work with my disks:

NOTE: I guess I should add that whenever you’re mucking with your disks, you should make sure you have backups, and so on, but I have resized my own laptop partitions in just this way on more than one occasion, and I’ve tested the steps written here with a VM as I wrote this, so, yeah.

$ sudo dnf install -y system-storage-manager

Next, reboot your machine, and when you get back to the login screen, hit CTRL-ALT-F2 to get to a virtual terminal, and then log in as root. We need to do this in order to unmount the home directory before we shrink it. As root, you can use system-storage-manager to shrink down the home volume. Below I’m shrinking the home volume to 20G, because I’m testing these instructions on a VM with a 100GB drive. Substitute a value that makes sense for your rig.

# umount /home
# ssm resize -s 20G /dev/fedora/home
# reboot

If you’ve already installed and run docker, you’ll need to delete /var/lib/docker, where all of your containers and volumes live, so be prepared to rebuild those:

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker
$ sudo systemctl start docker

When docker starts up again, a script that comes bundled with Fedora’s distribution of docker will check to see that there’s space available in your volume group and will set up your storage correctly. If you want to grow your home volume later, it’s easy to do and doesn’t require unmounting anything. You’ll run the same ssm resize command from above, and swap in your desired volume size.
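
For example, to grow the home volume back out later (swap in whatever size fits your disk):

$ sudo ssm resize -s 60G /dev/fedora/home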

NOTE: If you’re using docker-engine from docker.com, check out these docs for setting up the devicemapper driver correctly by hand.

Starting out right

If you haven’t yet installed Fedora, you can configure your system to accommodate this and other LVM storage scenarios moving forward by making your home volume smaller and modifying your “fedora” volume group from its default “Automatic” size policy to the “As large as possible” policy. This way, all your spare disk space will be ready for new volumes (such as the docker thin pool) or for growing your home or root volumes if you decide that you need the space later on. This is probably how Fedora partitioning should be configured by default, anyway, but it isn’t.

Looking ahead

Finally, there’s another option on the horizon for docker storage on Fedora, an option that doesn’t require partition changes or planning: OverlayFS. I wrote about this in the post I linked above, too, but the TLDR is that OverlayFS and SELinux don’t work together yet, although that’s set to change. Stay tuned.

testing flannel

I noticed today (maybe I’ve noticed before, but forgotten) that the version of flannel in Fedora 23 is older than what’s available in CentOS. It looks like this is because no one tested the more-recent version of flannel in Fedora’s Bodhi, a pretty awesome application for testing packages.

Why not? Maybe because it isn’t always obvious how to test a package like flannel, but here’s how I tested it, and added karma to the package in Bodhi.

I use flannel when I cluster atomic hosts together with kubernetes. I typically use the release versions of centos or fedora atomic, but the fedora project also provides an ostree image built from fedora’s updates-testing repo, where packages await karma from testers. I prepare three atomic hosts with vagrant:

[my-laptop]$ git clone https://github.com/jasonbrooks/contrib.git

[my-laptop]$ cd contrib/ansible/vagrant

[my-laptop]$ export DISTRO_TYPE=fedora-atomic

[my-laptop]$ vagrant up --no-provision --provider=libvirt

Next, I rebase the trio of hosts to the testing tree:

[my-laptop]$ for i in {kube-node-1,kube-master,kube-node-2}; do vagrant ssh $i -c "sudo rpm-ostree rebase fedora-atomic:fedora-atomic/f23/x86_64/testing/docker-host"; done

[my-laptop]$ vagrant reload --no-provision && vagrant provision kube-master

Reloading the hosts switches them to the testing image, and runs the ansible provisioning scripts that configure the kubernetes cluster. Now to ssh to one of the boxes, confirm that I’m running an image with the newer flannel, and then run a test app on the cluster to make sure that everything is in order:

[my-laptop]$ vagrant ssh kube-master

[kube-master]$ rpm -q flannel
flannel-0.5.4-1.fc23.x86_64

[kube-master]$ sudo atomic host status
  TIMESTAMP (UTC)         VERSION   ID             OSNAME            REFSPEC                                                        
* 2016-02-03 22:47:33     23.63     65cc265ae1     fedora-atomic     fedora-atomic:fedora-atomic/f23/x86_64/testing/docker-host     
  2016-01-26 18:16:33     23.53     22f0b303da     fedora-atomic     fedora-atomic:fedora-atomic/f23/x86_64/docker-host

[kube-master]$ sudo atomic run projectatomic/guestbookgo-atomicapp

That last command pulls down an atomicapp container that deploys a guestbook example app from the kubernetes project. The app includes two redis slaves, a redis master, and a trio of frontend apps that talk to those backend pieces. The bits of the app are spread between my two kubelet nodes, with flannel handling the networking in-between. If this app is working, then I’m confident that flannel is working.

[kube-master]$ kubectl get svc guestbook
NAME        CLUSTER_IP       EXTERNAL_IP   PORT(S)    SELECTOR        AGE
guestbook   10.254.233.237                 3000/TCP   app=guestbook   55m

[kube-master]$ exit

[my-laptop]$ vagrant ssh kube-node-1

[kube-node-1]$ curl http://10.254.233.237:3000/info
# Server
redis_version:2.8.19
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:c0359e7aa3798aa2
....

The app is working, flannel appears to be doing its job, so I marched off to bodhi to offer up my karma:

instant karma

Up and Running with oVirt, 3.2 Edition

I’ve written an updated version of this howto for oVirt 3.3 at the Red Hat Community blog.

The latest version of the open source virtualization platform, oVirt, has arrived, which means it’s time for the third edition of my “running oVirt on a single machine” blog post. I’m delighted to report that this ought to be the shortest (and least-updated, I hope) post of the three so far.

When I wrote my first “Up and Running” post last year, getting oVirt running on a single machine was more of a hack than a supported configuration. Wrangling large groups of virtualization hosts is oVirt’s reason for being. oVirt is designed to run with its manager component, its virtualization hosts, and its shared storage all running on separate pieces of hardware. That’s how you’d want it set up for production, but a project that requires a bunch of hardware just for kicking the tires is going to find its tires un-kicked.

Fortunately, this changed in August’s oVirt 3.1 release, which shipped with an All-in-One installer plugin, but, as a glance at the volume of strikethrough text and UPDATE notices in my post for that release will attest, there were more than a few bumps in the 3.1 road.

In oVirt 3.2, the process has gotten much smoother, and should be as simple as setting up the oVirt repo, installing the right package, and running the install script. Also, there’s now a LiveCD image available that you can burn onto a USB stick, boot a suitable system from, and give oVirt a try without installing anything. The downsides of the LiveCD are its size (2.1GB) and the fact that it doesn’t persist. But, that second bit is one of its virtues, as well. The All in One setup I describe below is one that you can keep around for a while, if that’s what you’re after.

Without further ado, here’s how to get up and running with oVirt on a single machine:

HARDWARE REQUIREMENTS: You need a machine with x86-64 processors with hardware virtualization extensions. This bit is non-negotiable–the KVM hypervisor won’t work without them. Your machine should have at least 4GB of RAM. Virtualization is a RAM-hungry affair, so the more memory, the better. Keep in mind that any VMs you run will need RAM of their own.

It’s possible to run oVirt in a virtual machine–I’ve taken to testing oVirt on oVirt itself most of the time–but your virtualization host has to be set up for nested KVM for this to work. I’ve written a bit about running oVirt in a VM here.

SOFTWARE REQUIREMENTS: oVirt is developed on Fedora, and any given oVirt release tends to track the most recent Fedora release. For oVirt 3.2, this means Fedora 18. I run oVirt on minimal Fedora configurations, installed from the DVD or the netboot images. With oVirt 3.1, a lot of people ran into trouble installing oVirt on the default LiveCD Fedora media, largely due to conflicts with NetworkManager. When I tested 3.2, the installer script disabled NM on its own, but I had to manually enable sshd (sudo service sshd start && sudo chkconfig sshd on).

A lot of oVirt community members run the project on CentOS or Scientific Linux using packages built by Andrey Gordeev, and official packages for these “el6” distributions are in the works from the oVirt project proper, and should be available soon for oVirt 3.2. I’ve run oVirt on CentOS in the past, but right now I’m using Fedora 18 for all of my oVirt machines, in order to get access to new features like the nested KVM I mentioned earlier.

NETWORK REQUIREMENTS: Your test machine must have a host name that resolves properly on your network, whether you’re setting that up in a local dns server, or in the /etc/hosts file of any machine you expect to access your test machine from. If you take the hosts file editing route, the installer script will complain about the hostname–you can safely forge ahead.

CONFIGURE THE REPO: Somewhat confusingly, oVirt 3.1 is already in the Fedora 18 repositories, but due to some packaging issues I’m not fully up-to-speed on, that version of oVirt is missing its web admin console. In any case, we’re installing the latest, 3.2 version of oVirt, and for that we must configure our Fedora 18 system to use the oVirt project’s yum repository.

sudo yum localinstall http://ovirt.org/releases/ovirt-release-fedora.noarch.rpm

SILENCING SELINUX (OPTIONAL): I typically run my systems with SELinux in enforcing mode, but it’s a common source of oVirt issues. Right now, there’s definitely one (now fixed), and maybe two SELinux-related bugs affecting oVirt 3.2. So…

sudo setenforce 0

To make this setting persist across reboots, edit the ‘SELINUX=’ line in /etc/selinux/config to equal ‘permissive’.
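
Or, to do the edit in one shot (assuming the stock config file):

sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config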

INSTALL THE ALL IN ONE PLUGIN: The package below will pull in everything we need to run oVirt Engine (the management server) as well as turn this management server into a virtualization host.

sudo yum install ovirt-engine-setup-plugin-allinone

RUN THE SETUP SCRIPT: Run the script below and answer all the questions. In almost every case, you can stick to the default answers. Since we’re doing an All in One install, I’ve tacked the relevant argument onto the command below. You can run “engine-setup -h” to check out all available arguments.

One of the questions the installer will ask deals with whether and which system firewall to configure. Fedora 18 now defaults to Firewalld rather than the more familiar iptables. In the handful of tests I’ve done with the 3.2 release code, I’ve had both success and failure configuring Firewalld through the installer. On one machine, throwing SELinux into permissive mode allowed the Firewalld config process to complete, and on another, that workaround didn’t work.

If you choose the iptables route, make sure to disable Firewalld and enable iptables before you run the install script (sudo service firewalld stop && sudo chkconfig firewalld off && sudo service iptables start && sudo chkconfig iptables on).

sudo engine-setup --config-allinone=yes

TO THE ADMIN CONSOLE: When the engine-setup script completes, visit the web admin console at the URL for your engine machine. It will be running at port 80 (unless you’ve chosen a different setting in the setup script). Choose “Administrator Portal” and log in with the credentials you entered in the engine-setup script.

From the admin portal, click the “Storage” tab and highlight the iso domain you created during the setup script. In the pane that appears below, choose the “Data Center” tab, click “Attach,” check the box next to your local data center, and hit “OK.” Once the iso domain is finished attaching, click “Activate” to activate it.

Now you have an oVirt management server that’s configured to double as a virtualization host. You have a local data domain (for storing your VM’s virtual disk images) and an NFS iso domain (for storing iso images from which to install OSes on your VMs).

To get iso images into your iso domain, you can copy an image onto your ovirt-engine machine and, from the command line, run “engine-iso-uploader upload -i iso NAME_OF_YOUR_ISO.iso” to load the image. Otherwise (and this is how I do it), you can mount the iso NFS share from wherever you like. Your images don’t go in the root of the NFS share, but in a nested set of folders that oVirt creates automatically, which looks like: “/nfsmountpoint/BIG_OLE_UUID/images/11111111-1111-1111-1111-111111111111/NAME_OF_YOUR_ISO.iso”. You can just drop them in there, and after a few seconds, they should register in your iso domain.

Once you’re up and running, you can begin installing VMs. I made the “creating VMs” screencast below for oVirt 3.1, but the process hasn’t changed significantly for 3.2:

Video: http://www.youtube.com/watch?v=C4gayV6dYK4

Gluster User Story: Fedora Hosted

The Fedora Project’s infrastructure team needed a way to ensure the reliability of its Fedora Hosted service, while making the most of their available hardware resources. The team tapped GlusterFS replicated volumes to convert what had been a two-node, active/passive, eventually consistent hosting configuration into a well-synchronized setup in which both nodes could take on user load.

Hosting Fedora Hosted

The Fedora Infrastructure team develops, deploys, and maintains various services for the Fedora Project. One of these services, Fedora Hosted, provides open source projects with a place to host their code and collaborate online.

I talked to the team’s Infrastructure Lead, Kevin Fenzi, about how they’re using Gluster to ensure availability of these services while making the most of their server resources.

Fedora Hosted is served from a pair of virtual instances hosted at serverbeach.com, which donates these resources to the project. The instances run Red Hat Enterprise Linux 6 and maintain a replicated GlusterFS 3.3.0 volume to keep the 50GB of project data stored at Fedora Hosted in sync. The nodes use Gluster’s NFS mount support, which the team found to deliver better performance with the many small files that Fedora Hosted serves.

“Both servers are in DNS, so it’s round robin which one you hit for any given connection. Since the data on the backend is replicated, both of them are up to date at any given time,” Kevin explained. “This way, not only can we handle more load cpu-wise, but if we wish to reboot one node for an update or the like, we simply adjust DNS and there is no outage seen by our projects.”

The Road to Gluster

An earlier incarnation of Fedora Hosted was also run on a pair of virtual instances, one actively serving users and the other a standby kept in sync with an hourly rsync job. If the primary node failed, the standby instance could be brought up in short order, but the hourly sync window meant that the service could suffer an hour or two of data loss.

The Fedora Infrastructure team managed to close this sync window by shifting to a new configuration based on the DRBD project. While this solution dealt with the problem of data loss following an outage, the configuration left one node mostly idle.

The team’s first foray into a GlusterFS-backed configuration for Fedora Hosted turned up a couple of issues with the then-current GlusterFS version 3.2, which the Gluster project addressed in their 3.3 release.

“The Gluster folks were very responsive to our issues and were working on the patch very soon after we requested it,” Kevin explained. “Additionally, 3.3 performance seemed to be much better than 3.2 for our use cases.”

Looking ahead, Kevin and the other members of the Infrastructure team have their eyes set on continued performance enhancements. While the Gluster 3.3-backed Fedora Hosted service has handled its community collaboration load quite well, Kevin pointed out that “we could always want better performance.”

oVirt 3.1, Glusterized

One of the cooler new features in oVirt 3.1 is the platform’s support for creating and managing Gluster volumes. oVirt’s web admin console now includes a graphical tool for configuring these volumes, and vdsm, the service responsible for controlling oVirt’s virtualization nodes, has a new sibling, vdsm-gluster, for handling the back end work.

Gluster and oVirt make a good team — the scale out, open source storage project provides a nice way of weaving the local storage on individual compute nodes into shared storage resources.

To demonstrate the basics of using oVirt’s new Gluster functionality, I’m going to take the all-in-one engine/node oVirt rig that I stepped through recently and convert it from an all-in-one node with local storage, to a multi-node ready configuration with shared storage provided by Gluster volumes that tap the local storage available on each of the nodes. (Thanks to Robert Middleswarth, whose blog posts on oVirt and Gluster I relied on while learning about the combo.)

The all-in-one installer leaves you with a single machine that hosts both the oVirt management server, aka ovirt-engine, and a virtualization node. For storage, the all-in-one setup uses a local directory for the data domain, and an NFS share on the single machine to host an iso domain, where OS install images are stored.

We’ll start the all-in-one to multi-node conversion by putting our local virtualization host, local_host, into maintenance mode by clicking the Hosts tab in the web admin console, clicking the local_host entry, and choosing “Maintenance” from the Hosts navigation bar.

Once local_host is in maintenance mode, we click edit, change to the Default data center and host cluster from the drop down menus in the dialog box, and then hit OK to save the change.

This is assuming that you stuck with NFS as the default storage type while running through the engine-setup script. If not, head over to the Data Centers tab and edit the Default data center to set “NFS” as its type. Next, head to the Clusters tab, edit your Default cluster, fill the check box next to “Enable Gluster Service,” and hit OK to save your changes. Then, go back to the Hosts tab, highlight your host, and click Activate to bring it back from maintenance mode.

Now head to a terminal window on your engine machine. Fedora 17, the OS I’m using for this walkthrough, includes version 3.2 of Gluster. The oVirt/Gluster integration requires Gluster 3.3, so we need to configure a separate repository to get the newer packages:

# cd /etc/yum.repos.d/
# wget http://repos.fedorapeople.org/repos/kkeithle/glusterfs/fedora-glusterfs.repo

Next, install the vdsm-gluster package, restart the vdsm service, and start up the gluster service:

# yum install vdsm-gluster
# service vdsmd restart
# service glusterd start

The all-in-one installer configures an NFS share to host oVirt’s iso domain. We’re going to be exposing our Gluster volume via NFS, and since the kernel NFS server and Gluster’s NFS server don’t play nicely together, we have to disable the former.

# systemctl stop nfs-server.service && systemctl disable nfs-server.service

Through much trial and error, I found that it was also necessary to restart the wdmd service:

# systemctl restart wdmd.service

In the move from v3.0 to v3.1, oVirt dropped its NFSv3-only limitation, but that requirement remains for Gluster, so we have to edit /etc/nfsmount.conf and ensure that Defaultvers=3, Nfsvers=3, and Defaultproto=tcp.
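
The relevant lines in /etc/nfsmount.conf end up looking like this:

Defaultvers=3
Nfsvers=3
Defaultproto=tcp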

Next, edit /etc/sysconfig/iptables to add the firewall rules that Gluster requires. You can paste the rules in just before the reject lines in your config.

# glusterfs
-A INPUT -p tcp -m multiport --dport 24007:24047 -j ACCEPT
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
-A INPUT -p tcp -m multiport --dport 38465:38467 -j ACCEPT

Then restart iptables:

# service iptables restart

Next, decide where you want to store your gluster volumes — I store mine under /data — and create this directory if need be:

# mkdir /data

Now, head back to the oVirt web admin console, visit the Volumes tab, and click Create Volume. Give your new volume a name, and choose a volume type from the drop down menu. For our first volume, let’s choose Distribute, and then click the Add Bricks button. Add a single brick to the new volume by typing the path you desire into the Brick Directory field, clicking Add, and then OK to save the changes.

Make sure that the box next to NFS is checked under Access Protocols, and then click OK. You should see your new volume listed — highlight it and click Start to start it up. Follow the same steps to create a second volume, which we’ll use for a new ISO domain.

For now, the Gluster volume manager neglects to set brick directory permissions correctly, so after adding bricks on a machine, you have to return to the terminal and run chown -R 36.36 /data (assuming /data is where you are storing your volume bricks) to enable oVirt to write to the volumes.

Once you’ve set your permissions, return to the Storage tab of the web admin console to add data and iso domains backed by the volumes we’ve created. Click New Domain, choose Default data center from the data center drop down, and Data / NFS from the storage type drop down. Fill the export path field with your engine’s host name and the volume name from the Gluster volume you created for the data domain. For instance: “demo1.localdomain:/data”

Wait for the data domain to become active, and repeat the above process for the iso domain. For more information on setting up storage domains in oVirt 3.1, see the quick start guide.

Once the iso domain comes up, BAM, you’re Glusterized. Now, compared to the default all-in-one install, things aren’t too different yet — you have one machine with everything packed into it. The difference is that your oVirt rig is ready to take on new nodes, which will be able to access the NFS-exposed data and iso domains, as well as contribute some of their own local storage into the pool.

To check this out, you’ll need a second test machine, with Fedora 17 installed (though you can recreate all of this on CentOS or another Enterprise Linux starting with the packages here). Take your F17 host (I start with a minimal install), install the oVirt release package, download the same fedora-glusterfs.repo we used above, and make sure your new host is accessible on the network from your engine machine, and vice versa. Also, the bug preventing F17 machines running a 3.5 or higher kernel from attaching to NFS domains isn’t fixed yet, so make sure you’re running a 3.3 or 3.4 version of the kernel.

Head over to the Hosts tab on your web admin console, click New, supply the requested information, and click OK. Your engine will reach out to your new F17 machine, and whip it into a new virtualization host. (For more info on adding hosts, again, see the quick start guide.)

Your new host will require most of the same Glusterizing setup steps that you applied to your engine server: make sure that vdsm-gluster is installed, edit /etc/nfsmount.conf, add the gluster-specific iptables rules and restart iptables, create and chown 36.36 your data directory.

The new host should see your Gluster-backed storage domains, and you should be able to run VMs on both hosts and migrate them back and forth. To take the next step and press local storage on your new node into service, the steps are pretty similar to those we used to create our first Gluster volumes.

First, though, we have to run the command “gluster peer probe NEW_HOST_HOSTNAME” from the engine server to get the engine and its new buddy hooked up Glusterwise (this is another of the wrinkles I hope to see ironed out soon, and taken care of automatically in the background).

We can create a new Gluster volume, data1, of the type Replicate. This volume type requires at least two bricks, and we’ll create one in the /data directory of our engine, and one in the /data directory of our node. This works just the same as with the first Gluster volume we set up — just make sure that when adding bricks, you select the correct server in the drop down menu.

Just as before, we have to return to the command line to chown -R 36.36 /data on both of our machines to set the permissions correctly, and start the volumes we’ve created.

On my test setup, I created a second data domain, named data1, stored on the replicated Gluster domain, with the storage path set to localhost:/data1, on the rationale that VM images stored on the data1 domain would stay in sync across the pair of hosts, enabling either of my hosts to tap local storage for running a particular VM image. But I’m a newcomer to Gluster, so consult the documentation for more clueful Gluster guidance.

Too Fast, Too Slow

Yesterday I removed Fedora 17 from the server I use for oVirt testing, mainly, because I’ve been experiencing random reboots on the server, and I haven’t been able to figure out why. I’m pretty sure I wasn’t having these issues on Fedora 16, but I can’t go back to that release because the official packages for oVirt are built only for F17. There are, however, oVirt packages built for Enterprise Linux (aka RHEL and its children), and I know that some in the oVirt community have been running with these packages with success.

So, I figured I’d install CentOS 6 on my machine and either escape my random reboots or, if the reboots continued, learn that there’s probably something wrong with my hardware. Plus, I’d escape a second bug I’ve been experiencing with Fedora 17, the one in which a recent rebase to the Linux 3.5 kernel (F17 shipped originally with a 3.3 kernel) seems to have broken oVirt’s ability to access NFS shares, thereby breaking oVirt.

Installing oVirt 3.1 on CentOS 6 went very smoothly — the steps involved were pretty much the same as those for Fedora. Before I knew it, I was back up and running with a CentOS-based oVirt 3.1 rig just like my F17 one, complete with my F17 test server template and my F17 VMs for my in-progress gluster/ovirt integration writeup, all repatriated from my oVirt export domain.

However… all is not well.

My Fedora 17 VMs aren’t running normally on my new CentOS 6 host, and what I’m seeing reminds me of a bug I encountered several weeks ago when I first upgraded from oVirt 3.0 to the oVirt 3.1 beta. The solution came in the form of a bugfix from the qemu project upstream — that’s a real benefit of running a leading edge distro like Fedora — when issues are fixed upstream, you don’t have to wait forever for them to float along to you.

Also, the closer you are to upstream, the faster you get access to new features. Not long after updating my qemu to address the F16/F17 VM booting issue, I took to running qemu packages even closer to upstream, from the Fedora Virtualization Preview repository. The oVirt 3.1 management engine supports live snapshots, but requires at least qemu 1.1, which is slated for Fedora 18.

Of course, the downside of tracking the leading edge is that with frequent changes come frequent opportunities for breakage. The changes that don’t directly address your pain points are pure downside, like the NFS-disabling kernel rebase I mentioned earlier. Too fast versus too slow.

So what now?

I wasn’t experiencing these random reboots on my other F17 system — my Thinkpad X220, which I’ve pressed into service as a second oVirt node. I have this F17-based node hooked up to my el6 oVirt engine, and if I set my Fedora VMs to launch only on this node, they run just fine. This machine has only 8GB of RAM, though, and that limits how many VMs I can run on it. Also, since my F17 and el6 nodes are running different versions of qemu, live migration between them doesn’t work.

  • I could shift my in-progress ovirt/gluster testing to el6, VMs of which run just fine with the older qemu, but I’d prefer to keep testing with Fedora, and the newest code.
  • I could, instead of hitting the brakes and running el6 on my test server, hit the gas and throw F18 on there. Maybe that’d solve my random reboot issue, though I’m not sure if my disabled NFS travails would follow me forward.
  • I could figure out how to rebuild the new qemu packages on el6. I’ve started down this path already, but rpmbuild is voicing some complaints that seem related to systemd, which F17 uses and el6 does not.
  • I could find out that my random reboot problems weren’t the fault of F17 after all, which would send me poring over my hardware and possibly returning to F17.

For now, I’m going to play some more with updating my qemu on el6, while squeezing my F17 VMs into my smaller F17-based node to get this ovirt/gluster howto finished.

Then maybe I’ll take a long walk on the beach and meditate on the merits of too slow versus too fast in Linux distros, and ponder whether the Giants will sweep the Astros tonight.

Update: I was able to rebuild Fedora qemu 1.1 packages for CentOS. I commented out some systemd-dependent stuff from the spec file. I had to rebuild a couple of other packages, too, which I found in Fedora’s buildsystem. Now, my Fedora 17 VMs run well on my CentOS 6 oVirt host (which hasn’t randomly rebooted yet), and I can migrate VMs between it and my F17-based node.

And the Giants won, too.