Recent Adventures in oVirt and Gluster

At the end of last week, I spied an exciting tweet about oVirt:


Not long after I started using oVirt and Gluster together, the projects started talking about a way to improve Gluster performance by enabling virtualization hosts to access Gluster volumes directly, using Gluster’s libgfapi, rather than through a FUSE-mounted location on the virtualization host. There was a little bit of fit and finish work to be done, and then we’d all be basking in the glow of ~30% better Gluster storage performance.

That was about four years ago. There ended up being kind of a lot of different little things that needed fixing to make this feature work in oVirt. You can follow many of the twists and turns in bugzilla.

All along, I was eagerly awaiting the feature both as a cool new oVirt+Gluster development and as a welcome option for speeding up my own lab. Disk has always been the weakest part of my hardware setup. My servers each have a single pair of 1TB drives in mirrored RAID, shared between Gluster and the OS, and my VM’s virtual drives had been stored in triplicate in replica 3 Gluster volumes. More recently, with the advent of Gluster arbiter bricks, I’ve been able to get the split-brain protection of replica 3 volumes with only two copies of the data, and that sped things up a bit, but did nothing to dampen my appetite for libgfapi.

Since I need my oVirt setup to get things done, I usually don’t test RC versions of new oVirt components there, but I couldn’t wait any longer and took the plunge. I installed the RC2 updates on each of my virt hosts, and on my engine, I installed a slightly newer versionof the code, from the experimental repo, which contained a few last bits that hadn’t made RC2. Then, on my engine, I ran:

# engine-config -s LibgfApiSupported=true
# systemctl restart ovirt-engine

Any VMs that were already running before the upgrade continued running without libgfapi, and if I migrated them to another host, they’d turn up on that host still using the old access method. When I restarted my VMs, they returned using libgfapi. I could tell which was which by grepping through the qemu processes on a particular VM host.

# ps ax | grep qemu | grep 'file=gluster\|file=/rhev'

-drive file=/rhev/data-center/00000001-0001-0001-0001-00000000025e/616be2b6-71db-4f54-befd-be6a444775d7/images/3f7877e7-e532-44a0-8735-c7b2ca06de3b/48ee34fc-ae12-494c-892f-4229fe1fef9d

-drive file=gluster://

The qemu command lines are super long, so I cut them down just to include the line specifying the virtual drives. In the first example, the drive is being accessed through a FUSE mount, and the second, there’s a direct connection to the Gluster volume.

So, how was performance?

I tried a few different tests, starting with runningddon one of my VMs:

# dd bs=1M count=1024 if=/dev/zero of=test conv=fdatasync && rm test

I ran this a bunch of times on a VM in both storage configurations and the libgfapi configuration came out about 44% faster on average.

For a more “real world” test, I figured I’d measure the time it takes to complete a common task of mine: configuring a test Kubernetes cluster from three Fedora Atomic Host VMs using the upstream ansible scripts. I recorded and averaged the time it took to complete this task across multiple runs on VMs running in each storage configuration, and found that libgfapi was 11% faster.

zram madness

Not too bad, but like I said earlier, my oVirt setup can use all the storage speed help it can get. My servers don’t have a lot of disk but they do have quite a bit of RAM, 256GB apiece, so I’ve long wondered how I could use that RAM to wring more speed out of my setup. For a few months I’ve been experimenting with using Gluster volumes backed by RAM-disks, using zram devices.

This actually works pretty well, and I was seeing speeds similar to what I get running on the SSD in my laptop. Of course, RAM-disks mean losing everything on the disk in the event of a reboot (expected or otherwise), but using replica 3 Gluster volumes, I could reboot one host at a time without losing everything else. Upon bringing back the rebooted host, I’d run a little script to recreate the zram device and the mount points, and then follow the Gluster instructions for replacing a failed brick.

# cat
ZRAMSIZE=$((1024 * 1024 * 1024 * 50))
modprobe zram
echo ${ZRAMSIZE} > /sys/class/block/zram0/disksize
mkfs -t xfs /dev/zram0
mkdir -p /gluster-bricks/fast
mount /dev/zram0 /gluster-bricks/fast
mkdir /gluster-bricks/fast/brick

However, if all of my machines went down at once, due to a power failure in the lab or something like that, replication wouldn’t help me. I wondered if I could still get a significant boost out of a mixture of zram and regular disk backed volumes, with each of my servers hosting one zram-backed brick, one regular disk-backed brick, and one regular disk-backed arbiter brick, all combined into one distributed-replicated Gluster volume.


I ran my same ansible-kubernetes setup tests with the VM drives hosted from my “fast” Gluster domain, and the tests run 32% faster than with the my regular disk-backed (and now libgfapi-enabled) “data” storage domain. Pretty nice, and, in this sort of setup, a power loss would mean that each of four replica groups would be missing one brick, with a remaining data brick and an arbiter brick still around to maintain the data and allow me to repair things.

I want to experiment a bit further with automated tiering in Gluster, where I’d connect a RAM-disk boosted volume like this to the volume for my main data domain, and frequently-accessed files would automatically migrate to the faster storage. As it is now, my fast domain has to be relatively small, so I have to budget my use of it.

oVirt 3.1, Glusterized

One of the cooler new features in oVirt 3.1 is the platform’s support for creating and managing Gluster volumes. oVirt’s web admin console now includes a graphical tool for configuring these volumes, and vdsm, the service for responsible for controlling oVirt’s virtualization nodes, has a new sibling, vdsm-gluster, for handling the back end work.

Gluster and oVirt make a good team — the scale out, open source storage project provides a nice way of weaving the local storage on individual compute nodes into shared storage resources.

To demonstrate the basics of using oVirt’s new Gluster functionality, I’m going to take the all-in-one engine/node oVirt rig that I stepped through recently and convert it from an all-on-one node with local storage, to a multi-node ready configuration with shared storage provided by Gluster volumes that tap the local storage available on each of the nodes. (Thanks to Robert Middleswarth, whose blog posts on oVirt and Gluster I relied on while learning about the combo.)

The all-in-one installer leaves you with a single machine that hosts both the oVirt management server, aka ovirt-engine, and a virtualization node. For storage, the all-in-one setup uses a local directory for the data domain, and an NFS share on the single machine to host an iso domain, where OS install images are stored.

We’ll start the all-in-one to multi-node conversion by putting our local virtualization host, local_host, into maintenance mode by clicking the Hosts tab in the web admin console, clicking the local_host entry, and choosing “Maintenance” from the Hosts navigation bar.

Once local_host is in maintenance mode, we click edit, change to the Default data center and host cluster from the drop down menus in the dialog box, and then hit OK to save the change.

This is assuming that you stuck with NFS as the default storage type while running through the engine-setup script. If not, head over to the Data Centers tab and edit the Default data center to set “NFS” as its type. Next, head to the Clusters tab, edit your Default cluster, fill the check box next to “Enable Gluster Service,” and hit OK to save your changes. Then, go back to the Hosts tab, highlight your host, and click Activate to bring it back from maintenance mode.

Now head to a terminal window on your engine machine. Fedora 17, the OS I’m using for this walkthrough, includes version 3.2 of Gluster. The oVirt/Gluster integration requires Gluster 3.3, so we need to configure a separate repository to get the newer packages:

# cd /etc/yum.repos.d/
# wget

Next, install the vdsm-gluster package, restart the vdsm service, and start up the gluster service:

# yum install vdsm-gluster
# service vdsmd restart
# service glusterd start

The all-in-one installer configures an NFS share to host oVirt’s iso domain. We’re going to be exposing our Gluster volume via NFS, and since the kernel NFS server and Gluster’s NFS server don’t play well nicely together, we have to disable the former server.

# systemctl stop nfs-server.service && systemctl disable nfs-server.service

Through much trial and error, I found that it was also necessary to restart the wdmd service:

# systemctl restart wdmd.service

In the move from v3.0 to v3.1, oVirt dropped its NFSv3-only limitation, but that requirement remains for Gluster, so we have to edit /etc/nfsmount.conf and ensure that Defaultvers=3, Nfsvers=3, and Defaultproto=tcp.

Next, edit /etc/sysconfig/iptables to add the firewall rules that Gluster requires. You can paste the rules in just before the reject lines in your config.

# glusterfs
-A INPUT -p tcp -m multiport --dport 24007:24047 -j ACCEPT
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
-A INPUT -p tcp -m multiport --dport 38465:38467 -j ACCEPT

Then restart iptables:

# service iptables restart

Next, decide where you want to store your gluster volumes — I store mine under /data — and create this directory if need be:

# mkdir /data

Now, head back to the oVirt web admin console, visit the Volumes tab, and click Create Volume. Give your new volume a name, and choose a volume type from the drop down menu. For our first volume, let’s choose Distribute, and then click the Add Bricks button. Add a single brick to the new volume by typing the path you desire into the the Brick Directory field, clicking Add, and then OK to save the changes.

Make sure that the box next to NFS is checked under Access Protocols, and then click OK. You should see your new volume listed — highlight it and click Start to start it up. Follow the same steps to create a second volume, which we’ll use for a new ISO domain.

For now, the Gluster volume manager neglects to set brick directory permissions correctly, so after adding bricks on a machine, you have to return to the terminal and run chown -R 36.36 /data (assuming /data is where you are storing your volume bricks) to enable oVirt to write to the volumes.

Once you’ve set your permissions, return to the Storage tab of the web admin console to add data and iso domains at the volumes we’ve created. Click New Domain, choose Default data center from the data center drop down, and Data / NFS from the storage type drop down. Fill the export path field with your engine’s host name and the volume name from the Gluster volume you created for the data domain. For instance: “demo1.localdomain:/data”

Wait for data domain to become active, and repeat the above process for the iso domain. For more information on setting up storage domains in oVirt 3.1, see the quick start guide.

Once the iso domain comes up, BAM, you’re Glusterized. Now, compared to the default all-in-one install, things aren’t too different yet — you have one machine with everything packed into it. The difference is that your oVirt rig is ready to take on new nodes, which will be able to access the NFS-exposed data and iso domains, as well as contribute some of their own local storage into the pool.

To check this out, you’ll need a second test machine, with Fedora 17 installed (though you can recreate all of this on CentOS or another Enterprise Linux starting with the packages here). Take your F17 host (I start with a minimal install), install the oVirt release package, download the same fedora-glusterfs.repo we used above, and make sure your new host is accessible on the network from your engine machine, and vice versa. Also, the bug preventing F17 machines running a 3.5 or higher kernel from attaching to NFS domains isn’t fixed yet, so make sure you’re running a 3.3 or 3.4 version of the kernel.

Head over to the Hosts tab on your web admin console, click New, supply the requested information, and click OK. Your engine will reach out to your new F17 machine, and whip it into a new virtualization host. (For more info on adding hosts, again, see the quick start guide.)

Your new host will require most of the same Glusterizing setup steps that you applied to your engine server: make sure that vdsm-gluster is installed, edit /etc/nfsmount.conf, add the gluster-specific iptables rules and restart iptables, create and chown 36.36 your data directory.

The new host should see your Gluster-backed storage domains, and you should be able to run VMs on both hosts and migrate them back and forth. To take the next step and press local storage on your new node into service, the steps are pretty similar to those we used to create our first Gluster volumes.

First, though, we have to run the command “gluster peer probe NEW_HOST_HOSTNAME” from the engine server to get the engine and it’s new buddy hooked up Glusterwise (this another of the wrinkles I hope to see ironed out soon, taken care automatically in the background).

We can create a new Gluster volume, data1, of the type Replicate. This volume type requires at least two bricks, and we’ll create one in the /data directory of our engine, and one in the /data directory of our node. This works just the same as with the first Gluster volume we set up, just make sure that when adding bricks, you select the correct server in the drop down menu:

Just as before, we have to return to the command line to chown -R 36.36 /data on both of our machines to set the permissions correctly, and start the volumes we’ve created.

On my test setup, I created a second data domain, named data1, stored on the replicated Gluster domain, with the storage path set to localhost:/data1, on the rationale that VM images stored on the data1 domain would stay in sync across the pair of hosts, enabling either of my hosts to tap local storage for running a particular VM image. But I’m a newcomer to Gluster, so consult the documentation for more clueful Gluster guidance.