Yesterday I removed Fedora 17 from the server I use for oVirt testing, mainly because I’ve been experiencing random reboots on the server, and I haven’t been able to figure out why. I’m pretty sure I wasn’t having these issues on Fedora 16, but I can’t go back to that release because the official packages for oVirt are built only for F17. There are, however, oVirt packages built for Enterprise Linux (aka RHEL and its children), and I know that some in the oVirt community have been running these packages successfully.
So, I figured I’d install CentOS 6 on my machine and either escape my random reboots or, if the reboots continued, learn that there’s probably something wrong with my hardware. Plus, I’d escape a second bug I’ve been experiencing with Fedora 17, the one in which a recent rebase to the Linux 3.5 kernel (F17 shipped originally with a 3.3 kernel) seems to have broken oVirt’s ability to access NFS shares, thereby breaking oVirt.
Installing oVirt 3.1 on CentOS 6 went very smoothly — the steps involved were pretty much the same as those for Fedora. Before I knew it, I was back up and running with a CentOS-based oVirt 3.1 rig just like my F17 one, complete with my F17 test server template and my F17 VMs for my in-progress gluster/ovirt integration writeup, all repatriated from my oVirt export domain.
However… all is not well.
My Fedora 17 VMs aren’t running normally on my new CentOS 6 host, and what I’m seeing reminds me of a bug I encountered several weeks ago when I first upgraded from oVirt 3.0 to the oVirt 3.1 beta. The solution came in the form of a bugfix from the qemu project upstream. That’s a real benefit of running a leading-edge distro like Fedora: when issues are fixed upstream, you don’t have to wait forever for them to float along to you.
Also, the closer you are to upstream, the faster you get access to new features. Not long after updating my qemu to address the F16/F17 VM booting issue, I took to running qemu packages even closer to upstream, from the Fedora Virtualization Preview repository. The oVirt 3.1 management engine supports live snapshots, but requires at least qemu 1.1, which is slated for Fedora 18.
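The qemu requirement above boils down to a minimum-version check. Here is a minimal sketch, assuming plain dotted numeric version strings such as "1.0.1" or "1.1"; the function names are hypothetical and are not part of oVirt or qemu:

```python
def parse_version(version):
    """Turn a dotted numeric version string like '1.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_live_snapshots(qemu_version, minimum="1.1"):
    """Return True if qemu_version is at least `minimum` (1.1, per the text above)."""
    have = parse_version(qemu_version)
    need = parse_version(minimum)
    # Pad the shorter tuple with zeros so that "1.1" and "1.1.0" compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    for candidate in ("1.0.1", "1.1", "1.2"):
        print(candidate, supports_live_snapshots(candidate))
```

Comparing tuples of integers rather than raw strings avoids the classic mistake of sorting "1.10" before "1.2". Real package versions often carry release suffixes, which this sketch does not handle.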
Of course, the downside of tracking the leading edge is that with frequent changes come frequent opportunities for breakage. The changes that don’t directly address your pain points are pure downside, like the NFS-disabling kernel rebase I mentioned earlier. Too fast versus too slow.
So what now?
I wasn’t experiencing these random reboots on my other F17 system, my ThinkPad X220, which I’ve pressed into service as a second oVirt node. I have this F17-based node hooked up to my el6 oVirt engine, and if I set my Fedora VMs to launch only on this node, they run just fine. This machine has only 8GB of RAM, though, and that limits how many VMs I can run on it. Also, since my F17 and el6 nodes are running different versions of qemu, live migration between them doesn’t work.
- I could shift my in-progress ovirt/gluster testing to el6, VMs of which run just fine with the older qemu, but I’d prefer to keep testing with Fedora, and the newest code.
- I could, instead of hitting the brakes and running el6 on my test server, hit the gas and throw F18 on there. Maybe that’d solve my random reboot issue, though I’m not sure if my disabled NFS travails would follow me forward.
- I could figure out how to rebuild the new qemu packages on el6. I’ve started down this path already, but rpmbuild is voicing some complaints that seem related to systemd, which F17 uses and el6 does not.
- I could find out that my random reboot problems weren’t the fault of F17 after all, which would send me poring over my hardware and possibly returning to F17.
For now, I’m going to play some more with updating my qemu on el6, while squeezing my F17 VMs into my smaller F17-based node to get this ovirt/gluster howto finished.
Then maybe I’ll take a long walk on the beach and meditate on the merits of too slow versus too fast in Linux distros, and ponder whether the Giants will sweep the Astros tonight.
Update: I was able to rebuild Fedora qemu 1.1 packages for CentOS. I commented out some systemd-dependent stuff from the spec file. I had to rebuild a couple of other packages, too, which I found in Fedora’s buildsystem. Now, my Fedora 17 VMs run well on my CentOS 6 oVirt host (which hasn’t randomly rebooted yet), and I can migrate VMs between it and my F17-based node.
And the Giants won, too.
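For anyone attempting a similar rebuild, a first step is finding which parts of a spec file depend on systemd. What follows is a rough illustration, not the exact procedure used for the update above. The keyword list and the sample spec fragment are assumptions made up for the example:

```python
def find_systemd_lines(spec_text, keywords=("systemd", "systemctl")):
    """Return (line_number, line) pairs for spec lines mentioning any keyword.

    This only flags lines for manual review; it does not edit the spec.
    """
    hits = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.strip()))
    return hits


# Hypothetical spec fragment, for illustration only.
SAMPLE_SPEC = """\
Name: example
BuildRequires: systemd-units
Requires(post): systemd-units
%build
make
%post
/bin/systemctl daemon-reload
"""

if __name__ == "__main__":
    for number, line in find_systemd_lines(SAMPLE_SPEC):
        print(f"{number}: {line}")
```

The sketch only reports candidates. Deciding what is safe to comment out, and what needs a SysV-style replacement on el6, is still a manual job.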
11 responses to “Too Fast, Too Slow”
I’m having the exact same random reboot problem on a Dell Vostro 460: once or twice per day it just reboots, without any traces in the logs or anything. I haven’t seen anything like it in close to 20 years of computing (not even under DOS or Windows 3.1 or NT), and 16 years of Linuxing (half a dozen different distros). This is driving me mad, and makes me lose lotsa data (using Fedora 17 as a desktop machine).
I have Windows 7 installed in parallel on the same machine, and at the beginning I thought this was a hardware problem … so have recently been trying hard to crash and reboot under the other OS in the same way — to no avail. Windows is rock solid on this machine.
“Beefy Miracle” indeed.
Hmm, what processor do you have in your Vostro 460?
Jason, I have a Core i5-2400 @ 3.1 GHz. I’m using the built-in Intel graphics btw. Do you see any similarities hardware wise?
I have to say the problem didn’t start until 2 or 3 months ago. Before that everything was stable. It was a bit hot over here in Germany, but ambient temp. in the room was never above 31°C or so, if ever. (At any rate, the machine would reboot at much lower temperatures.) I bought this machine in Nov. last year. I still somehow think that it is a hardware problem.
Just drop me a line if you wish, cheers and good luck.
Further testing has shown that the problem goes away as soon as I reduce the number of CPU cores to one (using the BIOS setup). As soon as I return to two cores, the crashing resumes…
Interesting! I haven’t been testing this lately, as I moved that server to CentOS. I have had some random reboots on my F17 notebook, w/ 2 cores, but few enough that I haven’t chased it down. My server has four cores — if this is related to multicore, then it might make sense for it to happen more frequently on the machine w/ more cores…
I also want to use CentOS 6. Can you please tell me which version you used,
and which yum repository you chose for installing ovirt-engine on CentOS? Thanks for your very useful information.
The info you need is here: http://wiki.dreyou.org/dokuwiki/doku.php/ovirt_rpm_start31
You also need to enable the EPEL repo: http://fedoraproject.org/wiki/EPEL#How_can_I_use_these_extra_packages.3F
Thanks, I am trying according to your documents.
Do you think oVirt 3.1 is ready for use in a production-level managed virtualization environment?
That really depends on how much self-support you’re ready for — I’d spend some time testing it in your desired setting, and decide from there.
Thanks, I successfully installed my test environment with 3 Dell R710 servers. It’s working fine with dozens of VMs. I need to do some more R&D before going to production. Do you have any guides for migrating VMs from an existing KVM server?
You can look into http://libguestfs.org/virt-v2v/