My first foray into RAID

I have a Mythbox that I built well over a year ago that I absolutely love. And as long as it's working correctly, my wife loves it too. One of the really cool things about it is that I have all of my DVDs available on demand, along with all the recorded TV shows (think AppleTV and Tivo having a Linux-baby).

Having movies immediately playable from a menu is teh awesomes. However, this requires an enormous amount of storage space. I don't have enough space in the nice Mythbox case that's designed to look like an entertainment console rather than a beige box computer. So I turned my existing Linux server (which handles NAT and firewalling for my home network) into a storage server.

For almost a year, I had three 500GB drives merged together via LVM to make a 1.4TB volume to contain all my movies. This has been plenty of room, and thanks to LVM, all I need to do later is add more hard drives and extend the volume.

However, this has no failure tolerance. One drive dies, and the entire collection is lost. Yikes.

Enter RAID 5

Long story short, I had a spare 500GB drive lying around, for a total of 4 identically-sized drives. For those who don't know much about RAID 5, this particular setup merges the 4 drives (of identical size) into one volume. One drive's worth of space goes to parity data (spread across all the drives rather than kept on a single "backup" drive), so your total space is reduced by one drive. However, if any one of the drives dies, all the data is preserved, and once you add a replacement drive and the RAID service rebuilds the array, all the original functionality and content are restored as if the error never happened.
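To make the "one drive's worth of parity" idea concrete, here is a minimal Python sketch. It is a toy, not how the Linux md driver lays data out on disk: the block contents and the xor_blocks helper are made up for illustration. It only shows that XOR-ing the surviving blocks with the parity block gives back the missing one, and that four 500GB drives leave three drives' worth of usable space.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per data drive (toy values, not a real on-disk layout).
data = [b"MYTH", b"DVDS", b"RAID"]

# The parity block is the XOR of all the data blocks in the stripe.
parity = xor_blocks(data)

# Pretend the drive holding the second block died.
survivors = [data[0], data[2], parity]

# XOR-ing everything that survived reproduces the missing block.
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

# Usable space in RAID 5 is (number of drives - 1) * drive size.
drives, size_gb = 4, 500
usable_gb = (drives - 1) * size_gb
print(rebuilt, usable_gb)
```

The same trick works no matter which single block goes missing, which is why the array survives any one drive failure but not two.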

For Christmas, I got a new server case (an Antec 300) to replace the mini-ATX case I was using. I also got a new motherboard that (supposedly) had RAID 1/0/5/10. After a morning of hardware swapping and some minor tweaks, I had my server operating in the new case and hardware. After backing up my movies to external hard drives and spare room here and there, I was ready to begin the conversion to RAID 5.

Read more: RAID 5 Explained

Problems

Motherboard RAID controller isn't hardware RAID

One lesson I learned is that on-board motherboard RAID controllers are not true "hardware" RAID controllers. Sure, they're hardware-ish, but ultimately they require drivers in the operating system to interpret the RAID array. This is effectively just a software RAID setup. Software RAID does its parity work on the main CPU rather than on a dedicated controller, so it is more CPU-intensive than hardware RAID and isn't as desirable.

This particular motherboard is great, and would have loads of useful features if it were used in a Windows gaming system. But it's not; it's a Linux server. It's still great, but the RAID controller doesn't have a Linux driver. This left me to use a software RAID solution.

Had I known this before I started, I would have spent the money on a true RAID card rather than the motherboard. But since it was an Internet order, and I already had it installed, and since my movie collection would be largely read-only (so the CPU load would be minimal), I decided I could live with software RAID.

Now I'm just mad that the Windows-only RAID support isn't mentioned anywhere except the owner's manual (which you only get once you've bought the board).

/dev/hda is AWOL

I also ran into a weird problem when I booted up the existing server with the new motherboard -- a kernel panic caused by a missing hard drive (which had the OS on it).

This took a long time to straighten out, but what I eventually found was that, despite the fact that the IDE channel the primary hard drive was on was still the primary channel, and that nothing really had changed, the device node had changed from /dev/hda to /dev/hdc. In Windows Land, this is the same as your C: drive suddenly becoming your E: drive.

I changed the GRUB settings (which tell the computer what to boot and where the OS is) to point to the hdc drive and all was well. The fix was easy, but diagnosing it freaked me out for a good while.

Success!

By noon, I had a working RAID 5 array for my movies. By this point, I had recompiled my kernel to support software RAID 5, physically rebuilt my server, debugged all the kinks out of the new motherboard, and got everything back up and running.

I built out a new JFS file system on the RAID environment. I rebooted a couple of times to make sure everything would come back up. Satisfied, I began copying all my files back over to the server (something that will take the better part of a day to complete).

Things to remember

All drives (in software RAID, partitions) need to be the same size. To ensure this, use fdisk or cfdisk to set the first partition, then copy that to the other drives you want to use: sfdisk -d /dev/sda | sfdisk /dev/sdb (and so on for /dev/sdc and /dev/sdd). This is useful to know if a drive in the array ever needs to be replaced.

After setting up the array, you need to let it settle. This is the synchronization of the drives that ultimately builds the fault protection. You can monitor the progress with: watch cat /proc/mdstat . This process takes ~2 hours (coincidentally, about the length of a long movie).
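If you'd rather have a script report the progress than stare at watch, something like the following Python sketch could pull the numbers out of /proc/mdstat. The sync_progress function is my own hypothetical helper, and the regular expression assumes the progress line looks like "recovery = NN.N% (...) finish=NN.Nmin", which may differ between kernel versions.

```python
import re
from pathlib import Path

# Assumed shape of the progress line: "... recovery = 12.6% (...) finish=127.5min ..."
PROGRESS = re.compile(
    r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min"
)

def sync_progress(mdstat_text):
    """Return (operation, percent_done, minutes_left), or None if nothing is syncing."""
    match = PROGRESS.search(mdstat_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))

if __name__ == "__main__":
    path = Path("/proc/mdstat")
    if path.exists():
        print(sync_progress(path.read_text()) or "no sync in progress")
    else:
        print("no /proc/mdstat here (is the md driver loaded?)")
```

When the function returns None, either the array has finished settling or the file doesn't look the way the pattern expects, so it's worth checking the raw file by hand the first time.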

ToDo

Having a RAID 5 array for the fault protection sounds good. But I need to test it. Having fault protection is only as good as the recovery plan that must follow the fault.

The simplest way to test is to disconnect one of the four drives. Since they're SATA drives, they should be hot swappable (assuming the controller and driver support it), meaning I can remove the cable to one while the machine is still turned on. (I can't bring myself to do that, so I'll shut down first, disconnect the cable, then bring the machine back up.) This will simulate a dead drive. I'll then need to find a spare 500GB drive, put it in place, and watch the array rebuild itself.
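For the test itself, it would help to confirm that the kernel actually noticed the missing drive. A rough Python sketch, assuming the array status in /proc/mdstat is printed in the usual "[4/3] [UUU_]" form (expected members, active members, and one U or _ per drive); degraded_arrays is a hypothetical helper name:

```python
import re

# Assumed status format: "[expected/active] [UUU_]", where "_" marks a missing member.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def degraded_arrays(mdstat_text):
    """Return (expected, active, flags) for every array that is missing a member."""
    found = []
    for expected, active, flags in STATUS.findall(mdstat_text):
        if int(active) < int(expected) or "_" in flags:
            found.append((int(expected), int(active), flags))
    return found
```

An empty list after the replacement drive has finished rebuilding would be the sign that the recovery plan worked.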
