zfs 6 storage disks + usb boot


I am organizing my 6 disks into a ZFS pool. Since my motherboard (Supermicro) has 6 SATA ports, I thought I would put the operating system (CentOS 7) on an external USB stick (32 GB) so that all six ports are available for ZFS.

The external CentOS USB stick works nicely (for now), and the idea is that I don't care too much about it, since the important data will be on the ZFS pool. Is this correct? Or will ZFS store something on the OS disk? Ideally, if the USB stick dies I should be able to replace it with another one, and the ZFS pool configuration should be written into the ZFS pool itself.

Having 6 disks (1 TB each) and knowing that the pool will not grow (I can't add more SATA ports unless I invest $$), what's the best configuration you would suggest? Of course I am concerned about redundancy; I believe I should mirror 3 disks.. overkill? Comments?



Best Answer

I think keeping the OS separate from the data pool makes sense. Just make sure you have a way to recreate everything on the USB stick; otherwise you'll be sorry if it gets lost or corrupted.
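To the question about where the pool configuration lives: ZFS writes it into labels on the pool disks themselves, so after reinstalling the OS on a fresh stick you can rediscover and import the pool. A minimal sketch, assuming the pool is named `tank` (the name is a placeholder):

```shell
# Scan attached disks for importable ZFS pools and list what is found
zpool import

# Import the pool found on the data disks; -f may be needed if the old
# OS never cleanly exported it before dying
zpool import -f tank

# Verify the pool and its datasets came back
zpool status tank
zfs list -r tank
```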

For your ZFS pool configuration, I see several options:

  • 6 stripes
    • metadata-only redundancy (no data redundancy)
    • optimal storage capacity
    • near-optimal read performance
    • optimal write performance
  • 2 stripes, each is a 3-way mirror
    • can tolerate the loss of any 2 disks, and sometimes up to 4 (when 2 die in each mirror)
    • 66% of the available storage wasted for redundant data
    • optimal read performance
    • 3x slower max write throughput
  • raidz1 with all 6 disks
    • can tolerate loss of any 1 disk (exactly 1 disk, no more)
    • ~17% of available storage (one disk in six) used for redundant data
    • near-optimal read performance
    • read-modify-write for sub-stripe updates limits max write throughput (perhaps 2x slower)
  • raidz2 with all 6 disks
    • can tolerate loss of any 2 disks (exactly 2 disks, no more)
    • ~33% of available storage (two disks in six) used for redundant data
    • near-optimal read performance
    • read-modify-write for sub-stripe updates limits max write throughput (perhaps 2x slower)
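For reference, the layouts above would be created roughly like this. This is a sketch: the pool name `tank` and the short `sda`..`sdf` device names are placeholders (in practice you'd use stable `/dev/disk/by-id/` paths), and of course you'd create only one of these pools, not all four:

```shell
# Plain 6-disk stripe: maximum space, no data redundancy
zpool create tank sda sdb sdc sdd sde sdf

# Two 3-way mirrors, striped together
zpool create tank mirror sda sdb sdc mirror sdd sde sdf

# Single raidz1 vdev across all six disks (one disk of parity)
zpool create tank raidz1 sda sdb sdc sdd sde sdf

# Single raidz2 vdev across all six disks (two disks of parity)
zpool create tank raidz2 sda sdb sdc sdd sde sdf
```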

Unless you care a lot about write throughput (which you probably don't if you are considering 3-way mirroring), I would recommend raidz2 for most use cases. It's very unlikely that you will lose 3 drives independently (if you do, there was probably a natural disaster / electrical problem / etc.), and raidz2 is "efficient" in that it stores exactly enough redundant data to survive two failures, but no more, which means you get a lot more usable storage than with two 3-way mirrors (33% usable -> ~67% usable).
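The usable-capacity comparison is just the ratio of data disks to total disks. A quick sanity check with shell arithmetic (integer percentages, 6 disks):

```shell
disks=6
# two 3-way mirrors: only 2 of the 6 disks hold unique data
echo "2x 3-way mirror: $(( 2 * 100 / disks ))% usable"           # 33%
# raidz1: one disk's worth of parity
echo "raidz1:          $(( (disks - 1) * 100 / disks ))% usable" # 83%
# raidz2: two disks' worth of parity
echo "raidz2:          $(( (disks - 2) * 100 / disks ))% usable" # 66%
```

(Real-world usable space will be slightly lower due to ZFS metadata and raidz allocation overhead.)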

Besides the two 3-way mirrors, there are other "inefficient" redundancy layouts you could use too, such as two raidz1 vdevs of 3 disks each. I didn't list those because I think their tradeoffs aren't as good as the options above.

One downside of RAID-Z to keep in mind, though, is that it's hard to add more disks to an existing RAID-Z vdev after the fact. There's a feature under development for this called "RAID-Z expansion", but I don't think it has shipped anywhere yet. Once it's released you should be able to use it on an existing pool, though it won't let you add more parity (just more data disks).

Today the most common way to expand a RAID-Z pool is to double its capacity by adding 6 more drives as a second raidz2 group and striping across the two groups. Alternatively, you could use zfs send -> zfs receive to copy your data into a new pool configured however you like, which is more flexible but also more involved than adding disks to the same pool.
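A sketch of the send/receive migration path, assuming the old pool is called `tank` and the new pool `tank2` (both names are placeholders):

```shell
# Take a recursive snapshot of everything on the old pool
zfs snapshot -r tank@migrate

# Replicate the whole pool (datasets, snapshots, properties) to the new one;
# -R sends the full hierarchy, -F lets the target be rolled back/overwritten
zfs send -R tank@migrate | zfs receive -F tank2
```

For large pools you may want to run this under `screen`/`tmux`, and do a final incremental send (`zfs send -R -i`) after stopping writers, so the cutover window stays small.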