Best way to set up partitions and RAID for 1 × M.2 NVMe + 4 × SATA?

Intro

Almost a year ago I started a discussion about ZFS on YunoHost, so from my PoV this is in a way a continuation of that thinking; but since it will have a different value for others finding this thread, I decided to make it a separate topic.

If this topic comes to a good conclusion, I’d be happy to turn it into a Wiki or some other form of documentation later on. But I am far from a file system expert, and at the time of this first post I am a bit in panic mode.

Available hardware

TL;DR:

  • 1 × M.2 NVMe slot
  • 4 × SATA ports
  • 32 GB RAM
  • capable amd64 CPU

The system I have is still the same as in the linked topic above (AMD Ryzen 5 4600G with 32 GB of RAM in a MiniATX), with a notable difference.

I accidentally managed to wear out my system SSD (NVMe M.2) in less than a year, to the point where it is now failing, which is why I’m very quickly considering a new disk/partition set-up.
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-29-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 990 PRO 1TB
Serial Number:                      S6Z1NJ0W732771H
Firmware Version:                   3B2QJXD7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       2.0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            988,697,219,072 [988 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 4731411541
Local Time is:                      Sun Nov 24 13:20:15 2024 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0055):     Comp DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x2f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg *Other*
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.39W       -        -    0  0  0  0        0       0
 1 +     9.39W       -        -    1  1  1  1        0     200
 2 +     9.39W       -        -    2  2  2  2        0    1000
 3 -   0.0400W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        61 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    106%
Data Units Read:                    4,227,717 [2.16 TB]
Data Units Written:                 3,845,028,947 [1.96 PB]
Host Read Commands:                 96,669,116
Host Write Commands:                7,514,585,410
Controller Busy Time:               17,898
Power Cycles:                       24
Power On Hours:                     4,100
Unsafe Shutdowns:                   20
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               61 Celsius
Temperature Sensor 2:               67 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged


In addition to that dying SSD in the 1 × M.2 NVMe slot, I have 4 × SATA ports (6 according to dmesg, but the motherboard manual disagrees). There are also several PCIe slots available, so if needed I could buy an expansion card (e.g. for more M.2 NVMe). I am somewhat limited by the format of my Mini ITX case, though.

For the 4 × SATA ports, I have (or currently plan to have):

  • 2 × 1 TB HDD (WD Red 2.5") on SATA – these I already have, and although they are 5 and 7 years old, respectively, they are in much better condition than the NVMe SSD.
  • 2 × 4 TB HDD (1 × WD Red Plus, 1 × Seagate IronWolf), which I am planning to buy and put into the remaining SATA slots
  • (just in case there is also 1 × 500 GB SSD (Crucial) on SATA lying around)
  • (another M.2 NVMe SSD eventually)

Disk layout and file system

Idea A: 2 × SSD in Btrfs RAID1 + 2 × HDD in Btrfs RAID1

Put two SSDs into Btrfs RAID1 in order to easily replace a dead or dying drive:

  • 1 × SSD on M.2 NVMe
  • 1 × SSD on SATA

and then put the two HDDs into Btrfs RAID1 too, for the same reason:

  • 2 × HDD on SATA

The idea here is that what needs to be fast would run on the SSD (pair), while what is fine to run on an HDD would be moved there.
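For reference, creating the two pairs would look roughly like this (a sketch; the device names are placeholders and need to match the real hardware):

# SSD pair: both data and metadata mirrored
mkfs.btrfs -L ssd-pool -d raid1 -m raid1 /dev/nvme0n1 /dev/sda
# HDD pair: same profile
mkfs.btrfs -L hdd-pool -d raid1 -m raid1 /dev/sdb /dev/sdc

A failed member can later be swapped out with btrfs replace while the filesystem stays mounted.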

In addition, I would put my laptops’ (Borg) backups on those 2 × HDD too.

(Ideally, sometime down the line I would then also get a disk (pair) at a different location and send Btrfs snapshots there; but I’d need to think what makes sense to send via internet.)
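For the off-site part, the usual pattern would be incremental send/receive over SSH; the host name, subvolume and paths below are made up for illustration:

# initial full copy of a read-only snapshot (data is a subvolume on the HDD pool)
btrfs subvolume snapshot -r /mnt/hdd/data /mnt/hdd/.snap/data-1
btrfs send /mnt/hdd/.snap/data-1 | ssh offsite-host btrfs receive /srv/replica

# later runs only send the difference against the previous snapshot
btrfs subvolume snapshot -r /mnt/hdd/data /mnt/hdd/.snap/data-2
btrfs send -p /mnt/hdd/.snap/data-1 /mnt/hdd/.snap/data-2 | ssh offsite-host btrfs receive /srv/replica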

Idea B: (1 × SSD backed up with Btrfs snapshots on 1 × SSD/HDD) + 2 × HDD in Btrfs RAID1

This one is a bit different as it relies on snapshot “backups” instead of RAID for quick recovery from a failed drive.

  1. put Btrfs on the M.2 NVMe SSD and mount that as /
  2. put Btrfs on the other SSD (or HDD) connected via SATA
  3. make regular (hourly?) Btrfs snapshots of the M.2 filesystem and replicate them to the SATA SSD (see the sketch below)

If at one point the first M.2 SSD dies, simply mount the SATA SSD instead.
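A minimal sketch of step 3, assuming the M.2 filesystem is mounted at / (with a /.snapshots directory on it) and the SATA SSD is a separate Btrfs filesystem mounted at /mnt/backup – paths and schedule are placeholders, not a YunoHost-specific recipe:

#!/bin/sh
# run hourly, e.g. from cron or a systemd timer
ts=$(date +%Y-%m-%d_%H%M)
# 1. read-only snapshot of the root subvolume (cheap, same filesystem)
btrfs subvolume snapshot -r / /.snapshots/root-$ts
# 2. replicate the snapshot onto the SATA SSD
btrfs send /.snapshots/root-$ts | btrfs receive /mnt/backup/snapshots

In practice the send step would be made incremental with -p against the previous snapshot; otherwise every hour transfers a full copy.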

For the other 2 × HDD the idea is the same as above.


Idea C: 1 × SSD + 3 × HDD in RAID10 (or 2 × SSD + 2 × HDD in RAID10)

Very simply, the idea is to put everything into a single Btrfs RAID10 and get the best of both worlds as (hopefully) Btrfs would:

  • first write to and read from the SSD, so that’s the speed benefit
  • when SSD fails, automatically fall back to HDD

I have a suspicion this is asking Btrfs a bit too much, but if it’s not, that might be a cool solution.

I also suspect that this would still burn through the SSD just as fast.
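For completeness, the single mixed pool would simply be created in one go (placeholder device names; as far as I know Btrfs has no “SSD first” allocation policy, so reads and writes are spread over all four devices alike):

mkfs.btrfs -L mixed-pool -d raid10 -m raid10 /dev/nvme0n1 /dev/sda /dev/sdb /dev/sdc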

Sub-Idea Z: Same as something above, but with hardware RAID

According to the manual, my motherboard supports (hardware) RAID0, RAID1 and RAID10, which I can set up through the BIOS.

On one hand that sounds very easy to set up; on the other, I am concerned about how things work in hardware-RAID land: how do I deal with a dead drive, and what happens when I eventually (but inevitably) have to move these drives to a new motherboard?

SSD or HDD as the main drive?

With the physical drives out of the way, the question is how exactly the mount points should be divided between these drives.

YunoHost already has great documentation as a start, but I’d like to discuss this further here (and eventually update the documentation if applicable).

Option 1: SSD as main drive

The idea here is simply to have the SSD (pair) as the main drive, mounted to /, and to put on the HDD (pair) only the mountpoints whose write load would otherwise hurt the SSD.

Option 2: HDD as main drive

Essentially similar, but the approach is from the other way around.

Put everything on the HDD (pair), mounted to /, and put on the SSD (pair) only the mountpoints that would specifically benefit from being on a faster drive.
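As a sketch of what Option 2 could look like in /etc/fstab terms (UUIDs and subvolume names are placeholders; the directories are the same examples used in the overlay idea below):

# HDD pair (Btrfs RAID1) carries the root filesystem
UUID=<hdd-pool-uuid>  /                    btrfs  subvol=@,noatime            0 0
# SSD carries only the latency/write-sensitive directories
UUID=<ssd-uuid>       /var/lib/postgresql  btrfs  subvol=@postgresql,noatime  0 0
UUID=<ssd-uuid>       /var/lib/mysql       btrfs  subvol=@mysql,noatime       0 0
UUID=<ssd-uuid>       /var/www             btrfs  subvol=@www,noatime         0 0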

(Silly idea: mount overlaying)

I don’t know if this is a thing, but since you can (accidentally) “overlay a mount”, I thought maybe that would be something interesting to take advantage of.

Here’s what I mean:

  1. have everything on a HDD, mount it to /
  2. copy parts (e.g. stuff that needs to go fast) of it to SSD, and mount that to relevant mount points – e.g. /opt, /var/www, /var/lib/postgresql, /var/lib/mysql, …
  3. “back up” (e.g. with Btrfs snapshots or similar) regularly things on the SSD to the HDD
  4. if SSD dies, just unmount/unplug it, as everything is already on the HDD too anyway

I have a strong hunch this is a stupid idea, but in case it isn’t, I’d love to hear more about it.
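In case it helps the discussion, here is a rough sketch of what I mean, with placeholder paths and rsync standing in for the “back up” step:

# everything already lives on the HDD pool, mounted at /
# copy one hot directory to the SSD (service stopped for a consistent copy),
# then mount the SSD copy over the original path
rsync -aHAX /var/lib/postgresql/ /mnt/ssd/postgresql/
mount --bind /mnt/ssd/postgresql /var/lib/postgresql

# periodically sync the SSD copy back onto the HDD “underneath”;
# the hidden original needs a second path, e.g. the HDD pool mounted again at /mnt/hdd-root
rsync -aHAX --delete /mnt/ssd/postgresql/ /mnt/hdd-root/var/lib/postgresql/

# if the SSD dies: umount /var/lib/postgresql and the HDD copy is visible again

For databases the periodic copy would only be consistent with the service stopped or taken from a snapshot, which already hints at why this is probably more trouble than it is worth.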

Decision

In the end I decided on the following set-up:

Install YunoHost on 2 × HDD on SATA in Btrfs RAID1.

Later I intend to add another HDD (or two, in Btrfs RAID1) for backups of the above. I am as yet undecided whether to do these backups as Btrfs snapshots or some other way.

As for the NVMe SSD, I will first see whether I actually need it for performance or not. If it turns out things are fast enough on the HDDs, I will save myself the trouble. But if anything, I would keep the HDDs as the main drive and put only certain mountpoints onto the SSD (and back them up).
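In any case, regular health monitoring seems worth setting up so the next failing drive gets caught earlier. Roughly (standard btrfs-progs and smartmontools commands; mountpoints and device names are examples):

# monthly scrub: verifies checksums and repairs bad copies from the RAID1 mirror
btrfs scrub start /
btrfs scrub status /

# per-device Btrfs error counters (should stay at zero)
btrfs device stats /

# SMART wear and health – the numbers that would have warned me earlier on the NVMe
smartctl -a /dev/sda
smartctl -a /dev/nvme0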

Comments on some of the ideas

From what I gather…

Idea A: 2 × SSD in Btrfs RAID1 + 2 × HDD in Btrfs RAID1 would work, but I don’t think I want more than one SSD in there.

Idea B: (1 × SSD backed up with Btrfs snapshots on 1 × SSD/HDD) + 2 × HDD in Btrfs RAID1 would actually work, and I might take it as part of a solution later on, if I decide to use an SSD at all.

Idea C: 1 × SSD + 3 × HDD in RAID10 (or 2 × SSD + 2 × HDD in RAID10) would, in Btrfs, actually be the worst of both worlds, as it would be just as slow as the HDDs and still burn through the SSD just as fast. It might work in ZFS, though.

Regarding Sub-Idea Z (same as something above, but with hardware RAID), it seems my hunch is right: hardware RAID has way too many caveats to make sense if it can at all be avoided. Also, Btrfs RAID1(0) has the benefit of self-healing (same with ZFS, I suppose).
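For comparison, the dead-drive case in Btrfs is handled in-band, which is part of why I lean away from the BIOS RAID; roughly (device names and the devid are placeholders):

# find the devid of the failed disk
btrfs filesystem show /

# if the filesystem still mounts, replace the dead member in place
btrfs replace start <devid-of-failed-disk> /dev/sdd /
btrfs replace status /

# if the disk is gone entirely, mount the surviving member degraded first (e.g. from a rescue system)
mount -o degraded /dev/sdb /mnt
btrfs replace start <devid-of-failed-disk> /dev/sdd /mnt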

Discussion

I am at best an amateur sysadmin and not a file system expert at all, so I am not going to mark this as a solution but keep it open for discussion. Still, I am happy to turn this into a Wiki after some discussion, if people find it useful.

Proxmox

Regarding the suggestion of:

  • YunoHost in a VM
  • data in /home/yunohost.multimedia
  • backups, snapshots

Why? Proxmox seems a needless extra complication. What would I gain from running a VM on my own dedicated server?

As for running most of it on the SSD: I’m going through this migration right now precisely because my SSD is dying well prematurely, after absorbing too many write cycles in the server.