Uh oh: issues with one of the hard drives, aka a failing HDD!

My YunoHost server

Hardware: 1U AMD64 Rackserver
YunoHost version: Must be the latest, I can’t check right now :frowning:
I have access to my server: Through SSH, through the webadmin, direct access via keyboard/screen
Are you in a special context or did you perform some particular tweaking on your YunoHost instance?: Yes, it’s behind a VPN.

Description of my issue

I’ve encountered a significant issue with YunoHost that I’d like to bring to everyone’s attention. Recently, I had a failing hard drive in my server, and to my surprise, there was no notification or alert from YunoHost regarding this critical hardware failure. This lack of communication is quite concerning for several reasons:

  1. No Diagnostic Warnings: There were no warnings or errors detected in the diagnostic tests.
  2. CLI Environment Silence: The CLI environment did not provide any indications of the failing hardware.
  3. No System Emails: There were no system emails sent out to notify the administrators of the imminent hardware failure.

While experienced Linux users might recognize symptoms like:

Failed to execute 'pager', using next fallback pager: Input/output error
Failed to execute 'less', using next fallback pager: Input/output error
Failed to execute 'more', using next fallback pager: Input/output error

…it is still not an excuse for the system’s lack of direct communication about such a critical issue.

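TIP: For anyone new(er) to Linux, an Input/output error means the kernel could not read from or write to the storage device. That is usually a sign of a failing drive, though a bad cable or controller can cause it too.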
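
For the curious, here is roughly how that error looks from inside a program. This is only a minimal Python sketch: the kernel reports the failure as errno `EIO`, and the helper names `describe_read_failure` and `try_read` are made up for illustration; they are not part of YunoHost.

```python
import errno


def describe_read_failure(exc: OSError) -> str:
    """Turn an OSError into a short hint (hypothetical helper, not a YunoHost API)."""
    if exc.errno == errno.EIO:
        # EIO is the errno behind the "Input/output error" message.
        return "Input/output error: the storage device may be failing"
    return f"Other OS error: {exc.strerror}"


def try_read(path: str) -> str:
    """Try to read one byte from path and report what happened."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return "read OK"
    except OSError as exc:
        return describe_read_failure(exc)
```

Calling `try_read` on a file that sits on a dying disk would be expected to return the "Input/output error" hint, while a healthy file gives "read OK".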

Suggestion for Improvement: I strongly urge the YunoHost development team to implement an emergency notification system for hardware failures. When a hard drive or any other storage device is failing, YunoHost should:

  1. Send Emergency Alerts: Immediately send emergency emails or notifications to all admins or groups with access to the admin panel.
  2. CLI Error Push: Push a prominent error message via the CLI upon login to alert users directly.

This feature is crucial to prevent data loss and ensure system administrators can take timely action to mitigate hardware issues.
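
To make the idea a bit more concrete, here is a rough Python sketch of what such a check could look like. It assumes smartmontools is installed and parses the text output of `smartctl -H`. The function names and the banner wording are my own invention, not anything YunoHost provides, and a "PASSED" result is no guarantee that a drive is healthy.

```python
import subprocess


def parse_health(smartctl_output: str) -> str:
    """Return 'ok', 'failing', or 'unknown' from `smartctl -H` text output."""
    for line in smartctl_output.splitlines():
        # ATA drives print "...self-assessment test result: PASSED";
        # SCSI/SAS drives print "SMART Health Status: OK".
        if "test result:" in line or "SMART Health Status:" in line:
            verdict = line.split(":", 1)[1].strip()
            return "ok" if verdict in ("PASSED", "OK") else "failing"
    return "unknown"


def login_banner(device: str, status: str) -> str:
    """Build the text a login hook could print (hypothetical format)."""
    if status == "failing":
        return f"*** WARNING: {device} fails its SMART health check. Back up now! ***"
    if status == "unknown":
        return f"Note: could not determine the SMART health of {device}."
    return ""


def check_device(device: str) -> str:
    """Run `smartctl -H` (needs smartmontools and root rights) and parse it."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True, check=False
    )
    return parse_health(result.stdout)
```

Something like `print(login_banner("/dev/sda", check_device("/dev/sda")))` in a login script would cover the CLI part; the same status could feed an email or a diagnosis entry.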

Question and Suggestion: Once a replacement drive has been installed, YunoHost should offer a feature like a “clone drive” button that copies the contents of the failing drive to the new one. If YunoHost can implement this, replacing a failing hard drive would take minutes of working time instead of hours, which would make hardware failures much easier to manage.

As it stands, the lack of communication left me dazed and confused, but I am trying to keep managing my server despite the hardware issues. A notification system and a cloning feature like these would greatly improve YunoHost’s reliability and users’ trust in it.

Thank you for considering this suggestion. I look forward to hearing your thoughts and any plans for implementation.

Best regards,
Woozy (still dazed and confused but I really try to continue)…
Also, I have another drive with all of the backups and another one just in case, specifically with all the stuff just after install.

P.S. Feel free to share your thoughts and maybe tell us what you do when your hard drives or SSDs are failing on you?

When setting up a new server, I always install and configure smartmontools:
https://www.smartmontools.org/

It can be set up to send an email to root (and therefore to all YunoHost admin users) every time a major incident occurs on a disk.
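
The smartd daemon from smartmontools does the mailing itself, through the `-m` directive in its smartd.conf. Purely to illustrate what such an alert boils down to, here is a small Python sketch. The sender and recipient addresses, the function names, and the assumption of a local mail server on localhost are all mine and may not match your setup.

```python
import smtplib
from email.message import EmailMessage


def build_disk_alert(device: str, detail: str,
                     recipient: str = "root@localhost") -> EmailMessage:
    """Build an alert mail about a disk (addresses are assumptions)."""
    msg = EmailMessage()
    msg["From"] = "disk-alert@localhost"  # hypothetical sender address
    msg["To"] = recipient
    msg["Subject"] = f"Disk problem detected on {device}"
    msg.set_content(
        f"A problem was reported for {device}:\n\n{detail}\n\n"
        "Check your backups and consider replacing the drive."
    )
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost") -> None:
    """Hand the mail to an SMTP server assumed to be listening on host."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)
```

For example, `send_alert(build_disk_alert("/dev/sda", "SMART health check failed"))` would queue one mail for root, provided the local mail server accepts it.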


Yes, sure, but then again, why isn’t such a thing already part of YunoHost? Finding out the hard way that your server’s drives are dying is very harsh. Once a drive has dropped to read-only, it is often too late, since you can only remount it read/write a couple of times before it fails completely. If I had known sooner, I could have cloned the drive with Clonezilla, used badblocks to mark off the bad sectors, or indeed used smartmontools to take preventive action before it died.
