SMART error email

Guygoye · September 6, 2025, 12:12pm

What type of hardware are you using: Raspberry Pi 3, 4+
What YunoHost version are you running: 12.1.17.1
How are you able to access your server: The webadmin
SSH
Are you in a special context or did you perform specific tweaking on your YunoHost instance ?: No

Describe your issue

Hello,

Since the update to version 12.1, I have been receiving error emails from my server with the subject:

SMART error (CurrentPendingSector) detected on host:

The message reads:

This message was generated by the smartd daemon running on:

host name: pgcc
DNS domain: ynh.fr

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors

Device info:
SPCC Solid State Disk, S/N:AA230130S3051201289, WWN:0-000000-000000000, FW:HPS1104J, 512 GB

For details see host’s SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sat Aug 30 17:12:40 2025 BST
Another message will be sent in 24 hours if the problem persists.

As I understand it, some sectors of my disk are damaged.
Is that right?
If so, what steps should I take?

I looked through the logs but found nothing, or I did not look in the right place.

Thanks in advance for your replies.

Share relevant logs or error messages

sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [aarch64-linux-6.1.21-v8+] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SPCC Solid State Disk
Serial Number:    AA230130S3051201289
LU WWN Device Id: 0 000000 000000000
Firmware Version: HPS1104J
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep  6 13:11:09 2025 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x5d) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Abort Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (   4) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       19388
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       116
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       236
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       35
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       338
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       41
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       1006632960
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       2423434
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       660206
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       318767104
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       115
194 Temperature_Celsius     0x0032   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       55237
242 Total_LBAs_Read         0x0032   100   100   050    Old_age   Always       -       977733
249 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       203285

SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 occurred at disk power-on lifetime: 18098 hours (754 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 40 00 c0 88 e0 51   at LBA = 0x01e088c0 = 31492288

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 c0 88 e0 40 00  11d+13:58:35.500  WRITE FPDMA QUEUED
  e7 08 00 00 20 28 00 00  11d+13:58:34.440  FLUSH CACHE
  61 08 00 00 20 28 40 00  11d+13:58:34.440  WRITE FPDMA QUEUED
  e7 08 00 60 96 29 00 00  11d+13:58:34.440  FLUSH CACHE
  61 08 00 60 96 29 40 00  11d+13:58:34.440  WRITE FPDMA QUEUED

Error 5 occurred at disk power-on lifetime: 10514 hours (438 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 40 00 b0 c0 3c 51   at LBA = 0x013cc0b0 = 20758704

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 b0 c0 3c 40 00   5d+01:05:22.604  WRITE FPDMA QUEUED
  e7 08 00 b0 c0 3c 00 00   5d+01:05:22.554  FLUSH CACHE
  61 08 00 b0 c0 3c 40 00   5d+01:05:22.544  WRITE FPDMA QUEUED
  e7 08 00 b0 c0 3c 00 00   5d+01:05:22.544  FLUSH CACHE
  61 08 00 b0 c0 3c 40 00   5d+01:05:22.544  WRITE FPDMA QUEUED

Error 4 occurred at disk power-on lifetime: 8675 hours (361 days + 11 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 40 00 00 a5 3c 51   at LBA = 0x013ca500 = 20751616

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 00 a5 3c 40 00  11d+08:53:37.510  WRITE FPDMA QUEUED
  e7 08 00 00 a5 3c 00 00  11d+08:53:37.500  FLUSH CACHE
  61 08 00 00 a5 3c 40 00  11d+08:53:37.500  WRITE FPDMA QUEUED
  e7 08 00 00 a5 3c 00 00  11d+08:53:37.480  FLUSH CACHE
  61 08 00 00 a5 3c 40 00  11d+08:53:37.480  WRITE FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 8402 hours (350 days + 2 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 40 00 90 f6 3c 51   at LBA = 0x013cf690 = 20772496

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 90 f6 3c 40 00  21d+06:39:34.670  WRITE FPDMA QUEUED
  e7 08 00 90 f6 3c 00 00  21d+06:39:34.660  FLUSH CACHE
  61 08 00 90 f6 3c 40 00  21d+06:39:34.660  WRITE FPDMA QUEUED
  e7 08 00 90 f6 3c 00 00  21d+06:39:34.650  FLUSH CACHE
  61 08 00 90 f6 3c 40 00  21d+06:39:34.650  WRITE FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 6606 hours (275 days + 6 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 40 00 c8 dd 3b 51   at LBA = 0x013bddc8 = 20700616

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 c8 dd 3b 40 00      05:07:06.030  WRITE FPDMA QUEUED
  e7 08 00 28 9f 29 00 00      05:07:05.410  FLUSH CACHE
  61 08 00 28 9f 29 40 00      05:07:05.410  WRITE FPDMA QUEUED
  e7 40 00 e8 9e 29 00 00      05:07:05.400  FLUSH CACHE
  61 40 00 e8 9e 29 40 00      05:07:05.330  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
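
The figure smartd is warning about is attribute 197 (Current_Pending_Sector) in the table above, raw value 8, which is a count of sectors the drive reports as unreadable and waiting to be rewritten or reallocated. A minimal sketch for pulling one raw value out of `smartctl -A` output so it can be compared from one day to the next; the function name `smart_raw` is a hypothetical helper, not part of smartmontools. Since smartctl says this drive is not in its database, the attribute names are generic labels and may not match what the vendor actually stores.

```shell
# Hypothetical helper: print the RAW_VALUE column (10th field) of one
# SMART attribute, reading `smartctl -A` output on stdin.
# $1 is the attribute ID, e.g. 197 for Current_Pending_Sector.
smart_raw() {
  awk -v id="$1" '$1 == id { print $10 }'
}

# Possible use on the real system:
#   sudo smartctl -A /dev/sda | smart_raw 197
```

If the number stays at 8 over several weeks, nothing new is being flagged; if it keeps growing, a current backup and a replacement disk become more urgent.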

rodinux · September 6, 2025, 3:23pm

I also get alerts on some servers: one of them has an NVMe disk, the other does not, since it uses HDDs in RAID…

But from what I understand, these warnings may not be serious, and their causes can vary widely, which makes them hard to diagnose… This disk monitoring is a new feature.

Guygoye · September 6, 2025, 4:01pm

OK,

Thanks @rodinux. I have had quite a few unexpected power cuts, the result of a poor electrical grid.
It would not surprise me to have bad sectors on my SSD.

I will wait for other opinions. If there is a way to silence this alert unless new bad sectors appear, that would be nice.

In the meantime I will keep monitoring, which is what a sysadmin friend advised.
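
On silencing the alert unless new bad sectors appear: the smartd.conf directives `-C ID+` and `-U ID+` make smartd report only when the raw count has increased between two checks. A minimal sketch, under two assumptions that should be verified on the server: that the active line in /etc/smartd.conf starts with `DEVICESCAN` (the Debian default; the configuration shipped with YunoHost 12 may differ), and that these later directives take precedence over the plain `-C 197 -U 198` implied by `-a`. The helper name is hypothetical.

```shell
# Hypothetical helper: read a smartd.conf on stdin and append
# "-C 197+ -U 198+" to every line starting with DEVICESCAN, so that
# pending/uncorrectable sectors are reported only when the count grows.
# All other lines are passed through unchanged.
smartd_increase_only() {
  sed '/^DEVICESCAN/ s/$/ -C 197+ -U 198+/'
}

# Possible use on the real system (review the new file before installing it):
#   smartd_increase_only < /etc/smartd.conf > /tmp/smartd.conf.new
#   sudo cp /tmp/smartd.conf.new /etc/smartd.conf
#   sudo systemctl restart smartmontools   # service name assumed from Debian packaging
```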

ewilly · September 6, 2025, 9:06pm

Installing Scrutiny might be a good idea for monitoring your disk.

stilobique · September 7, 2025, 9:30am

Interesting app.
I also get these error emails, I would say since the migration; it is always hard to pin down when they started.

Guygoye · September 7, 2025, 6:01pm

The upside of this update is that it alerts us to potential problems.

@ewilly I could not find your software…

ewilly · September 7, 2025, 6:47pm

It is an app in the catalog (GitHub - YunoHost-Apps/scrutiny_ynh: WebUI for smartd S.M.A.R.T monitoring)

Guygoye · September 8, 2025, 5:49pm

Thanks, it is installed!

I asked it to notify me only when there are new events.

We will see if it works.

rodinux · September 10, 2025, 10:48pm

I was able to resolve this on one of the servers where I had this warning:

This message was generated by the smartd daemon running on:

   host name:  linux07
   DNS domain: fr

The following warning/error was logged by the smartd daemon:

Device: /dev/sdc [SAT], ATA error count increased from 12 to 13

Device info:
HGST HUS724020ALA640, S/N:PN2134P5G2MDGX, WWN:5-000cca-24ec13193, FW:MF6OABY0, 2.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.

I ran

smartctl -t long /dev/sdc

The test must have taken about 324 minutes (2 TB)… But it seems to have worked: no more alerts.
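
A note on reading the outcome: `smartctl -t long` only starts the test and returns right away; the drive runs it in the background and records the result in its self-test log, which `smartctl -l selftest` prints. A minimal sketch for showing just the most recent entry, assuming the usual log layout in which the newest test is the row numbered `# 1`; the function name is hypothetical.

```shell
# Hypothetical helper: print the most recent self-test entry (the row
# numbered "# 1") from `smartctl -l selftest` output read on stdin.
last_selftest() {
  awk '/^# *1 / { print; exit }'
}

# Possible use once the estimated test duration has elapsed:
#   sudo smartctl -l selftest /dev/sdc | last_selftest
```

That row gives the test type, its status, and the LBA of the first error, if any.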

crustyourmind · September 11, 2025, 10:56am

The trouble with Scrutiny is that it is not compatible with Grafana: they use different versions of InfluxDB. A shame…

Guygoye · September 14, 2025, 6:43pm

Hello,

Indeed, Scrutiny is not much help; I tried to configure it but it did not work.

@rodinux I am trying your solution now; it tells me 4 minutes for a 500 GB disk.
Admittedly, very little of my disk is in use (less than 30 GB).

I will get back to you with an update.

rodinux · September 14, 2025, 8:21pm

However, I regret not having added log output to the command:

smartctl -t long /dev/sdc > /var/log/long.text

Guygoye · September 15, 2025, 6:47pm

It did not work for me. I still get errors and therefore emails.
I also could not send the output to a file; I do not have write permission on /var/log/…
It should normally work with sudo, shouldn't it?
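
A likely reason for the permission error: in `sudo command > file`, the `>` redirection is opened by the calling, unprivileged shell before sudo even starts, so only the command runs as root, not the write to /var/log. Piping into `sudo tee` lets a root process open the file instead. A minimal sketch; the function name, the `SUDO` variable, and the log path in the usage comment are hypothetical.

```shell
# Hypothetical helper: run a command and store its output (stdout and
# stderr) in a file that is written through tee.
# $1 is the log file; the remaining arguments are the command to run.
# SUDO defaults to "sudo" so tee can write to root-owned directories;
# set SUDO to an empty string to write as the current user.
log_as_root() {
  log_file=$1
  shift
  "$@" 2>&1 | ${SUDO-sudo} tee "$log_file" > /dev/null
}

# Possible use on the real system:
#   log_as_root /var/log/smart-selftest.log sudo smartctl -l selftest /dev/sdc
```

The usage line logs `smartctl -l selftest` rather than `smartctl -t long`, because the latter prints only the message that the test has started.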

crustyourmind · September 16, 2025, 4:48am

Yes for sudo (you need to be admin for this command).

system · October 16, 2025, 4:48am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.