MariaDB corruption

quiwy · February 5, 2025, 2:20pm

What type of hardware are you using: Other(?)
What YunoHost version are you running: 12.0.11
How are you able to access your server: SSH
Are you in a special context or did you perform specific tweaking on your YunoHost instance ?: No

Describe your issue

Hello,

Since this morning, I’ve encountered continuous errors related to MariaDB, with messages like this in the logs:

Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=54]. You may have to r>
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: Page dump (16384 bytes):
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 9248670400000036000000000000000000000016bc1b836f0006000000000000
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 000000000000000000000000000000000000ffffffff0000ffffffff00000000
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 0000000000021232ffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
[...]
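As a side note, the first line of the page dump can be decoded by hand. The sketch below assumes the usual InnoDB page header layout (4-byte checksum, 4-byte page number, 4-byte previous and next page pointers, 8-byte LSN, 2-byte page type, all big-endian); that layout is an assumption taken from general InnoDB documentation, not something the log itself states. The function name `decode_fil_header` is made up for illustration.

```python
import struct

# First 32 bytes of the page dump, copied from the log line above.
FIRST_DUMP_LINE = "9248670400000036000000000000000000000016bc1b836f0006000000000000"


def decode_fil_header(hex_line: str) -> dict:
    """Decode the leading fields of an InnoDB page header (assumed layout)."""
    raw = bytes.fromhex(hex_line)
    # ">IIIIQH" = big-endian: checksum, page number, previous page,
    # next page, LSN (8 bytes), page type (2 bytes); 26 bytes in total.
    checksum, page_no, prev_page, next_page, lsn, page_type = struct.unpack(
        ">IIIIQH", raw[:26]
    )
    return {
        "checksum": checksum,
        "page_no": page_no,
        "prev_page": prev_page,
        "next_page": next_page,
        "lsn": lsn,
        "page_type": page_type,
    }


if __name__ == "__main__":
    for field, value in decode_fil_header(FIRST_DUMP_LINE).items():
        print(f"{field:>10}: {value} (0x{value:x})")
```

Under that assumed layout, the page number field decodes to 54, which agrees with "page number=54" in the error message, so the header itself looks intact and the damage is more likely further into the page.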

I noticed these errors while trying to access Nextcloud. Thinking a reinstall might fix the problem, I uninstalled Nextcloud and tried to reinstall it. However, the installation failed with this error:

Warning: Here's an extract of the logs before the crash. It might help debugging the error:
Info: DEBUG - ++ toml_to_json
Info: DEBUG - ++ python3 -c 'import toml, json, sys; print(json.dumps(toml.load(sys.stdin)))'
Info: DEBUG - + local group=www-data
Info: DEBUG - + [[ -z www-data ]]
Info: DEBUG - + chmod -R u=rwX,g=rX,o=--- /var/www/nextcloud
Info: DEBUG - + chown -R nextcloud:www-data /var/www/nextcloud
Info: DEBUG - + return
Info: INFO - [##++................] > Restoring the MySQL database...
Info: DEBUG - + ynh_mysql_db_shell
Info: DEBUG - + local database=nextcloud
Info: DEBUG - + local default_character_set=
Info: DEBUG - + [[ -n nextcloud ]]
Info: DEBUG - ++ mysql -B nextcloud
Info: DEBUG - ++ tail -n1
Info: DEBUG - ++ cut -f2
Info: DEBUG - + default_character_set=utf8mb4
Info: DEBUG - + default_character_set=--default-character-set=utf8mb4
Info: DEBUG - + mysql --default-character-set=utf8mb4 -B nextcloud
Info: WARNING - ERROR 1036 (HY000) at line 22: Table 'memories_planet_geometry' is read only
Info: DEBUG - + ynh_exit_properly
Info: Removing nextcloud…
Info: [++++++++++..........] > Removing system configurations related to nextcloud...
Info: '/etc/nginx/conf.d/cloud.example.domain.d/nextcloud.conf' wasn't deleted because it doesn't exist.
Info: '/etc/nginx/conf.d/cloud.example.domain.d/nextcloud.d' wasn't deleted because it doesn't exist.
Info: '/etc/php/8.3/fpm/pool.d/nextcloud.conf' wasn't deleted because it doesn't exist.
Info: '/etc/fail2ban/jail.d/nextcloud.conf' wasn't deleted because it doesn't exist.
Info: '/etc/fail2ban/filter.d/nextcloud.conf' wasn't deleted because it doesn't exist.
Info: '/etc/cron.d/nextcloud' wasn't deleted because it doesn't exist.
Info: [####################] > Removal of nextcloud completed
Info: Deprovisioning database...
Warning: ERROR 1036 (HY000) at line 1: Table 'memories_planet_geometry' is read only
Info: Deprovisioning apt...
Info: Deprovisioning permissions...
Info: Deprovisioning data_dir...
Info: Deprovisioning install_dir...
Info: Deprovisioning system_user...
Info: Deprovisioning sources..

Other services seem to be affected as well. For instance, I can’t access Grafana. Here are some log excerpts:

Feb 05 14:49:10 example.domain influxd-systemd-start.sh[2497]: [httpd] ::1 - - [05/Feb/2025:14:49:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_cpu_thermal_virtual_0_feature_cpu_temp_subfeature_temp1_input_temperature.temperature" WHERE time >= 1738763050000ms and time <= 1738763350000ms GROUP BY time(1s) fill(none) ORDER BY time ASC'}" 200 57 "-" "Grafana/10.4.2" f9c43705-e3c7-11ef-8013-001e06490802 2139
Feb 05 14:49:10 example.domain grafana[4063]: logger=ngalert.state.manager.persist t=2025-02-05T14:49:10.021354183+01:00 level=error msg="Failed to save alert state" labels="__alert_rule_namespace_uid__=Jbn7cTMVz, __alert_rule_uid__=sx7ncTGVk, alertname=Temperature alert, datasource_uid=0, grafana_folder=General Alerting, ref_id=A, rule_uid=sx7ncTGVk" state="unsupported value type" error="Error 1836 (HY000): Running in read-only mode"
Feb 05 14:49:12 example.domain grafana[4063]: logger=secrets t=2025-02-05T14:49:12.475268507+01:00 level=error msg="Failed to get current data key" error="Error 1836 (HY000): Running in read-only mode" label=2025-02-05/root@secretKey.v1
Feb 05 14:49:12 example.domain grafana[4063]: logger=login.authinfo t=2025-02-05T14:49:12.475395177+01:00 level=error msg="failed to set auth info in cache" error="Error 1836 (HY000): Running in read-only mode"
Feb 05 14:49:12 example.domain grafana[4063]: logger=user.sync t=2025-02-05T14:49:12.483248518+01:00 level=error msg="Failed to update user" error="Error 1836 (HY000): Running in read-only mode" auth_module=ldap auth_id="uid=user,ou=users,dc=yunohost,dc=org"
Feb 05 14:49:12 example.domain grafana[4063]: logger=authn.service t=2025-02-05T14:49:12.483415189+01:00 level=error msg="Failed to run post auth hook" client=auth.client.proxy id= error="[user.sync.internal] unable to update user"

I’ve also seen similar errors on Yourls and other services using MariaDB.

Running mysqlcheck --all-databases gives me many corruption errors on several tables:

grafana.alert
Warning  : InnoDB: Index 'IDX_alert_org_id_id' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'IDX_alert_state' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'IDX_alert_dashboard_id' contains 0 entries, should be 1.
error    : Corrupt
grafana.alert_configuration
Warning  : InnoDB: Index 'UQE_alert_configuration_org_id' contains 0 entries, should be 1.
error    : Corrupt
grafana.alert_configuration_history                OK
grafana.alert_image                                OK
grafana.alert_instance
Warning  : InnoDB: Index 'IDX_alert_instance_rule_org_id_rule_uid_current_state' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'IDX_alert_instance_rule_org_id_current_state' contains 0 entries, should be 1.
error    : Corrupt
grafana.alert_notification
Warning  : InnoDB: Index 'UQE_alert_notification_org_id_uid' contains 0 entries, should be 1.
error    : Corrupt
grafana.alert_notification_state
Warning  : InnoDB: Index 'UQE_alert_notification_state_org_id_alert_id_notifier_id' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'IDX_alert_notification_state_alert_id' contains 0 entries, should be 1.
error    : Corrupt
grafana.alert_rule
Warning  : InnoDB: Index 'UQE_alert_rule_org_id_uid' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'UQE_alert_rule_org_id_namespace_uid_title' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'IDX_alert_rule_org_id_namespace_uid_rule_group' contains 0 entries, should be 1.
Warning  : InnoDB: Index 'IDX_alert_rule_org_id_dashboard_uid_panel_id' contains 0 entries, should be 1.
error    : Corrupt
grafana.alert_rule_tag                             OK
grafana.alert_rule_version
Warning  : InnoDB: Index 'UQE_alert_rule_version_rule_org_id_rule_uid_version' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_alert_rule_version_rule_org_id_rule_namespace_uid_rule_group' contains 0 entries, should be 3.
error    : Corrupt
grafana.annotation
Warning  : InnoDB: Index 'IDX_annotation_org_id_alert_id' contains 0 entries, should be 1024.
Warning  : InnoDB: Index 'IDX_annotation_org_id_type' contains 0 entries, should be 1024.
Warning  : InnoDB: Index 'IDX_annotation_org_id_created' contains 0 entries, should be 1024.
Warning  : InnoDB: Index 'IDX_annotation_org_id_updated' contains 0 entries, should be 1024.
Warning  : InnoDB: Index 'IDX_annotation_org_id_dashboard_id_epoch_end_epoch' contains 0 entries, should be 1024.
Warning  : InnoDB: Index 'IDX_annotation_org_id_epoch_end_epoch' contains 0 entries, should be 1024.
Warning  : InnoDB: Index 'IDX_annotation_alert_id' contains 0 entries, should be 1024.
error    : Corrupt
grafana.annotation_tag                             OK
grafana.anon_device                                OK
grafana.api_key                                    OK
grafana.builtin_role
Warning  : InnoDB: Index 'UQE_builtin_role_org_id_role_id_role' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_builtin_role_role_id' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_builtin_role_role' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_builtin_role_org_id' contains 0 entries, should be 3.
error    : Corrupt
grafana.cache_data                                 OK
grafana.correlation                                OK
grafana.dashboard
Warning  : InnoDB: Index 'UQE_dashboard_org_id_uid' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_dashboard_org_id' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_dashboard_gnet_id' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_dashboard_org_id_plugin_id' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_dashboard_title' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_dashboard_is_folder' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'UQE_dashboard_org_id_folder_uid_title_is_folder' contains 0 entries, should be 3.
Warning  : InnoDB: Index 'IDX_dashboard_org_id_folder_id_title' contains 0 entries, should be 3.
error    : Corrupt
grafana.dashboard_acl
Warning  : InnoDB: Index 'UQE_dashboard_acl_dashboard_id_user_id' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'UQE_dashboard_acl_dashboard_id_team_id' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_dashboard_acl_dashboard_id' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_dashboard_acl_user_id' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_dashboard_acl_team_id' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_dashboard_acl_org_id_role' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_dashboard_acl_permission' contains 0 entries, should be 2.
error    : Corrupt
grafana.dashboard_provisioning                     OK
grafana.dashboard_public                           OK
grafana.dashboard_snapshot                         OK
grafana.dashboard_tag                              OK
grafana.dashboard_version
Warning  : InnoDB: Index 'UQE_dashboard_version_dashboard_id_version' contains 0 entries, should be 29.
Warning  : InnoDB: Index 'IDX_dashboard_version_dashboard_id' contains 0 entries, should be 29.
error    : Corrupt
grafana.data_keys                                  OK
grafana.data_source
Warning  : InnoDB: Index 'UQE_data_source_org_id_name' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'UQE_data_source_org_id_uid' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_data_source_org_id' contains 0 entries, should be 2.
Warning  : InnoDB: Index 'IDX_data_source_org_id_is_default' contains 0 entries, should be 2.
error    : Corrupt
grafana.entity_event                               OK
grafana.file                                       OK
grafana.file_meta                                  OK
grafana.folder
[...]
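With output this long, it can help to reduce it to just the list of tables reported as corrupt. The following is a minimal sketch that assumes the output has the shape shown above (a table name on its own line, followed by `Warning :` and `error :` lines, or a table name followed by `OK`). The function name `corrupt_tables` is hypothetical.

```python
import re

# Message lines look like "Warning  : ..." or "error    : Corrupt".
MESSAGE_LINE = re.compile(r"^(Warning|error|note|status|info)\s*:\s*(.*)$")


def corrupt_tables(mysqlcheck_output: str) -> list:
    """Return the tables that mysqlcheck reported as 'Corrupt', in order."""
    corrupt = []
    current_table = None
    for line in mysqlcheck_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        message = MESSAGE_LINE.match(stripped)
        if message is None:
            # Not a message line, so treat the first token as a table name.
            current_table = stripped.split()[0]
        elif message.group(1) == "error" and message.group(2) == "Corrupt":
            if current_table is not None and current_table not in corrupt:
                corrupt.append(current_table)
    return corrupt


if __name__ == "__main__":
    import sys

    # Example: mysqlcheck --all-databases | python3 this_script.py
    for table in corrupt_tables(sys.stdin.read()):
        print(table)
```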

I tried rebooting the server, and now MariaDB does not start at all.

Feb 05 15:14:11 example.domain systemd[1]: mariadb.service: Deactivated successfully.
Feb 05 15:14:11 example.domain systemd[1]: Stopped mariadb.service - MariaDB 10.11.6 database server.
Feb 05 15:14:11 example.domain systemd[1]: mariadb.service: Consumed 1min 2.053s CPU time.
Feb 05 15:14:11 example.domain systemd[1]: Starting mariadb.service - MariaDB 10.11.6 database server...
Feb 05 15:14:11 example.domain mariadbd[24829]: 2025-02-05 15:14:11 0 [Note] Starting MariaDB 10.11.6-MariaDB-0+deb12u1 source revision  as process 24829
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Number of transaction pools: 1
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Using ARMv8 crc32 + pmull instructions
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Using liburing
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Completed initialization of buffer pool
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: End of log at LSN=97652302122
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=54]. You may have to >
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Page dump (16384 bytes):
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: 9248670400000036000000000000000000000016bc1b836f0006000000000000
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: 000000000000000000000000000000000000ffffffff0000ffffffff00000000
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: 0000000000021232ffffffffffffffffffffffffffffffffffffffffffffffff
[...]
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: End of page dump
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [ERROR] InnoDB: File './ibdata1' is corrupted
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB:  You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-mo>
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Retry with innodb_force_recovery=5
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [ERROR] InnoDB: Plugin initialization aborted with error Page read from tablespace is corrupted.
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] InnoDB: Starting shutdown...
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [Note] Plugin 'FEEDBACK' is disabled.
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [ERROR] Unknown/unsupported storage engine: InnoDB
Feb 05 15:14:12 example.domain mariadbd[24829]: 2025-02-05 15:14:12 0 [ERROR] Aborting
Feb 05 15:14:12 example.domain systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE

It seems the MariaDB database is partially corrupted. Before considering a full restore, I’d like to know if anyone has encountered this type of issue and if there’s a method to try repairing the database without data loss.

Here are the databases I have:

MariaDB [(none)]> show databases
    -> ;
+--------------------+
| Database           |
+--------------------+
| grafana            |
| information_schema |
| mysql              |
| nextcloud          |
| performance_schema |
| snappymail         |
| sys                |
| yourls             |
+--------------------+
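For reference, the startup log suggests retrying with `innodb_force_recovery=5`. A commonly described approach is to copy the data directory first, start the server in a forced-recovery mode, dump whatever is still readable, and rebuild from the dumps. The sketch below only prints the configuration fragment and the dump commands; it does not run anything. The output directory `/root/db-rescue` is a made-up path, and the choice of which databases to dump is an assumption based on the listing above.

```python
# Application databases from the listing above; the system schemas
# (information_schema, performance_schema, sys, mysql) are left out here.
DATABASES = ["grafana", "nextcloud", "snappymail", "yourls"]


def recovery_config(level: int) -> str:
    """Return a [mysqld] fragment that sets innodb_force_recovery."""
    if not 1 <= level <= 6:
        raise ValueError("innodb_force_recovery must be between 1 and 6")
    return f"[mysqld]\ninnodb_force_recovery = {level}\n"


def dump_commands(databases, out_dir="/root/db-rescue"):
    """Build one mysqldump command per database (nothing is executed)."""
    return [
        ["mysqldump", "--databases", db, f"--result-file={out_dir}/{db}.sql"]
        for db in databases
    ]


if __name__ == "__main__":
    # The log suggested level 5; the usual advice is to try the lowest
    # level that lets the server start, since higher levels are riskier.
    print(recovery_config(5))
    for command in dump_commands(DATABASES):
        print(" ".join(command))
```

Dumps taken this way may be incomplete for tables that sit on damaged pages, so they complement a backup rather than replace it.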

If not, I can tolerate some data loss, as I’m sure nothing important has changed since this morning. In fact, except for Nextcloud, for which I have a YunoHost backup on the server from the last upgrade 2 days ago (with almost no changes since), I could simply remove and reinstall the other apps if needed, as that should be much faster than fetching the offsite backup from S3.

Thanks in advance for your help!

Share relevant logs or error messages

https://paste.yunohost.org/raw/epufusefig

wbk · February 5, 2025, 3:45pm

I don’t recall ever having had a corrupted MySQL database; it usually doesn’t happen without a reason. Things that come to mind that can make a database go sour:

disk problems:
- full partition
- corrupted file system
- broken storage (disk/ssd/sd card/…)
memory problems:
- running out of memory without swap → out-of-memory (OOM) process kill
- faulty memory modules
power loss (though even then, data structures usually get repaired on boot)
restoring a ‘disk image’ backup taken without the memory state

The "(?)" is an invitation to say what kind of hardware you are running. Namely: if you run on an Orange Pi with no-name uSD cards from AliExpress, the likelihood of “failing storage” is quite different from that of a “second-hand server with enterprise SSDs”, for example.

Can you speculate on the cause of the corruption? The first noticeable signs were this morning; does a log show exactly when it started?

It would be a shame to go through the process of getting your YunoHost back in shape, only to have the same problem occur again because of unnoticed failing RAM.

quiwy · February 5, 2025, 4:18pm

Thank you for your answer.

Oops, sorry, I forgot to mention the board along with the rest of the message. It’s an Odroid HC4, with an SD card mounted read-only for the boot partition (I remount it read-write only when I need to update the kernel or boot files) and 2 SSDs in RAID 1 for the system and data.

Well, I first thought of a full disk, but that does not seem to be the case.

Filesystem      Size  Used Avail Use% Mounted on
udev            1.5G     0  1.5G   0% /dev
tmpfs           371M  1.4M  369M   1% /run
/dev/sda1        56G   47G  9.0G  84% /
tmpfs           1.9G  1.5M  1.9G   1% /dev/shm
tmpfs           5.0M   12K  5.0M   1% /run/lock
/dev/loop0      103M  103M     0 100% /snap/core/14452
/dev/loop1      103M  103M     0 100% /snap/core/14789
/dev/mmcblk0p1   30G   58M   28G   1% /boot
/dev/sda2       872G  549G  324G  63% /home
tmpfs           371M     0  371M   0% /run/user/35930

root@user:/home/user#  btrfs fi us /home
Overall:
    Device size:                   1.70TiB
    Device allocated:              1.17TiB
    Device unallocated:          545.75GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                          1.07TiB
    Free (estimated):            323.15GiB      (min: 323.15GiB)
    Free (statfs, df):           323.15GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:597.00GiB, Used:546.72GiB (91.58%)
   /dev/sda2     597.00GiB
   /dev/sdb2     597.00GiB

Metadata,RAID1: Size:2.00GiB, Used:1.25GiB (62.53%)
   /dev/sda2       2.00GiB
   /dev/sdb2       2.00GiB

System,RAID1: Size:32.00MiB, Used:128.00KiB (0.39%)
   /dev/sda2      32.00MiB
   /dev/sdb2      32.00MiB

Unallocated:
   /dev/sda2     272.87GiB
   /dev/sdb2     272.87GiB
root@user:/home/user#  btrfs fi us /
Overall:
    Device size:                 111.76GiB
    Device allocated:            111.59GiB
    Device unallocated:          170.00MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         92.98GiB
    Free (estimated):              8.98GiB      (min: 8.98GiB)
    Free (statfs, df):             8.97GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              140.95MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:54.76GiB, Used:45.87GiB (83.76%)
   /dev/sda1      54.76GiB
   /dev/sdb1      54.76GiB

Metadata,RAID1: Size:1.00GiB, Used:636.67MiB (62.17%)
   /dev/sda1       1.00GiB
   /dev/sdb1       1.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/sda1      32.00MiB
   /dev/sdb1      32.00MiB

Unallocated:
   /dev/sda1      85.00MiB
   /dev/sdb1      85.00MiB
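One detail in the output above: the root filesystem has only 170.00MiB of unallocated device space, against 545.75GiB on /home. On btrfs, space that looks free in `df` can still be hard to use once there is no unallocated space left for new chunks, so it may be worth watching this figure and not `df` alone. This is only a possible contributing factor, not a confirmed cause. The sketch below parses the "Device unallocated" line; the 1 GiB threshold and the function names are arbitrary assumptions.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> float:
    """Convert a size such as '170.00MiB' to bytes."""
    match = re.fullmatch(r"([\d.]+)(B|KiB|MiB|GiB|TiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def unallocated_bytes(usage_output: str) -> float:
    """Extract 'Device unallocated' from `btrfs filesystem usage` output."""
    for line in usage_output.splitlines():
        match = re.match(r"\s*Device unallocated:\s+(\S+)", line)
        if match:
            return parse_size(match.group(1))
    raise ValueError("no 'Device unallocated' line found")


def is_tight(usage_output: str, min_unallocated: float = 1024**3) -> bool:
    """True when unallocated space is below the (assumed) threshold."""
    return unallocated_bytes(usage_output) < min_unallocated


if __name__ == "__main__":
    import sys

    # Example: btrfs filesystem usage / | python3 this_script.py
    text = sys.stdin.read()
    print("unallocated bytes:", unallocated_bytes(text))
    print("tight:", is_tight(text))
```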

In the MariaDB logs, I don’t see what could have happened:

Jan 06 14:46:24 example.domain /etc/mysql/debian-start[2569]: Upgrading MySQL tables if necessary.
Jan 06 14:46:25 example.domain /etc/mysql/debian-start[2575]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
Jan 06 14:46:25 example.domain /etc/mysql/debian-start[2615]: Checking for insecure root accounts.
Jan 06 14:46:25 example.domain /etc/mysql/debian-start[2635]: Triggering myisam-recover for all MyISAM tables and aria-recover for all Aria tables
Jan 07 02:03:19 example.domain mariadbd[2163]: 2025-01-07  2:03:19 9293 [Warning] Aborted connection 9293 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error writing communication packets)
Jan 07 02:14:45 example.domain mariadbd[2163]: 2025-01-07  2:14:45 9304 [Warning] Aborted connection 9304 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error writing communication packets)
Jan 08 11:42:53 example.domain mariadbd[2163]: 2025-01-08 11:42:53 37337 [Warning] Aborted connection 37337 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 08 11:43:05 example.domain mariadbd[2163]: 2025-01-08 11:43:05 37648 [Warning] Aborted connection 37648 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 08 11:43:16 example.domain mariadbd[2163]: 2025-01-08 11:43:16 37649 [Warning] Aborted connection 37649 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 09 06:56:52 example.domain mariadbd[2163]: 2025-01-09  6:56:52 51015 [Warning] Aborted connection 51015 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 13 10:13:22 example.domain mariadbd[2163]: 2025-01-13 10:13:22 184566 [Warning] Aborted connection 184566 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 13 19:42:06 example.domain mariadbd[2163]: 2025-01-13 19:42:06 212384 [Warning] Aborted connection 212384 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 14 07:44:42 example.domain mariadbd[2163]: 2025-01-14  7:44:42 223794 [Warning] Aborted connection 223794 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 19 07:28:26 example.domain mariadbd[2163]: 2025-01-19  7:28:26 413273 [Warning] Aborted connection 413273 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 19 07:28:34 example.domain mariadbd[2163]: 2025-01-19  7:28:34 413280 [Warning] Aborted connection 413280 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 24 06:28:30 example.domain mariadbd[2163]: 2025-01-24  6:28:30 585320 [Warning] Aborted connection 585320 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 24 06:28:31 example.domain mariadbd[2163]: 2025-01-24  6:28:31 585351 [Warning] Aborted connection 585351 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 24 19:29:40 example.domain mariadbd[2163]: 2025-01-24 19:29:40 593223 [Warning] Aborted connection 593223 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 25 06:26:08 example.domain mariadbd[2163]: 2025-01-25  6:26:08 603093 [Warning] Aborted connection 603093 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 30 13:25:03 example.domain mariadbd[2163]: 2025-01-30 13:25:03 626770 [Warning] Aborted connection 626770 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error reading communication packets)
Jan 30 13:54:47 example.domain mariadbd[2163]: 2025-01-30 13:54:47 626801 [Warning] Aborted connection 626801 to db: 'unconnected' user: 'netdata' host: 'localhost' (Got an error writing communication packets)
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=54]. You may have to r>
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: Page dump (16384 bytes):
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 9248670400000036000000000000000000000016bc1b836f0006000000000000
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 000000000000000000000000000000000000ffffffff0000ffffffff00000000
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 0000000000021232ffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
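To answer the question of when it started, the excerpt above can be scanned for the first `[ERROR]` entry. In this excerpt the last entry before it is a netdata "Aborted connection" warning on Jan 30, so the log shows nothing suspicious between Jan 30 and Feb 05 06:01:10. A minimal sketch, assuming journal lines in the format shown above (the function names are hypothetical):

```python
def first_error_line(journal_lines):
    """Return the first line containing an [ERROR] entry, or None."""
    for line in journal_lines:
        if "[ERROR]" in line:
            return line
    return None


def journal_timestamp(line: str) -> str:
    """Return the leading 'Mon DD HH:MM:SS' part of a journal line."""
    return " ".join(line.split()[:3])


if __name__ == "__main__":
    import sys

    # Example: journalctl -u mariadb --no-pager | python3 this_script.py
    hit = first_error_line(sys.stdin)
    print(journal_timestamp(hit) if hit else "no [ERROR] entries found")
```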

There was no power loss, as the Odroid is behind a UPS and we have had no power outage recently.

Hmm, there could be something wrong with the filesystem:

user@user:~$ sudo btrfs device stats /home
[/dev/sda2].write_io_errs    0
[/dev/sda2].read_io_errs     0
[/dev/sda2].flush_io_errs    0
[/dev/sda2].corruption_errs  0
[/dev/sda2].generation_errs  0
[/dev/sdb2].write_io_errs    0
[/dev/sdb2].read_io_errs     0
[/dev/sdb2].flush_io_errs    0
[/dev/sdb2].corruption_errs  0
[/dev/sdb2].generation_errs  0
user@user:~$ sudo btrfs device stats /
[/dev/sda1].write_io_errs    0
[/dev/sda1].read_io_errs     0
[/dev/sda1].flush_io_errs    0
[/dev/sda1].corruption_errs  106
[/dev/sda1].generation_errs  0
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
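The only non-zero counter above is `corruption_errs` on /dev/sda1 (106), while its RAID1 partner /dev/sdb1 reports 0. Note that these counters are cumulative, so on their own they do not say when the errors occurred. A small sketch for picking out non-zero counters from this kind of output, assuming the `[device].counter  value` format shown above (the function name is hypothetical):

```python
import re

# Matches lines such as "[/dev/sda1].corruption_errs  106".
STAT_LINE = re.compile(r"^\[([^\]]+)\]\.(\w+)\s+(\d+)$")


def nonzero_counters(stats_output: str) -> dict:
    """Map (device, counter) to its value for every non-zero counter."""
    found = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match.group(3)) > 0:
            found[(match.group(1), match.group(2))] = int(match.group(3))
    return found


if __name__ == "__main__":
    import sys

    # Example: btrfs device stats / | python3 this_script.py
    for (device, counter), value in nonzero_counters(sys.stdin.read()).items():
        print(f"{device}: {counter} = {value}")
```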

and

sudo btrfsck /dev/sda1 --force --readonly
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/sda1
UUID: 5c830452-c1ea-45ee-91f5-a263d69358f3
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 5 inode 90827 errors 200, dir isize wrong
root 5 inode 4020389 errors 1, no inode item
        unresolved ref dir 90827 index 1138 namelen 11 name client.conf filetype 1 errors 5, no dir item, no inode ref
root 5 inode 4020390 errors 1, no inode item
        unresolved ref dir 90827 index 1140 namelen 11 name client.conf filetype 1 errors 5, no dir item, no inode ref
root 5 inode 4020391 errors 1, no inode item
        unresolved ref dir 90827 index 1142 namelen 11 name client.conf filetype 1 errors 5, no dir item, no inode ref
root 5 inode 4020392 errors 1, no inode item
        unresolved ref dir 90827 index 1144 namelen 11 name client.conf filetype 1 errors 5, no dir item, no inode ref
root 5 inode 31000245 errors 200, dir isize wrong
root 5 inode 66568457 errors 1, no inode item
        unresolved ref dir 31000245 index 877 namelen 16 name pg_internal.init filetype 1 errors 5, no dir item, no inode ref
root 5 inode 70354060 errors 200, dir isize wrong
root 5 inode 72942731 errors 1, no inode item
        unresolved ref dir 31000245 index 907 namelen 16 name pg_internal.init filetype 1 errors 5, no dir item, no inode ref
root 5 inode 116149436 errors 1, no inode item
        unresolved ref dir 70354060 index 435 namelen 24 name update-notifier-npm.json filetype 1 errors 5, no dir item, no inode ref
ERROR: errors found in fs roots
found 49928474624 bytes used, error(s) found
total csum bytes: 43716032
total tree bytes: 667533312
total fs tree bytes: 519929856
total extent tree bytes: 81149952
btree space waste bytes: 135047662
file data blocks allocated: 92873883648
 referenced 48728035328
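To see which files the check complains about, the names can be pulled out of the "unresolved ref" lines. In the output above they are client.conf, pg_internal.init and update-notifier-npm.json; ibdata1 is not among them, although this check was run on a mounted filesystem, so the result should be read with caution. A minimal sketch, assuming the line format shown above and file names without spaces (the function name is hypothetical):

```python
import re

# Matches "... unresolved ref dir 90827 index 1138 namelen 11 name client.conf filetype 1 ..."
UNRESOLVED = re.compile(
    r"unresolved ref dir \d+ index \d+ namelen \d+ name (\S+) filetype"
)


def unresolved_names(check_output: str) -> list:
    """Return the distinct file names from 'unresolved ref' lines, in order."""
    names = []
    for line in check_output.splitlines():
        match = UNRESOLVED.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    import sys

    for name in unresolved_names(sys.stdin.read()):
        print(name)
```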

I guess I could retry with the filesystem mounted read-only when I get back home.

But if I believe netdata, it seems these corruption errors occurred when I restarted the server.

I don’t see in the logs what could have happened:

Feb 05 05:54:42 example.domain postfix/smtpd[3970303]: warning: hostname 119-40-84-186.bdcom.com does not resolve to address 119.40.84.186: Name or service not known
Feb 05 05:54:42 example.domain postfix/smtpd[3970303]: connect from unknown[119.40.84.186]
Feb 05 05:54:43 example.domain postfix/smtpd[3970303]: disconnect from unknown[119.40.84.186] ehlo=1 auth=0/1 rset=1 quit=1 commands=3/4
Feb 05 05:55:00 example.domain vikunja[3212846]: 2025-02-05T05:55:00+01:00: DEBUG        ▶ 50a3 [Task Reminder Cron] Looking for reminders between 2025-02-05 05:55:00 +0100 CET and 2025-02-05 05:56:00 +0100 CET to>
Feb 05 05:55:00 example.domain vikunja[3212846]: 2025-02-05T05:55:00+01:00: DEBUG        ▶ 50a5 [Task Reminder Cron] Found 0 reminders
Feb 05 05:55:00 example.domain vikunja[3212846]: 2025-02-05T05:55:00+01:00: DEBUG        ▶ 50a9 [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 05:55:01 example.domain CRON[3970420]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 05 05:55:01 example.domain CRON[3970421]: pam_unix(cron:session): session opened for user nextcloud(uid=995) by (uid=0)
Feb 05 05:55:01 example.domain CRON[3970423]: (nextcloud) CMD (/usr/bin/php8.3 --define apc.enable_cli=1 -f /var/www/nextcloud/cron.php)
Feb 05 05:55:01 example.domain CRON[3970422]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Feb 05 05:55:01 example.domain CRON[3970420]: pam_unix(cron:session): session closed for user root
Feb 05 05:55:05 example.domain CRON[3970421]: pam_unix(cron:session): session closed for user nextcloud
Feb 05 05:55:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:05:55:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 05:55:38 example.domain systemd[1]: Started nextcloudcron.service - Nextcloud cron.php job.
Feb 05 05:55:42 example.domain systemd[1]: nextcloudcron.service: Deactivated successfully.
Feb 05 05:55:42 example.domain systemd[1]: nextcloudcron.service: Consumed 2.202s CPU time.
Feb 05 05:56:00 example.domain vikunja[3212846]: 2025-02-05T05:56:00+01:00: DEBUG        ▶ 50ab [Task Reminder Cron] Looking for reminders between 2025-02-05 05:56:00 +0100 CET and 2025-02-05 05:57:00 +0100 CET to>
Feb 05 05:56:00 example.domain vikunja[3212846]: 2025-02-05T05:56:00+01:00: DEBUG        ▶ 50ad [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 05:56:00 example.domain vikunja[3212846]: 2025-02-05T05:56:00+01:00: DEBUG        ▶ 50b2 [Task Reminder Cron] Found 0 reminders
Feb 05 05:56:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:05:56:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 05:56:23 example.domain systemd[1]: Started ynh-vpnclient-checker.service - YunoHost VPN Client Checker..
Feb 05 05:56:23 example.domain ynh-vpnclient-checker.sh[3970810]: [INFO] Service is already running
Feb 05 05:56:23 example.domain systemd[1]: ynh-vpnclient-checker.service: Deactivated successfully.
Feb 05 05:57:00 example.domain vikunja[3212846]: 2025-02-05T05:57:00+01:00: DEBUG        ▶ 50b3 [Task Reminder Cron] Looking for reminders between 2025-02-05 05:57:00 +0100 CET and 2025-02-05 05:58:00 +0100 CET to>
Feb 05 05:57:00 example.domain vikunja[3212846]: 2025-02-05T05:57:00+01:00: DEBUG        ▶ 50b6 [Task Reminder Cron] Found 0 reminders
Feb 05 05:57:00 example.domain vikunja[3212846]: 2025-02-05T05:57:00+01:00: DEBUG        ▶ 50b8 [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 05:57:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:05:57:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 05:58:00 example.domain vikunja[3212846]: 2025-02-05T05:58:00+01:00: DEBUG        ▶ 50bb [Task Reminder Cron] Looking for reminders between 2025-02-05 05:58:00 +0100 CET and 2025-02-05 05:59:00 +0100 CET to>
Feb 05 05:58:00 example.domain vikunja[3212846]: 2025-02-05T05:58:00+01:00: DEBUG        ▶ 50bd [Task Reminder Cron] Found 0 reminders
Feb 05 05:58:00 example.domain vikunja[3212846]: 2025-02-05T05:58:00+01:00: DEBUG        ▶ 50c1 [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 05:58:03 example.domain postfix/anvil[3970327]: statistics: max connection rate 1/60s for (smtp:119.40.84.186) at Feb  5 05:54:42
Feb 05 05:58:03 example.domain postfix/anvil[3970327]: statistics: max connection count 1 for (smtp:119.40.84.186) at Feb  5 05:54:42
Feb 05 05:58:03 example.domain postfix/anvil[3970327]: statistics: max cache size 1 at Feb  5 05:54:42
Feb 05 05:58:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:05:58:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 05:58:48 example.domain mautrix-discord[1166219]: 2025-02-05T05:58:48+01:00 WRN wsapi.go:785:onEvent() > unknown event: Op: 0, Seq: 56033, Type: PASSIVE_UPDATE_V2, Data: {"updated_voice_states":[],"updated_>
Feb 05 05:59:00 example.domain vikunja[3212846]: 2025-02-05T05:59:00+01:00: DEBUG        ▶ 50c3 [Task Reminder Cron] Looking for reminders between 2025-02-05 05:59:00 +0100 CET and 2025-02-05 06:00:00 +0100 CET to>
Feb 05 05:59:00 example.domain vikunja[3212846]: 2025-02-05T05:59:00+01:00: DEBUG        ▶ 50c5 [Task Reminder Cron] Found 0 reminders
Feb 05 05:59:00 example.domain vikunja[3212846]: 2025-02-05T05:59:00+01:00: DEBUG        ▶ 50c7 [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 05:59:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:05:59:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 06:00:00 example.domain vikunja[3212846]: 2025-02-05T06:00:00+01:00: DEBUG        ▶ 50cb [Task Reminder Cron] Looking for reminders between 2025-02-05 06:00:00 +0100 CET and 2025-02-05 06:01:00 +0100 CET to>
Feb 05 06:00:00 example.domain vikunja[3212846]: 2025-02-05T06:00:00+01:00: DEBUG        ▶ 50cf [Task Reminder Cron] Found 0 reminders
Feb 05 06:00:00 example.domain vikunja[3212846]: 2025-02-05T06:00:00+01:00: DEBUG        ▶ 50d3 [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 06:00:01 example.domain CRON[3971803]: pam_unix(cron:session): session opened for user nextcloud(uid=995) by (uid=0)
Feb 05 06:00:01 example.domain CRON[3971804]: (nextcloud) CMD (/usr/bin/php8.3 --define apc.enable_cli=1 -f /var/www/nextcloud/cron.php)
Feb 05 06:00:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:06:00:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 06:00:33 example.domain postfix/smtpd[3971952]: warning: hostname scan-12.security.ipip.net does not resolve to address 172.105.197.151: Name or service not known
Feb 05 06:00:33 example.domain postfix/smtpd[3971952]: connect from unknown[172.105.197.151]
Feb 05 06:00:33 example.domain postfix/smtpd[3971952]: lost connection after CONNECT from unknown[172.105.197.151]
Feb 05 06:00:33 example.domain postfix/smtpd[3971952]: disconnect from unknown[172.105.197.151] commands=0/0
Feb 05 06:00:38 example.domain systemd[1]: Started nextcloudcron.service - Nextcloud cron.php job.
Feb 05 06:01:00 example.domain vikunja[3212846]: 2025-02-05T06:01:00+01:00: DEBUG        ▶ 50d8 [Task Reminder Cron] Looking for reminders between 2025-02-05 06:01:00 +0100 CET and 2025-02-05 06:02:00 +0100 CET to>
Feb 05 06:01:00 example.domain vikunja[3212846]: 2025-02-05T06:01:00+01:00: DEBUG        ▶ 50dc [Undone Overdue Tasks Reminder] Sending reminders to 0 users
Feb 05 06:01:00 example.domain vikunja[3212846]: 2025-02-05T06:01:00+01:00: DEBUG        ▶ 50df [Task Reminder Cron] Found 0 reminders
Feb 05 06:01:10 example.domain influxd-systemd-start.sh[2411]: [httpd] ::1 - - [05/Feb/2025:06:01:10 +0100] "POST /query?db=opentsdb&epoch=ms HTTP/1.1 {'q': 'SELECT mean("value") FROM "netdata.sensors.sensor_chip_>
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=54]. You may have to r>
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: Page dump (16384 bytes):
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 9248670400000036000000000000000000000016bc1b836f0006000000000000
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 000000000000000000000000000000000000ffffffff0000ffffffff00000000
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: 0000000000021232ffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Feb 05 06:01:10 example.domain mariadbd[2163]: 2025-02-05  6:01:10 0 [Note] InnoDB: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I know I’m often short on memory, as the board only has 4 GB of RAM.
But I don’t see anything out of the ordinary in the netdata memory graph.

How could I stress-test the RAM to see if there is a problem with it?

quiwy · February 7, 2025, 4:31pm

OK, so I used stress-ng to stress-test the RAM for 6 hours; no problems so far.

I made a copy of /var/lib/mysql just in case.

I ran a scrub on the filesystem, which found no errors, so I reset the btrfs corruption counter.

I restarted MariaDB with innodb_force_recovery=5, as suggested by MariaDB, and exported every database that could be exported. Then I dropped the corrupted databases.

I removed the files causing problems in /var/lib/mysql, removed innodb_force_recovery=5 and restarted MariaDB, recreated the required mysql tables with mysql_upgrade -u root -p --force, reimported the databases I wanted, and then imported the Nextcloud app backup.

Everything is working for now, and I don’t have any errors any more.
I will keep a close eye on it over the next few weeks to see if everything is still running fine.

wbk · February 7, 2025, 6:08pm

Great

Thanks for the insightful troubleshooting post!

Only the orange bit is actually “problematic” if it gets too high. The blue amount (cached) is a “useful” variant of the green amount: it is not in actual use by the system but kept at hand in memory.

It could be worthwhile to create a swap file of (half) a gig or so, for example by:

fallocate -l 512M /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile

quiwy · February 7, 2025, 6:17pm

Well, I remember that in the early days of SSDs it was not recommended to put swap on them because of their limited write cycles. As I was not sure about it when I installed the server back in 2020, I disabled it.

But maybe it’s not a problem anymore?

wbk · February 7, 2025, 10:08pm

That is correct; in the meantime, most SSDs have become smart enough to spread writes across unused or less-used flash cells and so limit wear.

smartctl -a /dev/sda lets you check the estimated wear of your SSDs. I have a relatively cheap Samsung 860 QVO model that has been running about as long as yours (since 2019) and has taken a hammering from Proxmox (and YunoHost, among others, but Proxmox is hard on SSDs). It is now at about 350 TB written (out of a guaranteed 360 TB).

Have a look at the numbers for your SSDs. Linux prefers not to use swap, so if you have enough memory, it won’t be written to all the time. Place it outside of your RAID1 if you have space available there (so there is no need to write that data twice).

There may even be a benefit in artificially increasing the write count on one of the two SSDs in your RAID1 (assuming identical models): if RAID1 implies equal writes to both devices, and both devices have the same life span, one would fail within a short period of the other. With swap on one of the two, that one should fail first as an early warning, giving you time to mirror to a new device.
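
As a rough illustration of how to read those numbers, here is a minimal sketch of the arithmetic for the 350 TB out of 360 TB figures quoted above. The variable names are only illustrative, and the rated endurance of any other drive would have to come from its own datasheet.

```shell
#!/bin/sh
# Figures quoted above for the Samsung 860 QVO.
written_tb=350
rated_tb=360

# Share of the rated endurance already used, as an integer percentage.
used_pct=$((written_tb * 100 / rated_tb))
# Writes left before the guaranteed figure is reached.
remaining_tb=$((rated_tb - written_tb))

echo "used: ${used_pct}% of rated endurance, ${remaining_tb} TB remaining"
```

The same calculation applies to any drive once its total written volume and guaranteed endurance are both known in the same unit.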

quiwy · February 9, 2025, 9:37am

Hmmm OK, I see, it’s smart.

I don’t really know how to read the SMART report. Could you point out which values are important?

smartctl 7.3 2022-02-28 r5338 [aarch64-linux-6.1.0-odroid-arm64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SDSSDH3 1T00
Serial Number:    2032CA802276
LU WWN Device Id: 5 001b44 8bb9f90a1
Firmware Version: 415000RL
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Feb  9 10:23:11 2025 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       33566
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       333
165 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       2204238098951
166 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       67
167 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       45
168 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       148
169 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       352
170 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       96
174 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       246
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       93
194 Temperature_Celsius     0x0022   063   054   ---    Old_age   Always       -       37 (Min/Max 17/54)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   025   025   ---    Old_age   Always       -       27749938633021
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Media_Wearout_Indicator 0x0032   100   100   ---    Old_age   Always       -       98640
234 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       190372
241 Total_LBAs_Written      0x0030   253   253   ---    Old_age   Offline      -       135241
242 Total_LBAs_Read         0x0030   253   253   ---    Old_age   Offline      -       123598
244 Unknown_Attribute       0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       739         -

Selective Self-tests/Logging not supported

The only things I was looking at were whether the test passed and Reallocated_Sector_Ct. I get what Power_On_Hours and Power_Cycle_Count are, as they are quite explicit, but I have no idea about the others.

I asked ChatGPT to explain these two lines:

241 Total_LBAs_Written      0x0030   253   253   ---    Old_age   Offline      -       135241
242 Total_LBAs_Read         0x0030   253   253   ---    Old_age   Offline      -       123598

And since the values it gave me were very strange, I also used https://dannyda.com/2020/09/06/how-to-convert-smart-attribute-241-total_lbas_written-to-mb-gb-tb-pb/, but the result looks like nonsense to me. I don’t see how only 66 MB could have been written to the disk, as right now it holds at least almost 50 GB on / and 550 GB on /home…

So if you have any clue on how to read this SMART report, it would be a great help.
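
For what it's worth, a figure of roughly 66 MB is what comes out when the raw value of attribute 241 is read as a count of 512-byte sectors. Below is a minimal sketch of that arithmetic, next to a second reading in which the raw value is already expressed in GiB. The GiB unit is only an assumption: some vendors are known to report attribute 241 that way, but this drive is not in the smartctl database, so nothing in the report confirms it. The variable names are illustrative.

```shell
#!/bin/sh
# Raw value of attribute 241 (Total_LBAs_Written) from the report above.
raw_241=135241
# Logical sector size from the "Sector Size" line of the report.
sector_bytes=512

# Reading 1: the raw value counts 512-byte sectors (LBAs).
bytes_written=$((raw_241 * sector_bytes))
mib_written=$((bytes_written / 1024 / 1024))
echo "as LBAs: ${bytes_written} bytes, about ${mib_written} MiB"

# Reading 2 (ASSUMPTION, not confirmed for this drive):
# the raw value is already a number of GiB written.
tib_written=$((raw_241 / 1024))
echo "as GiB:  ${raw_241} GiB, about ${tib_written} TiB"
```

The second reading would be far more plausible for a disk that currently holds around 600 GB after 33566 power-on hours, but it should be checked against the vendor's own tool or documentation before drawing any conclusion about wear.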

system · March 11, 2025, 9:38am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.