I have been using Yunohost for more than 6 years on an OVH kimsufi server.
Six months ago, taking advantage of high-speed internet at home, I moved the server to a mini-PC at home (Intel N5100, 8 GB DDR4 RAM) with a 1 TB Crucial BX500 SSD for /home and a 128 GB M.2 SATA SSD for /.
My Yunohost server is up to date (11.2.8.2).
On Friday night, I tried to install ddclient from the Debian repository, to update my IP address automatically in case my almost-fixed IP changes, following the Infomaniak guide “Configurer ddclient avec DynDNS Infomaniak”.
On Saturday morning, my server was unreachable: when I try to connect to any service, I get “500 Internal Server Error - nginx”.
So neither the WebUI nor SSH works anymore.
When I reboot, I can get in through the WebUI or SSH for 1 to 5 minutes, and then the error comes back.
While connected, I ran the diagnosis from the WebUI, and it did not show any error.
I removed ddclient (sudo apt-get remove ddclient), but my server still doesn’t work: I now get “404 Not Found - nginx”.
I connected a screen and keyboard to my home server and saw this on the screen:
systemd-journald[242] failed to rotate /var/log/journal/....
systemd-journald[242] failed to write entry (9 items, 245bytes)..
Both /home and / are less than 40% full, so I have plenty of free space.
Then, when I restart the server and check the logs with journalctl --verify, I find:
8bff08: Invalid entry item (12/24 offset: 000000
8bff08: Invalid object contents: Bad message
File corruption detected at /var/log/journal/1adefe4644714958b95a2bcdfc0a6bfd/system@00060ca1ce98826f-3f42f18a37d54f8f.journal~:8bff08 (of 16777216 bytes, 54%).
FAIL: /var/log/journal/1adefe4644714958b95a2bcdfc0a6bfd/system@00060ca1ce98826f-3f42f18a37d54f8f.journal~ (Bad message)
I removed the corrupted log file, then ran sudo systemctl restart systemd-journald, but the server crashed again.
I restarted and, following @Aleks's advice (in the Matrix support room), tried a relatively small backup, which failed with:
Error: "500"
Action: "POST" /yunohost/api/backups
Traceback
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/yunohost/backup.py", line 1968, in backup
    tar.add(path["source"], arcname=path["dest"])
  File "/usr/lib/python3.9/tarfile.py", line 1985, in add
  File "/usr/lib/python3.9/tarfile.py", line 1985, in add
  File "/usr/lib/python3.9/tarfile.py", line 1985, in add
  [Previous line repeated 5 more times]
  File "/usr/lib/python3.9/tarfile.py", line 1979, in add
  File "/usr/lib/python3.9/tarfile.py", line 2007, in addfile
  File "/usr/lib/python3.9/tarfile.py", line 247, in copyfileobj
OSError: [Errno 5] Input/output error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/yunohost/log.py", line 410, in func_wrapper
  File "/usr/lib/python3/dist-packages/yunohost/backup.py", line 2283, in backup_create
    backup_manager.backup()
  File "/usr/lib/python3/dist-packages/yunohost/backup.py", line 772, in backup
    method.mount_and_backup()
  File "/usr/lib/python3/dist-packages/yunohost/backup.py", line 1705, in mount_and_backup
    self.backup()
  File "/usr/lib/python3/dist-packages/yunohost/backup.py", line 1979, in backup
    raise YunohostError("backup_creation_failed")
yunohost.utils.error.YunohostError: Could not create the backup archive
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.9/logging/__init__.py", line 1153, in close
  File "/usr/lib/python3.9/logging/__init__.py", line 1063, in flush
OSError: [Errno 30] Read-only file system
During handling of the above exception, another exception occurred:
OSError: [Errno 30] Read-only file system
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/moulinette/interfaces/api.py", line 453, in process
  File "/usr/lib/python3/dist-packages/moulinette/actionsmap.py", line 580, in process
  File "/usr/lib/python3/dist-packages/yunohost/log.py", line 412, in func_wrapper
  File "/usr/lib/python3/dist-packages/yunohost/log.py", line 678, in error
  File "/usr/lib/python3/dist-packages/yunohost/log.py", line 707, in close
  File "/usr/lib/python3.9/logging/__init__.py", line 1158, in close
OSError: [Errno 30] Read-only file system
So Aleks suspected a hardware problem and advised me to run:
cat /proc/mounts
which gives:
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime,hidepid=invisible 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=3935792k,nr_inodes=983948,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=790872k,mode=755 0 0
/dev/sda2 / ext4 rw,relatime,errors=remount-ro 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
none /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=10633 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
/dev/sdb1 /home ext4 rw,relatime 0 0
/dev/sda1 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0
tmpfs /run/user/73306 tmpfs rw,nosuid,nodev,relatime,size=790868k,nr_inodes=197717,mode=700,uid=73306,gid=73306 0 0
The problem seems to be that my system partition switches to read-only mode, which makes the server crash. As Aleks said: “The issue is understanding what error is happening exactly triggering the ro mode, though these are usually hardware issues.”
He advised me to look at the tips and further information in “12.10 - Ubuntu goes into read-only mode randomly” (Ask Ubuntu) and “permissions - Filesystem suddenly read-only?” (Unix & Linux Stack Exchange).
I noticed that the server switches to read-only mode and becomes inaccessible not only when making a backup, but also sometimes around 10-15 minutes after a reboot.
Surprisingly, if I start a file download through FileZilla (SFTP/SSH connection), then even if the server crashes and becomes inaccessible through the console, SSH or the WebUI, the FileZilla download continues and completes.
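A minimal sketch of how the switch to read-only mode could be detected from the mount table, assuming the same /proc/mounts layout as in the output above. The function names parse_mounts and read_only_mounts are hypothetical, chosen only for this illustration. Note that the option errors=remount-ro only describes what ext4 should do on an error; the filesystem is actually read-only when the plain option ro appears instead of rw.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        entries.append((device, mountpoint, fstype, options.split(",")))
    return entries


def read_only_mounts(text, fstypes=("ext4",)):
    """Return the mount points of the given filesystem types that are mounted read-only.

    Only the exact option "ro" counts; "errors=remount-ro" is a policy, not a state.
    """
    return [
        mountpoint
        for _device, mountpoint, fstype, options in parse_mounts(text)
        if fstype in fstypes and "ro" in options
    ]
```

Calling read_only_mounts on the contents of /proc/mounts every few seconds, and appending a timestamp to a file on /home (a different disk, which may still be writable) the first time "/" shows up, would give the approximate time of the switch. That is an assumption about how it might be used, not a tested procedure on this server.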
I tested both SSDs with smartctl, and they seem to be in good condition, with no errors reported.
I also ran sudo fsck -Cy <your partition> on both partitions (system and /home), and no issue was reported.
On top of that, when the switch to read-only mode happens, the screen fills with journald errors (failed to rotate / failed to write), so I can’t type any command or run any check.
How can I diagnose what is crashing my small server?
I could replace one of the SSDs (even though they are only one year old), but I have found no evidence of disk failure so far…
(I hope this is not off topic…)
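Since journald cannot write once / is read-only, the kernel ring buffer (what dmesg prints, kept in RAM) is one place where the triggering error may still be visible. Below is a minimal sketch of a filter for such output, assuming it has been captured to text first. The function name find_storage_errors and the pattern list are hypothetical and only illustrative: the exact wording of kernel messages depends on the kernel version and the storage driver.

```python
import re

# Illustrative patterns only; real kernel messages vary by version and driver.
STORAGE_ERROR_PATTERNS = [
    re.compile(r"EXT4-fs error", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"Remounting filesystem read-only", re.IGNORECASE),
]


def find_storage_errors(lines):
    """Return, in order, the lines that match at least one storage-error pattern."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in STORAGE_ERROR_PATTERNS)
    ]
```

For example, the output of dmesg could be redirected to a file on /home (which is on the other SSD) and that file passed line by line to find_storage_errors. The first matching line before a "Remounting filesystem read-only" message would be the most interesting one to look at.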