journalctl at 100% disk utilization for extended periods of time

What type of hardware are you using: Raspberry Pi 3, 4+
What YunoHost version are you running: 12.0.11
How are you able to access your server: The webadmin, SSH
Are you in a special context or did you perform specific tweaking on your YunoHost instance ?: no

Describe your issue

Hello.
Sometimes my YNH instance becomes rather slow. This happens periodically, if somewhat inconsistently, and sometimes it is my own fault. However, while watching htop, I noticed some unusual behavior:
journalctl uses 100% of my disk for several minutes at a time, for no apparent reason. Right now, it’s using 100% of my disk at 266 MB/s because I wanted to open the services page to view filebrowser’s status. It has been running for at least 2 minutes, which means it has read 32 GB+ of something. I do not think I have 32 GB of logs. And it’s still running: that’s another 16 GB while I wrote that sentence. I have seen this run for up to 10 minutes before (a dubious claim, since I am going purely off feeling here; at least 5, and certainly longer than 3). It is not always related to filebrowser.

Why does this happen, and is there a solution? I fear it will wear out my SSD. It seems to happen when I view a service through the webadmin, but usually it lasts 20 seconds at most (including most journalctl runs for filebrowser). Why does it sometimes take so long? I do not have enough information. Do I have 80+ GB of logs to delete?

Share relevant logs or error messages

Can I journalctl journalctl, i.e. sudo journalctl -u systemd-journald?
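To see which process is really generating the reads, rather than relying on htop alone, here is a minimal Python sketch. It assumes a Linux /proc filesystem and root privileges (reading another user’s /proc/<pid>/io typically requires root). It samples the read_bytes counter, which counts data actually fetched from the storage layer; rchar, by contrast, also includes reads served from the page cache. The helper names (parse_proc_io, read_rate_mb_s, report) and the 5-second interval are hypothetical choices, not part of any YunoHost tool. Note that the journald daemon appears under the command name systemd-journal (as in the “Killing process 161 (systemd-journal)” lines of the log), while the client is journalctl.

```python
import os
import time


def parse_proc_io(text: str) -> dict:
    """Parse the "key: value" lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            counters[key.strip()] = int(value)
    return counters


def read_bytes(pid: int) -> int:
    """Bytes this process has caused to be fetched from the storage layer."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())["read_bytes"]


def pids_named(name: str) -> list:
    """PIDs whose /proc/<pid>/comm equals `name` (comm is cut to 15 chars)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            pass  # the process exited while we were scanning
    return pids


def read_rate_mb_s(pid: int, interval: float = 5.0) -> float:
    """Average storage read rate of `pid`, in MB/s, over `interval` seconds."""
    before = read_bytes(pid)
    time.sleep(interval)
    return (read_bytes(pid) - before) / interval / 1_000_000


def report(names=("journalctl", "systemd-journal")) -> None:
    """Print the read rate of every running process with one of `names`."""
    for name in names:
        for pid in pids_named(name):
            try:
                print(name, pid, f"{read_rate_mb_s(pid):.1f} MB/s")
            except OSError as err:  # exited, or not permitted without root
                print(name, pid, "unreadable:", err)


# Usage, as root: report()
```

If read_bytes climbs far more slowly than rchar, most of the “reading” is being served from RAM rather than the SSD.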

-- Boot d447e2a31b2a4166a568e0b39ec639b7 --
Feb 12 15:01:46 host.domain.com systemd-journald[161]: Journal started
Feb 12 15:01:46 host.domain.com systemd-journald[161]: Runtime Journal (/run/log/journal/ce84254c98d941b78d2b5207990fca98) is 8.0M, max 75.8M, 67.8M free.
Feb 12 15:01:46 host.domain.com systemd-journald[161]: Time spent on flushing to /var/log/journal/ce84254c98d941b78d2b5207990fca98 is 47.721ms for 386 entries.
Feb 12 15:01:46 host.domain.com systemd-journald[161]: System Journal (/var/log/journal/ce84254c98d941b78d2b5207990fca98) is 3.9G, max 4.0G, 77.3M free.
Feb 12 15:01:47 host.domain.com systemd-journald[161]: Received client request to flush runtime journal.
Feb 15 08:52:27 host.domain.com systemd-journald[161]: Data hash table of /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal has a fill level at 75.0 (174763 of 233016 items, 134217728 file size, 767 bytes per hash table item), suggesting rotation.
Feb 15 08:52:27 host.domain.com systemd-journald[161]: /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal: Journal header limits reached or header out-of-date, rotating.
Feb 24 13:05:53 host.domain.com systemd-journald[598014]: File /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 24 13:05:53 host.domain.com systemd-journald[598014]: Journal started
Feb 24 13:05:53 host.domain.com systemd-journald[598014]: System Journal (/var/log/journal/ce84254c98d941b78d2b5207990fca98) is 3.9G, max 4.0G, 77.4M free.
Feb 24 13:05:38 host.domain.com systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Feb 24 13:05:38 host.domain.com systemd[1]: systemd-journald.service: Killing process 161 (systemd-journal) with signal SIGABRT.
Feb 24 13:28:59 host.domain.com systemd-journald[598372]: File /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 24 13:28:59 host.domain.com systemd-journald[598372]: Journal started
Feb 24 13:28:59 host.domain.com systemd-journald[598372]: System Journal (/var/log/journal/ce84254c98d941b78d2b5207990fca98) is 3.9G, max 4.0G, 69.4M free.
Feb 24 13:28:56 host.domain.com systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Feb 24 13:28:56 host.domain.com systemd[1]: systemd-journald.service: Killing process 598014 (systemd-journal) with signal SIGABRT.
Feb 24 14:32:11 host.domain.com systemd-journald[600283]: File /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 24 14:32:11 host.domain.com systemd-journald[600283]: Journal started
Feb 24 14:32:11 host.domain.com systemd-journald[600283]: System Journal (/var/log/journal/ce84254c98d941b78d2b5207990fca98) is 3.9G, max 4.0G, 61.4M free.
Feb 24 14:23:23 host.domain.com systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Feb 24 14:23:23 host.domain.com systemd[1]: systemd-journald.service: Killing process 598372 (systemd-journal) with signal SIGABRT.
Feb 24 15:06:22 host.domain.com systemd-journald[600283]: Forwarding to syslog missed 11 messages.
Feb 24 15:06:32 host.domain.com systemd-journald[600283]: Forwarding to syslog missed 449 messages.
Feb 24 15:08:14 host.domain.com systemd-journald[600283]: Forwarding to syslog missed 9 messages.
Feb 24 15:09:08 host.domain.com systemd-journald[600283]: Forwarding to syslog missed 47990 messages.
Feb 24 15:19:48 host.domain.com systemd-journald[601521]: File /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 24 15:27:21 host.domain.com systemd-journald[601629]: File /var/log/journal/ce84254c98d941b78d2b5207990fca98/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 24 15:27:21 host.domain.com systemd-journald[601629]: Journal started
Feb 24 15:27:21 host.domain.com systemd-journald[601629]: System Journal (/var/log/journal/ce84254c98d941b78d2b5207990fca98) is 3.9G, max 4.0G, 29.3M free.
Feb 24 15:17:19 host.domain.com systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Feb 24 15:17:20 host.domain.com systemd[1]: systemd-journald.service: Killing process 600283 (systemd-journal) with signal SIGABRT.
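The journald messages above can be summarized mechanically. This minimal sketch assumes journald’s size suffixes (K, M, G, T) are powers of 1024. It pulls out the free space reported in each “System Journal (...) is 3.9G, max 4.0G, ... free” line and counts watchdog timeouts and “corrupted or uncleanly shut down” replacements. In the excerpt above, free space under the 4.0G cap shrinks from 77.3M to 29.3M across restarts, and the restarts coincide with watchdog timeouts. The names SIZE_LINE, to_bytes and summarize are hypothetical.

```python
import re

# journald's size report, e.g.
# "System Journal (/var/log/journal/<id>) is 3.9G, max 4.0G, 77.3M free."
SIZE_LINE = re.compile(
    r"System Journal \(.*?\) is (?P<used>[\d.]+[BKMGT]?), "
    r"max (?P<max>[\d.]+[BKMGT]?), (?P<free>[\d.]+[BKMGT]?) free"
)
# Assumption: journald's suffixes are powers of 1024.
UNITS = {"": 1, "B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> int:
    """Convert a journald size string such as "77.3M" to bytes."""
    unit = size[-1] if size[-1] in UNITS else ""
    number = size[:-1] if unit else size
    return int(float(number) * UNITS[unit])


def summarize(log_text: str) -> dict:
    """Free journal space per size report, plus watchdog/corruption counts."""
    return {
        "free_bytes": [to_bytes(m["free"]) for m in SIZE_LINE.finditer(log_text)],
        "watchdog_timeouts": log_text.count("Watchdog timeout"),
        "corrupted_files": log_text.count("corrupted or uncleanly shut down"),
    }


# Usage: save "sudo journalctl -u systemd-journald --no-pager" to journald.txt,
# then: print(summarize(open("journald.txt").read()))
```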

Caught it in the act! I can’t reboot YunoHost from the webadmin because “a different yunohost command is running” or some such. It would be nice if the message were more specific, but there should be NONE running.

Please see the htop screenshots.

32 minutes???

200 MiB/s. Over 32 minutes, that’s 384,000 MiB, or about 375 GiB, right? My disk only holds 512 GiB.
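The throughput arithmetic in this thread is easy to get wrong when mixing decimal (MB, GB) and binary (MiB, GiB) units. A small sketch of the calculation, assuming the rate stays constant for the whole period (htop shows an instantaneous rate, so this is only a rough estimate); data_volume is a hypothetical helper name:

```python
def data_volume(rate: float, minutes: float, binary: bool = False) -> float:
    """Total data transferred at a constant `rate` for `minutes`.

    binary=False: rate in MB/s, result in GB  (1 GB  = 1000 MB).
    binary=True:  rate in MiB/s, result in GiB (1 GiB = 1024 MiB).
    """
    total = rate * minutes * 60  # rate times seconds: MB or MiB
    return total / (1024 if binary else 1000)


print(round(data_volume(266, 2), 1))                # 31.9  (GB)
print(round(data_volume(200, 32, binary=True), 1))  # 375.0 (GiB)
```

200 MiB/s for 32 minutes is 384,000 MiB, which is 375 GiB when divided by 1024 rather than 1000.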

I also notice that many other apps show significant utilization. That is strange: they should be much lower than they are, since I am essentially the only person who uses my instance. Could this be related?

I am not sure whether this is something that runs repeatedly in the background, which would explain the 32 minutes, but it has been at least 2 hours since I first noticed it, so it surely is a real issue.

Let me know if you need more information.

Hello.

I tried to log into the webadmin to update, and ran into the same issue again: another yunohost command is running.

I don’t know what to do.

This could be my fault: I have a lot of log entries in syslog facility 10 (authpriv). Still, I am not sure why this would keep journalctl running for an hour.

Dec 26 17:15:01 host.domain.com sudo[790886]: pam_unix(sudo:auth): auth could not identify password for [MadMan247]
Dec 26 17:15:01 host.domain.com sudo[790886]: pam_ldap(sudo:auth): failed to get password: Authentication failure
Dec 26 17:15:59 host.domain.com sudo[790952]: pam_unix(sudo:auth): authentication failure; logname=MadMan247 uid=54273 euid=0 tty=/dev/pts/0 ruser=madman2>
Dec 26 17:15:59 host.domain.com sudo[790952]: MadMan247 : TTY=pts/0 ; PWD=/home/MadMan247 ; USER=root ; COMMAND=./pullsite.sh
Dec 26 17:15:59 host.domain.com sudo[790952]: pam_unix(sudo:session): session opened for user root(uid=0) by MadMan247(uid=54273)
Dec 26 17:16:00 host.domain.com sudo[790952]: pam_unix(sudo:session): session closed for user root
Dec 26 17:20:01 host.domain.com sudo[791034]: pam_unix(sudo:auth): conversation failed
Dec 26 17:20:01 host.domain.com sudo[791034]: pam_unix(sudo:auth): auth could not identify password for [MadMan247]
Dec 26 17:20:01 host.domain.com sudo[791034]: pam_ldap(sudo:auth): failed to get password: Authentication failure
Dec 26 17:25:01 host.domain.com sudo[791134]: pam_unix(sudo:auth): conversation failed
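The auth failures in this excerpt land at 17:15:01, 17:20:01 and 17:25:01, which points to something scheduled every 5 minutes. A minimal sketch to measure the spacing between matching lines, assuming journalctl’s default short timestamp format (“Dec 26 17:20:01”, which carries no year) and using “conversation failed” as the marker purely for illustration; gaps_between is a hypothetical helper name:

```python
from datetime import datetime


def gaps_between(lines, marker="conversation failed", year=2000):
    """Seconds between consecutive log lines that contain `marker`.

    journalctl's short format ("Dec 26 17:20:01") has no year, so a
    placeholder leap year is used; gaps spanning New Year will be wrong.
    """
    stamps = [
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in lines
        if marker in line
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

For example, save the output of sudo journalctl -t sudo --no-pager to a file and pass its lines in; a list of gaps that are all 300.0 would match a 5-minute schedule.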

What is that?

Try cleaning the logs. Maybe sudo yunohost tools basic-space-cleanup would help.

And if it’s an SD card, check it for hardware failure.

Thank you for the response, Jarod!

pullsite.sh is a script that clones from a git repository into my_webapp for my website. Given how often the errors appear, it is the likely culprit for the large number of sudo errors. I mistakenly made it run every 5 minutes, which is far too often, and have changed it to once every 12 hours. It does run properly, so I am not sure why it produces sudo errors. What confuses me is why these sudo errors would keep the disk at 100% utilization for so long.

I did not know YunoHost had a built-in cleanup command I could use. I’ll run that!

I ran journalctl --verify, and a few journal files are corrupt. I also ran journalctl --vacuum-time=1w and journalctl --vacuum-size=500m.
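To check what the vacuum commands actually freed, here is a minimal sketch that wraps journalctl --disk-usage. It assumes the output reads like “Archived and active journals take up 3.9G in the file system.” (the wording varies between systemd versions) and that the suffixes are powers of 1024. According to the journalctl man page, --vacuum-size= only removes archived journal files, while --disk-usage also counts the active ones, so the reported total can stay somewhat above the requested limit. parse_disk_usage and journal_disk_usage are hypothetical helper names.

```python
import re
import subprocess

# Assumption: journalctl prints sizes with power-of-1024 suffixes.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_disk_usage(output: str) -> int:
    """Extract the total from `journalctl --disk-usage` output, in bytes."""
    match = re.search(r"take up ([\d.]+)([BKMGT]?)", output)
    if match is None:
        raise ValueError(f"unrecognised output: {output!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit or "B"])


def journal_disk_usage() -> int:
    """Run journalctl --disk-usage; run as root so all journals are counted."""
    result = subprocess.run(
        ["journalctl", "--disk-usage"],
        capture_output=True, text=True, check=True,
    )
    return parse_disk_usage(result.stdout)


# Usage, as root, after vacuuming:
#   print(journal_disk_usage() / 1024**2, "MiB")
```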

smartmontools reported no old-age or pre-failure issues. I am using an SSD over USB 3.

I’ll monitor it for a week to see whether the issue crops up again.
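For a week of monitoring, watching the whole device may be easier than catching the right process in htop. A minimal sketch that reads the sectors-read column from /proc/diskstats (the kernel counts these in 512-byte units) and prints a timestamp whenever the average read rate over a window exceeds a threshold. The device name sda, the 100 MB/s threshold, the 10-second window and the helper names are hypothetical placeholders; check lsblk for the SSD’s actual name.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors on every device


def sectors_read(diskstats: str, device: str) -> int:
    """Sectors read (6th whitespace-separated column) for `device`."""
    for line in diskstats.splitlines():
        fields = line.split()
        if len(fields) > 5 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)


def read_rate_mb_s(device: str, interval: float) -> float:
    """Average read throughput of `device`, in MB/s, over `interval` seconds."""
    def snapshot() -> int:
        with open("/proc/diskstats") as f:
            return sectors_read(f.read(), device)

    before = snapshot()
    time.sleep(interval)
    return (snapshot() - before) * SECTOR_BYTES / interval / 1_000_000


def watch(device: str = "sda", threshold_mb_s: float = 100.0,
          interval: float = 10.0) -> None:
    """Print a timestamped line whenever reads exceed `threshold_mb_s`.

    The defaults are hypothetical; this loops until interrupted.
    """
    while True:
        rate = read_rate_mb_s(device, interval)
        if rate > threshold_mb_s:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {device} read {rate:.0f} MB/s", flush=True)


# Usage: watch()  (for example under nohup, redirected to a log file)
```

Matching the printed timestamps against webadmin activity would show whether the bursts really line up with opening a service’s page.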

