Issues with MySQL/InnoDB after Bullseye migration

Lapineige · August 16, 2022, 10:37am

For my part, the problem with the PostgreSQL 11→13 migration was solved simply by following your advice: I removed the 4 PostgreSQL 11 packages that were getting in the way, since no application was using them.

This time it went through without that problem (on a Raspberry Pi 4).

However, the problem with MySQL and Nextcloud is still there: Nextcloud no longer works because of it.

I also cannot upgrade Nextcloud from 22.2.7~ynh1 to 22.2.10~ynh1.

edit:
Running service mysql status gives:

[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200800], should be [page id: space=2688, page number=3608672]

FreshRSS works, and it also uses MySQL.

edit 2: I tried to connect to the database with ynh_mysql_connect_as; it just spins until the timeout.

edit 3: the nextcloud.log file reports nothing when I try to connect from my browser (which gives a 504 Gateway time-out).

edit 4: after another attempt at restarting php7.4-fpm, I now get an Internal Server Error for Nextcloud, and this error message for mariadb (mysql):

mariadb.service: Main process exited, code=killed, status=6/ABRT
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: Debian -- User Support
░░
░░ An ExecStart= process belonging to unit mariadb.service has exited.
░░
░░ The process’ exit code is ‘killed’ and its exit status is 6.
░░ systemd[1]: mariadb.service: Failed with result ‘signal’.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: Debian -- User Support
░░
░░ The unit mariadb.service has entered the ‘failed’ state with result ‘signal’.

edit 5: more detailed logs from journalctl.
The following block appears several times:

août 14 15:40:44 --Thread 2565497728 has waited at row0sel.cc line 4718 for 11.00 seconds the semaphore:
août 14 15:40:44 S-lock on RW-latch at 0x9cb54e8c created in file buf0buf.cc line 1226
août 14 15:40:44 a writer (thread id 2594022272) has reserved it in mode wait exclusive
août 14 15:40:44 number of readers 1, waiters flag 1, lock_word: ffffffff
août 14 15:40:44 Last time write locked in file buf0rea.cc line 120
août 14 15:40:44 InnoDB: Pending reads 0, writes 0

And then

août 14 15:40:44 2022-08-14 15:40:44 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
août 14 15:40:44 220814 15:40:44 [ERROR] mysqld got signal 6 ;
août 14 15:40:44 This could be because you hit a bug. It is also possible that this binary
août 14 15:40:44 or one of the libraries it was linked against is corrupt, improperly built,
août 14 15:40:44 or misconfigured. This error can also be caused by malfunctioning hardware.
août 14 15:40:44 To report this bug, see MariaDB Community Bug Reporting - MariaDB Knowledge Base
août 14 15:40:44 We will try our best to scrape up some info that will hopefully help
août 14 15:40:44 diagnose the problem, but since we have already crashed,
août 14 15:40:44 something is definitely wrong and this may fail.
août 14 15:40:44 Server version: 10.5.15-MariaDB-0+deb11u1

Lapineige · August 16, 2022, 10:37am

Should I try to increase the InnoDB timeout settings (and some related settings)?

Lapineige · August 29, 2022, 5:23pm

Update on my issue: the server crashed during the weekly backup, because of a MySQL server error during the Wallabag backup (even though Wallabag and FreshRSS were working properly with their MySQL databases (??) before, while Nextcloud was not).
To elaborate: based on the Archivist email, I can see two errors. First the Nextcloud backup fails with "Lost connection to MySQL server during query (2013)", then the Wallabag backup fails with "Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock' (111)" when trying to connect.

Well, actually it didn’t crash: it sent me an email hours after that. I’m still receiving emails from time to time. It even updates itself automatically (I’m receiving emails listing new packages to update, and the list changes over time).
But no web app is responding and SSH doesn’t work (kex_exchange_identification: read: Connection reset by peer). The same happens after a reboot.

Could a moderator split my posts into a separate topic? That would generate less noise here and maybe get more attention on this specific issue. Thanks!

Aleks · August 29, 2022, 6:20pm

Done, even though I’m a bit lost in the discussion / not sure I understand what’s broken exactly.

Lapineige · August 29, 2022, 8:02pm

You rock!

Well, to summarize: after the migration, the MySQL server has issues and times out.
This results in a failed reboot with no SSH access (hence a dead server).

Nextcloud can’t work because of that; FreshRSS and Wallabag can, until you back them up (mysqldump fails).

Lapineige · September 9, 2022, 9:41am

After the crash I mentioned (caused by this issue), I restored a backup from before the Bullseye migration.
I will try again and let you know…

I have a hint about that: the server’s IP changed for some unknown reason. That might explain why I had no access.

Lapineige · September 9, 2022, 11:54am

Nextcloud Internal Server error again.

Not sure if I have the same MySQL server error, but service mysql status gives:

     Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-09-09 13:50:51 CEST; 32s ago
       Docs: man:mariadbd(8)
             https://mariadb.com/kb/en/library/systemd/
    Process: 8445 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
    Process: 8447 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 8449 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= ||   VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ]   && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
    Process: 8516 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
    Process: 8518 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
   Main PID: 8496 (mariadbd)
     Status: "Taking your SQL requests now..."
      Tasks: 20 (limit: 4915)
     CGroup: /system.slice/mariadb.service
             └─8496 /usr/sbin/mariadbd

mariadbd[8496]: 2022-09-09 13:50:52 0 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=198698], should be [page id: space=2688, page number=3606570]

(multiple times)

Lapineige · September 9, 2022, 5:34pm

If I try to back up Nextcloud:
mysqldump: Couldn't execute 'SELECT /*!40001 SQL_NO_CACHE */ fileid, storage, path, path_hash, parent, name, mimetype, mimepart, size, mtime, storage_mtime, encrypted, unencrypted_size, etag, permissions, checksum FROM oc_filecache': Lost connection to MySQL server during query (2013)

A FreshRSS backup works. It does a mysqldump too if I’m not wrong…

WTF is happening to that MySQL server, that it does not respond for one app but does for the others??

Lapineige · September 9, 2022, 5:39pm

If I try to stop (then restart) the MySQL service:
Stopping MariaDB 10.5.15 database server...
sept. 09 19:36:04 yunohost.lapineige.fr mariadbd[27855]: 2022-09-09 19:36:04 0 [Warning] /usr/sbin/mariadbd: Thread 31 (user : 'nextcloud') did not exit

Soooo… the Nextcloud user got deleted???

Should I try to regenerate the MySQL conf?

edit: stopping the service failed (it timed out). Full log: sept. 09 19:36:04 mariadbd[27855]: 2022-09-09 19:36:04 0 [Warning] /usr/sbin/ma - Pastebin.com
But I can start it! (Nextcloud still fails.)

Lapineige · September 16, 2022, 5:18pm

Any ideas?

huha · September 18, 2022, 4:21pm

Edit: Moved the whole post to here as suggested by @Lapineige. thx!

Lapineige · September 20, 2022, 4:50pm

This looks like a different issue, where your server runs out of memory.
See: Mysql service keeps stopping - #2 by Aleks

wbk · September 20, 2022, 6:20pm

Lapineige:

mariadbd[8496]: 2022-09-09 13:50:52 0 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=198698], should be [page id: space=2688, page number=3606570]

(multiple times)

Here it mentions page number=198698 should be page number=3606570. Is that the same for each log record?

Could some DBs or tables have been corrupted? If FreshRSS is a backup at filesystem level or block level, it does not know about MySQL transactions (and MySQL does not know about a running backup). That doesn’t help you much further, but could explain why some apps do work, and others don’t.

Lapineige · September 22, 2022, 11:29am

I just had a look, there is a bunch of them:

[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200810], should be [page id: space=2688, page number=3608682]
page number=3607443]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200799], should be [page id: space=2688, page number=3608671]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200800], should be [page id: space=2688, page number=3608672]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200801], should be [page id: space=2688, page number=3608673]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200806], should be [page id: space=2688, page number=3608678]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200808], should be [page id: space=2688, page number=3608680]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200809], should be [page id: space=2688, page number=3608681]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200810], should be [page id: space=2688, page number=3608682]

So it’s not the same page number each time.
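
The page numbers differ from line to line, but the gap between the "read in" page and the "should be" page can be checked mechanically. Here is a minimal Python sketch of that comparison, not something that was run in this thread: the function name page_offsets, the regular expression and the 16 KiB page size (InnoDB's default; the configured value is not given here) are illustrative assumptions, and the sample lines are copied from the logs above with the wrapped lines re-joined.

```python
import re

# Illustrative pattern for the InnoDB page-mismatch message quoted in this thread.
MISMATCH = re.compile(
    r"read in are \[page id: space=(\d+), page number=(\d+)\], "
    r"should be \[page id: space=(\d+), page number=(\d+)\]"
)

# Assumption: InnoDB's default page size; the thread does not state the real one.
ASSUMED_PAGE_SIZE = 16 * 1024

# Complete log lines copied from this thread (wrapped lines re-joined).
SAMPLE_LOG = """\
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=198698], should be [page id: space=2688, page number=3606570]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200799], should be [page id: space=2688, page number=3608671]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200800], should be [page id: space=2688, page number=3608672]
[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200810], should be [page id: space=2688, page number=3608682]
"""


def page_offsets(log_text):
    """Return (read_page, expected_page, difference) for every complete match."""
    rows = []
    for match in MISMATCH.finditer(log_text):
        read_page = int(match.group(2))
        expected_page = int(match.group(4))
        rows.append((read_page, expected_page, expected_page - read_page))
    return rows


if __name__ == "__main__":
    rows = page_offsets(SAMPLE_LOG)
    for read_page, expected_page, diff in rows:
        print(read_page, expected_page, diff)
    distinct = sorted({diff for _, _, diff in rows})
    print("distinct differences (pages):", distinct)
    print("in bytes, assuming 16 KiB pages:", [d * ASSUMED_PAGE_SIZE for d in distinct])
```

On these four lines the difference is always 3407872 pages. Under the 16 KiB assumption that is 13 × 2^32 bytes, so the page that was read sits a whole multiple of 4 GiB before the expected one. That is only an observation about the numbers; the thread does not establish what causes it.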

Also, I noticed something else: if I try to load the Nextcloud web page, it hangs for a while and then returns an Internal Server Error.
And… in fact the MySQL service crashes in the meantime. It restarts with this error (I did not see it in the log before this restart).

Lapineige · September 22, 2022, 12:34pm

I’m making some progress: I searched for that error message and found the mysqlcheck command.
It can check and repair tables.
I ran it on the Nextcloud database.
It worked… until it failed after the table nextcloud.oc_file_locks:

[a whole set of lines looking like the next 3]
nextcloud.oc_file_locks
note     : Table does not support optimize, doing recreate + analyze instead
status   : OK
mysqlcheck: Got error: 2013: Lost connection to MySQL server during query when executing 'OPTIMIZE TABLE ... '

Listing the tables, it seems that the next one is oc_filecache.
If I try to “mysqlcheck” it, it fails with the same error as above: mysqlcheck: Got error: Lost connection to MySQL server during query when executing 'OPTIMIZE TABLE ... '.
I wonder what I could do to fix that table, if the check fails before it even starts.
Should I drop it?
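
One way to confirm which table triggers the failure is to check the tables one at a time instead of the whole database. A rough Python sketch follows, under several assumptions: the helper names are hypothetical, credentials are expected to come from the local MySQL client configuration, and a lost connection is assumed to make mysqlcheck exit with a non-zero status (not verified in this thread). It uses mysqlcheck's --check mode, which only reads the table, rather than the optimize mode seen in the output above.

```python
import subprocess


def check_command(database, table):
    """Build a mysqlcheck call that only checks one table (no repair, no optimize)."""
    return ["mysqlcheck", "--check", database, table]


def first_failing_table(database, tables, run=subprocess.run):
    """Check tables in order; return the first one whose check fails, else None.

    `run` is injectable so the loop can be exercised without a database server.
    """
    for table in tables:
        completed = run(check_command(database, table), capture_output=True, text=True)
        # Assumption: a crash or lost connection shows up as a non-zero exit status.
        if completed.returncode != 0:
            return table
    return None


if __name__ == "__main__":
    # Table names taken from this thread; a real run would list all tables first.
    print(first_failing_table("nextcloud", ["oc_file_locks", "oc_filecache"]))
```

Since the server crashes on the bad table here, it would probably need to be restarted before any further check is attempted.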

Lapineige · September 22, 2022, 1:02pm

DROP TABLE `oc_filecache`;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    30
Current database: nextcloud

Query OK, 0 rows affected (3 min 36.341 sec)

Fail

edit: it did drop the table, in fact. That does not solve the issue.

Lapineige · September 22, 2022, 2:33pm

I solved it!

Also mysqldump can’t work because of the crashing server.

Honestly I didn’t really know what I was doing, the consequences and so on, but:

Diagnosis: oc_filecache was corrupted in some way.
How I solved it:
- mysql → use nextcloud → DROP TABLE oc_filecache to remove that broken table (the content does not really matter in that case)
- recreate the table:

CREATE TABLE oc_filecache (
    fileid bigint(20) NOT NULL AUTO_INCREMENT,
    storage bigint(20) NOT NULL DEFAULT 0,
    path varchar(4000) DEFAULT NULL,
    path_hash varchar(32) NOT NULL,
    parent bigint(20) NOT NULL DEFAULT 0,
    name varchar(250) DEFAULT NULL,
    mimetype bigint(20) NOT NULL DEFAULT 0,
    mimepart bigint(20) NOT NULL DEFAULT 0,
    size bigint(20) NOT NULL DEFAULT 0,
    mtime bigint(20) NOT NULL DEFAULT 0,
    storage_mtime bigint(20) NOT NULL DEFAULT 0,
    encrypted int(11) NOT NULL DEFAULT 0,
    unencrypted_size bigint(20) NOT NULL DEFAULT 0,
    etag varchar(40) DEFAULT NULL,
    permissions int(11) DEFAULT 0,
    checksum varchar(255) DEFAULT NULL,
    INDEX (storage),
    INDEX (parent),
    INDEX (size),
    INDEX (mtime),
    PRIMARY KEY (fileid)
)
ENGINE=INNODB;

How did I know that? (I guessed it, in fact…)
I logged in to another server with a working Nextcloud. There, in mysql, describe oc_filecache gives all the information about the table’s structure and so on.
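
To illustrate that step: the rows printed by DESCRIBE (Field, Type, Null, Key, Default, Extra) are enough to rebuild the column list and the primary key, though not the full index definitions. The Python sketch below is hypothetical. The helper names are made up, the three sample rows are reconstructed from the CREATE TABLE statement above rather than copied from real DESCRIBE output, and only numeric defaults are handled. Running SHOW CREATE TABLE oc_filecache on the working server returns the complete statement directly, which is probably the simpler route.

```python
def column_sql(field, col_type, nullable, default, extra):
    """Turn one DESCRIBE-style row into a column definition."""
    parts = [field, col_type]
    if nullable == "NO":
        parts.append("NOT NULL")
    if default is not None:
        # Sketch limitation: numeric defaults only; string defaults would need quoting.
        parts.append(f"DEFAULT {default}")
    elif nullable == "YES":
        parts.append("DEFAULT NULL")
    if extra:
        parts.append(extra.upper())
    return " ".join(parts)


def create_table_sql(table, rows, engine="InnoDB"):
    """Build a CREATE TABLE statement from (Field, Type, Null, Key, Default, Extra) rows."""
    lines = [column_sql(f, t, n, d, e) for f, t, n, _key, d, e in rows]
    primary = [f for f, _t, _n, key, _d, _e in rows if key == "PRI"]
    if primary:
        lines.append(f"PRIMARY KEY ({', '.join(primary)})")
    body = ",\n    ".join(lines)
    return f"CREATE TABLE {table} (\n    {body}\n) ENGINE={engine};"


# Sample rows reconstructed from the statement in this thread (an assumption,
# not real DESCRIBE output).
SAMPLE_ROWS = [
    ("fileid", "bigint(20)", "NO", "PRI", None, "auto_increment"),
    ("storage", "bigint(20)", "NO", "MUL", "0", ""),
    ("path", "varchar(4000)", "YES", "", None, ""),
]

if __name__ == "__main__":
    print(create_table_sql("oc_filecache", SAMPLE_ROWS))
```

Secondary indexes only appear as MUL or UNI flags in the Key column, so they still have to be added by hand, as was done with the INDEX lines above.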

Then, in /var/www/nextcloud, I ran sudo -u nextcloud php7.4 occ files:cleanup. I don’t really know what it does, but I believe it populates the cache again or something like that.

After that, Nextcloud was working.

Lapineige · September 22, 2022, 2:52pm

The "[ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2688, page number=200806], should be [page id: space=2688, page number=3608678]" errors are still there, but well, it works.

system · October 22, 2022, 2:53pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.