Ramblings of an aging IT geek
homelab

nextcloud, for the third time, finally done properly

A rebuild of my self-hosted Nextcloud with everything I got wrong the previous two times fixed: a real database, a proper reverse proxy, object storage for data, and backups I have actually tested restoring.

[Image: a server rack in a homelab]

This is the third time I have stood up Nextcloud at home, and the first time I would be willing to put real data on it without flinching. The previous two attempts were not failures exactly: they worked, they synced photos and held documents, right up until they didn't. The first one died because I ran it on SQLite and a sync storm corrupted the database. The second one died because I never tested a restore, discovered my backups were of the wrong directory, and lost a month of changes. So this time I wrote down what went wrong before and built against the list.

The headline lesson from two rebuilds: Nextcloud is not really one thing. It is a PHP application, a database, a web server, a data store, a cron worker, and a caching layer, and almost every problem I have ever had was at the seam between two of those rather than in Nextcloud itself.

the database, properly

No more SQLite. SQLite is fine for kicking the tyres and catastrophic the moment two clients sync hard at the same time, which is precisely when you have real data worth keeping. This rebuild runs on MariaDB with the settings the documentation actually asks for and that everyone, me included, skips the first time. The big one is setting the transaction isolation level to READ-COMMITTED, plus a four-byte character set (utf8mb4) so emoji in filenames do not blow up.

[mysqld]
transaction_isolation = READ-COMMITTED
character_set_server = utf8mb4
collation_server = utf8mb4_general_ci
innodb_file_per_table = 1
innodb_buffer_pool_size = 1G

Giving the buffer pool a sensible amount of RAM took the file-listing pages from sluggish to instant. The default of 128 MB is tiny and assumes you have nothing better to do than wait.
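Since these are exactly the settings that get skipped, a quick check of the config file can catch a missing one. This is a minimal sketch rather than anything from the original setup: the function name `missing_db_settings` is made up for illustration, it expects the underscore spelling of each option, and it only reads the file, so it says nothing about what the running server actually loaded.

```shell
# Print each required setting that is absent from a MariaDB config file.
# Only the underscore spelling (transaction_isolation) is recognised here.
missing_db_settings() {
    cnf=$1
    for want in \
        'transaction_isolation=READ-COMMITTED' \
        'character_set_server=utf8mb4' \
        'innodb_file_per_table=1'
    do
        # Strip spaces and tabs so "key = value" and "key=value" both match,
        # then look for the setting as an exact whole line.
        if ! tr -d ' \t' < "$cnf" | grep -qxF "$want"; then
            echo "missing: $want"
        fi
    done
}
```

Point it at whichever file holds the [mysqld] block; no output means all three settings are present.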

redis, because the warnings were right

Every previous install nagged me in the admin panel about missing memory caching and file locking, and every previous time I ignored it because everything seemed to work. It seemed to work because I was the only user. Add a second device syncing aggressively and you get transactional locking errors, files stuck in a "scanning" state, the lot. The fix is Redis handling both the distributed cache and the file locking, with APCu kept as the fast local cache.

'memcache.local' => '\OC\Memcache\APCu',
'memcache.locking' => '\OC\Memcache\Redis',
'memcache.distributed' => '\OC\Memcache\Redis',
'redis' => [
    'host' => '127.0.0.1',
    'port' => 6379,
],

This is the single change that took the install from "works on my laptop" to "I trust it with the family's photos". The locking errors simply stopped.
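To confirm the pieces are wired together rather than just written down, something like the following can help. It is a sketch under assumptions: `redis-cli` is installed, the web server user is `www-data`, and the helper names (`occ`, `redis_alive`, `locking_uses_redis`) are hypothetical.

```shell
# Command used to talk to Redis; overridable for testing.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Wrapper around Nextcloud's occ tool (www-data is an assumption).
occ() { sudo -u www-data php /var/www/nextcloud/occ "$@"; }

# Succeeds when Redis answers a PING on the host and port from config.php.
redis_alive() {
    [ "$($REDIS_CLI -h 127.0.0.1 -p 6379 ping)" = "PONG" ]
}

# Succeeds when Nextcloud's file locking is configured to use Redis.
locking_uses_redis() {
    [ "$(occ config:system:get memcache.locking)" = '\OC\Memcache\Redis' ]
}
```

Running `redis_alive && locking_uses_redis && echo ok` after a config change is a cheap way to notice a typo before a sync client does.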

[Image: a homelab shelf with network gear and a small server]

the reverse proxy and the cron worker

I run everything behind a reverse proxy now, with TLS terminated there and a real certificate rather than the self-signed thing I used to click through. The two settings that always bite are the upload size and the headers. If large uploads fail at exactly the same point every time, it is the proxy's request body-size limit, not Nextcloud. And you must forward the protocol header (X-Forwarded-Proto) or Nextcloud generates http:// links behind your https:// proxy and the web UI quietly half-breaks.
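Nextcloud also has to be told to trust the proxy, and that can be done with occ. A minimal sketch, assuming the web server user is `www-data`; the proxy address and hostname in the usage line are placeholders, and `configure_proxy` is a hypothetical helper name. The proxy itself still has to send X-Forwarded-Proto and allow large request bodies.

```shell
# Wrapper around Nextcloud's occ tool (www-data is an assumption).
occ() { sudo -u www-data php /var/www/nextcloud/occ "$@"; }

# Tell Nextcloud which proxy to trust and that the public scheme is https.
#   $1: address the reverse proxy connects from
#   $2: public hostname users reach the proxy on
configure_proxy() {
    proxy_ip=$1
    public_host=$2
    # Only believe forwarded headers that come from this address.
    occ config:system:set trusted_proxies 0 --value="$proxy_ip"
    # Generate https:// links even though the backend hop may be plain http.
    occ config:system:set overwriteprotocol --value=https
    # Base URL used when Nextcloud runs from the CLI, e.g. for cron.
    occ config:system:set overwrite.cli.url --value="https://$public_host"
}
```

Usage would look like `configure_proxy 192.0.2.10 cloud.example.org`, with your own values substituted.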

The other thing I had wrong both previous times was the background jobs. Nextcloud defaults to running them via AJAX, meaning they only fire when someone has a browser tab open. Housekeeping, expiring shares, generating previews, all of it just stops when nobody is looking, then runs in a great lump and times out. Switch it to system cron, in the crontab of the user the web server runs as, and let the worker do its job every five minutes:

*/5 * * * * php -f /var/www/nextcloud/cron.php

Then set the mode to "Cron" in the admin panel so it stops trying to do the work in the browser as well.
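The same switch can be made from the command line, and it is worth checking that the job really is firing. A sketch under assumptions: the web server user is `www-data`, the helper names are hypothetical, and the last-run timestamp is read from the `lastcron` value in the `core` app config, which is where Nextcloud has recorded it in recent versions; check that key on your own version. One related caveat: with APCu as the local cache, PHP's CLI generally needs `apc.enable_cli=1` (in php.ini or via `--define` on the cron line) or cron.php may refuse to run.

```shell
# Wrapper around Nextcloud's occ tool (www-data is an assumption).
occ() { sudo -u www-data php /var/www/nextcloud/occ "$@"; }

# Set the background job mode to "Cron" without using the admin panel.
use_system_cron() {
    occ background:cron
}

# Print the seconds between the given Unix time and the last cron run.
cron_age() {
    now=$1
    last=$(occ config:app:get core lastcron)
    echo $(( now - last ))
}

# Succeed when the last run is older than the limit in seconds
# (default 900, i.e. three missed five-minute runs).
cron_is_stale() {
    age=$(cron_age "$(date +%s)")
    [ "$age" -gt "${1:-900}" ]
}
```

Something like `cron_is_stale && echo "Nextcloud cron has not run recently"` could then go into whatever monitoring is already in place.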

[Image: a NAS unit with drive bays and status lights]

data on object storage

The change I am most pleased with is moving the actual file data off the local disk and onto object storage on my NAS. The application server is now stateless apart from its config: the database holds the metadata, the object store holds the bytes, and if the VM falls over I can rebuild it in minutes and point it back at the same two things. The first two installs put data on the same disk as everything else, which is how I ended up restoring the wrong directory.

backups i have actually restored

This is the part that mattered most, because it is the part I got wrong last time. A backup you have never restored is a hope, not a backup. The routine is: dump the database and snapshot the config; the object store is already replicated. None of that is interesting. What is interesting is that I then actually restored the lot onto a throwaway VM, logged in, and checked a file came back byte-for-byte.

mysqldump --single-transaction nextcloud | gzip > nextcloud-db.sql.gz
tar czf nextcloud-config.tar.gz /var/www/nextcloud/config

The --single-transaction flag matters: without it a dump taken mid-sync can be internally inconsistent, which is its own special kind of restore-day surprise. I put the whole thing in maintenance mode for the dump anyway, belt and braces, because the one time it bites you it really bites.
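Put together, the routine might look like the sketch below. It is an illustration under assumptions, not the exact script in use: the web server user is taken to be `www-data`, the function names and the destination directory argument are hypothetical, and it writes the dump to a plain file before compressing (instead of piping into gzip) so that a failed dump is not hidden behind gzip's exit status.

```shell
# Wrapper around Nextcloud's occ tool (www-data is an assumption).
occ() { sudo -u www-data php /var/www/nextcloud/occ "$@"; }

# Dump the database and config into directory $1 under maintenance mode.
backup_nextcloud() {
    dest=$1
    rc=0
    occ maintenance:mode --on || return 1
    # Compress only if the dump itself succeeded.
    if mysqldump --single-transaction nextcloud > "$dest/nextcloud-db.sql"; then
        gzip -f "$dest/nextcloud-db.sql" || rc=1
    else
        rc=1
    fi
    tar czf "$dest/nextcloud-config.tar.gz" /var/www/nextcloud/config || rc=1
    # Leave maintenance mode whether or not the steps above succeeded.
    occ maintenance:mode --off || rc=1
    return "$rc"
}

# Print the SHA-256 of a file, for comparing an original with its
# restored copy byte-for-byte across machines.
checksum() {
    sha256sum "$1" | cut -d' ' -f1
}
```

For the restore test, run `checksum` on a file before the backup and again on the restored copy on the throwaway VM; the two values should be identical.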

So: a real database, Redis doing the locking, a proper proxy, system cron, data on object storage, and a restore I have watched work with my own eyes. None of it is clever. It is just the set of corners I cut twice before, uncut. The thing now does exactly what it should, which after two rebuilds feels almost suspicious, and I keep waiting for it to surprise me. It hasn't yet.