Weeknote: 2024-W21

17 June 2024

Week 21, and it was time to bite off the next big bit of work. Improvements to uploading are the most popular requested features at the moment, and in order to do that, I wanted to make some underlying improvements to the way the files are managed. One big driver for this is that I wanted to be able to support cloud storage (like Amazon S3), so that if you’re running a shared instance, you don’t need a real attached disk, and can easily expand storage as required.

Most cloud file storage systems that I looked at included things like direct upload support, so it made sense to do the backend rewrite first, to make upload changes easier.

I started out thinking that ActiveStorage would be the way forward; Rails has this built in, and it supports filesystem, cloud storage, uploading, all that good stuff. But, after investigating for a bit, I found that it wasn’t quite going to work. Rails is really really good at getting you moving quickly in 90% of the cases. Normally when writing an app, you’d know the system you’re deploying, so you can initialise the storage in config files, connect at boot time, etc.

However, Manyfold being self-hosted, I don’t know at coding-time what storage will be in use. It has to be dynamic. And, I realised, it needs to be per-library as well. You could absolutely have a situation with a private local archive, but cloud storage for public stuff. ActiveStorage won’t quite cut it for that.

Looking around a bit more, I came across Shrine, which will do the job instead. I still need to bend it a little to get it to do what I want, but it’s much more flexible than ActiveStorage.

Local filesystem

First problem: Manyfold cares about the local file structure. Most attachment systems don’t - they don’t need to! But Manyfold is built primarily to organise your local disk, so it has to organise it in the way you want. Random directory IDs aren’t going to cut it. Shrine allows this by allowing you to override the “key generation” (i.e. folder naming), so I could make a storage engine that’s aware of what its storing, and will use the model metadata to put things in the right place.

We won’t need to do this for cloud storage, I don’t think, but it might be easy enough that we can anyway.

Different libraries

I was wondering recently if libraries had had their time in Manyfold; with the newer organisation stuff like creators, collections, etc, they didn’t seem that relevant any more. Why would you have more than one? Well, cloud storage has given me my answer. A library represents where the stuff is stored; that could be a local path, or it could be off in the cloud. I updated the code so that each Library has its own storage engine, and the models within it will automatically use it.

Work in progress

Making a change like this is hard; I’m rewriting the fundamental file system behind a load of running systems, and I want it to upgrade seamlessly; if I get this right, nobody will ever notice, but if I get it wrong, I will break so much stuff. So this is still a work in progress, and honestly it’s baking my head a bit. I’ve got the Shrine storage basics working, but the rewriting of all the existing file code is daunting to say the least.

It’s not helped by the fact that all my tests are failing. The tests are predicated on particular files being in particular places, or stubbing out the existing file presence checks. The way Shrine works is completely different, so all my test setup is wrong. I’ve got to fix that before I can move on to actually making the feature code work, and it’s confusing.

It’ll be great though, it’s going to enable a lot of new features as well as just cloud storage, like automatic conversion, image resizing, loads of things. Once I make sure I’ve understood it and feel safe releasing it! Eeeep.