Weeknote: 2024-W20
17 June 2024The week started with some loose ends around accessibility, and a collection of small bugfixes and contributions.
Progressive Meshes
There’s a bit of work coming up around making the app more efficient at sending meshes around, and I’d had the idea to use some work I came across while doing my PhD in graphics about 25 years ago; progressive meshes. Time to get the groundwork going on that, there’s a bit of lead time needed.
I got in touch with Hugues Hoppe, the inventor of the method, and asked if anyone had created a transmission format using it, and whether he still thought it was a good idea. He got back to me really quick, which was amazing - Hugues was a big name in the background of my own research, so it was amazing to hear from him! He said that nobody had used it for an actual format, but that it would probably work. So, emboldened, I started drafting.
Initially I thought about extending STL, but STL is a terrible format and so I shouldn’t encourage it more. Besides which, there’s a format already for transmission of mesh data over the Internet; glTF. The right thing to do was to write an extension for that.
So, over a day or two, I learned some glTF and knocked together an extension proposal that adds the various things needed for progressive meshes. I sent it to Hugues once it was done, who validated some of the choices I’d made in writing it, which was again brilliant.
Next step with this is to actually start writing code, getting it into THREE.js and Mittsu so that I can then get Manyfold to stream meshes efficiently over the network. But that’s a bit further down the line, there’s some other stuff I need to get done first.
Usage tracking
After all this theory, I needed a change. The time was coming to apply for more funding, and I thought it would be pretty useful to know how many people are running Manyfold.
The only measure we’ve had up to now is the number of pulls from the docker container registry. This is in the hundreds per day at the moment; release v0.68.0 has been pulled almost 12,000 times in the 19 days it’s been up. That’s obviously a lot, but of course it doesn’t correspond to actual installations. If every installation was pulling once a day to check for updates, it would come out to about 600 installs, but even that seems far too high, honestly.
So, I thought about tracking some usage. How could I find out how many we’ve really got? Well, I’m very privacy-conscious, so the first thing is to realise that we can’t. It’s impossible to know unless we force people to report, and I’m not doing that.
Instead we make it optional, and in order to make sure people feel safe, the data and process has to be as transparent as possible. I’ve seen backlash against open source projects too often to think it’s OK to check the box by default - it has to be properly opt in, and it has to be compliant with all the things like GDPR, etc.
The best way to make sure you’re not leaking any personal information is don’t have any in the first place. So I built a very tiny tracking system that makes a few interesting choices in its design:
-
The data sent is displayed in full in the UI when you turn on the option. You can see everything that’s being sent. The only actual data sent is a stable random ID (generated fresh each time the feature is enabled), and the software version string.
-
You can change the reporting endpoint with an environment variable, to make sure that what we say is sent, is. This is also handy for development and debugging!
-
The tracking API is itself fully open source and extremely minimal. It’s intentionally designed without authentication so that I, as the developer, have no power to avoid poisoning of the data, thus putting the trust in my users. Users can also see exactly what I can; the only way to get data out is though a single endpoint, which displays the aggregate stats.
-
Finally (and this is the bit I really like), it’s really forgetful. All the data is stored in memory in the tracker API. There’s no data to erase, because there’s no permanent storage. This means that the app will forget everything fairly regularly when it restarts, and I physically cannot extract information from it that’s not already there; I can’t change the code without losing all the data! Because the apps report in once a day, this doesn’t matter, the numbers will soon come back.
Maybe some of this is odd, but I like it. And, you can see for yourself a much more real baseline number of running instances. It’s a much much lower number, naturally, but I trust it a lot more and it’ll be enough to show growth trends over time.
There’s one thing I’m not happy with yet - the API host, Render, keeps request logs for 7 days, and that has IP addresses in. I’ve not found a way to make that shorter or turn it off completely yet, but at least it’s already a very short retention period.
All that said, the anonymous tracking really does help me a lot, so please turn it on if you can!
There’s more info on the website on the tracking documentation page. I’m interested in what people think of this, so let me know your thoughts!