..And then, November happened..
After the insane month of November 2022, it's now time to write down what happened. A lot happened.
Let's try this Mastodon thing
Back in 2017, I stumbled upon the Mastodon project and thought, this looks nice! Let's try it. So I registered a domain name (mastodon.nu) and installed Mastodon, back then using Yunohost. Started using it, but soon lost interest, so a few months later I shut it down, and in 2018 the domain name expired.
In 2021, I regained interest in Mastodon. Twitter became less and less useful to me, and I remembered how much I liked the Mastodon interface. So I started looking for a new domain name and found 'mastodon.world' still available. I was running a dedicated server at Hetzner, on which I ran Proxmox. In Proxmox, I ran a virtual machine on which I ran Docker. Running Mastodon in Docker is easy.
As with most of my self-hosted servers, I opened registrations for Mastodon. In my experience, not many people sign up, and that held true for the following months. Around 10 accounts per month were created, but only a few remained active.
Early November, Elon Musk's Twitter takeover became final. That, and some poor decisions he made that hurt people's confidence in Twitter, caused people to leave Twitter and search for alternatives like Mastodon. I saw a slight increase in user signups: 'my' 100 users became 140 users. I figured my server could handle some more, so I decided to apply for the Serverlist, the list maintained by the Mastodon developers, which is also used by the Mastodon mobile app when selecting a server to join. On November 3rd, at 19:02 my time, I mailed them requesting my server to be added. At 21:58, I received a reply by Eugen that my server was added. But I didn't see any signups coming in. After an hour, I went to bed.
The next morning, around 8AM, I woke up to 700 users on my server! Nice, people had started finding my server. I welcomed and helped some newcomers before going to work.
While at work, the registrations started rising even faster. The 1000th user was reached at 10:15, the 5000th at 21:00. The next 45 minutes brought 1200 new registrations, and by then I started seeing issues on the server. I closed the registrations and started troubleshooting.
One of the important components of Mastodon is Sidekiq. It runs queues of background tasks that do all sorts of stuff, like pushing posts to other servers, getting posts from other servers, sending e-mail, checking links etc. There are different queues (default, pull, push, ingress, mailer, scheduler). The default install in Docker starts 1 process for that, with 5 threads, handling all queues. This means only 5 of these tasks can be handled simultaneously. When investigating the issues I encountered, I saw that over 350,000 tasks were queued. Sidekiq couldn't keep up.
Without looking into it very much, I changed the number of threads from 5 to 200. Queues were going down quickly.
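To put rough numbers on why 5 threads could not keep up, here is a minimal back-of-the-envelope sketch. The backlog size comes from the post; the average task duration is purely an assumption for illustration (real Sidekiq jobs vary a lot), and the estimate ignores new tasks arriving while the backlog drains.

```python
def drain_time_seconds(queued_jobs: int, threads: int, avg_job_seconds: float) -> float:
    """Rough time to clear a backlog, ignoring jobs that arrive in the meantime."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return queued_jobs * avg_job_seconds / threads


QUEUED_JOBS = 350_000    # backlog mentioned in the post
AVG_JOB_SECONDS = 0.5    # assumption: not measured, only for illustration

for threads in (5, 200):
    minutes = drain_time_seconds(QUEUED_JOBS, threads, AVG_JOB_SECONDS) / 60
    print(f"{threads:>3} threads: roughly {minutes:.0f} minutes to drain the backlog")
```

Under that assumed job duration, the same backlog that would take hours with 5 threads drains in minutes with 200, which matches the "queues were going down quickly" experience, as long as the CPU and the database can actually serve that many threads.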
I re-opened registrations. The biggest inflow of users was over for now, and growth was slower and steadier.
Growing pains and making plans
November 5th. User count was around 10k. By now, I couldn't consider Mastodon.world as just some hobby project. Many people were relying on this service. So, first of all, I was keeping a close eye on the performance all day (luckily it was the weekend), closing registrations now and then to give the server room. Then, I started thinking about continuity. It was just me running this. I decided I would need at least 2 other admins, not only to help out, but also to take over in case something happens to me. Asking around, I found 2 people willing to help: @jeroen and @spaceriker.
I also noticed that moderation was slowly taking up more and more of my time. (At the moment, moderation is the most time-consuming element of running Mastodon.) So I decided to look for moderators. I found 3, and a day later a fourth. They are in different timezones, which is helpful.
Because of the sudden surge in users, I also noticed that other instances, including some larger ones, had closed registrations. So, whenever I opened registrations, a big part of the network's new users landed on my server.
November 6th, I saw a spike in registrations again. Sidekiq queues went up again, so I started tweaking those. I started more and more processes, with around 800 threads in total. But that triggered the next issue. The database's max connections setting was reached, so Sidekiq couldn't create more threads. I raised the max connections to 800 (as a database admin, I know that's not very good when using Postgres, but all my actions were emergency fixes...) and restarted the database. Phew. Got that fixed.
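The relationship between Sidekiq threads and database connections can be sketched with some simple arithmetic. This is an illustration under stated assumptions, not a description of the actual setup: it assumes the worst case where every Sidekiq thread holds one database connection, the process layout is hypothetical (the post only says "around 800 threads in total"), and the limit that was originally hit is not stated, so PostgreSQL's default `max_connections` of 100 is used as a stand-in.

```python
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100  # PostgreSQL's default; the real value here is unknown


def connections_needed(threads_per_process: list[int]) -> int:
    """Worst case: every Sidekiq thread holds one database connection."""
    return sum(threads_per_process)


def headroom(max_connections: int, threads_per_process: list[int]) -> int:
    """Connections left over; a negative number means some threads get refused."""
    return max_connections - connections_needed(threads_per_process)


# Hypothetical layout that adds up to the ~800 threads mentioned in the post.
layout = [100] * 8

for limit in (POSTGRES_DEFAULT_MAX_CONNECTIONS, 800):
    print(f"max_connections={limit}: headroom {headroom(limit, layout)}")
```

Even in this simplified picture, a limit of 800 leaves no spare connections for the web processes or for maintenance, which is one reason a very high `max_connections` is only an emergency fix.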
But now, many Sidekiq tasks started failing. Investigating, I found that it was caused by the e-mail server, which I also hosted myself. All those accounts trigger verification e-mails, sign-in e-mails, password reset e-mails, and notification e-mails. The mail server choked on the number of mails. (I had no idea how many, at that moment.) After some failed attempts at getting this to work again, I switched to Mailgun, a service provider for sending e-mails. I knew this was going to be costly, but again, it was an emergency fix. This helped. Everything was working fine again. (Later, I would see that on the peak day it sent out 75,000 e-mails!)
November 7th, I started creating donation pages, so that users could make donations if they wished. I had had some requests from users wanting to donate, and I was also concerned about the projected cost for that month, with Mailgun. And I wanted options for scaling out, because I was worried the current setup couldn't handle much more.
I created an OpenCollective and a Patreon account. Donations started coming in quickly, which made it possible to start planning for scaling.
Also that day, one of the moderators, @stevo, who works in 'Business Ethics and Anti-Corruption Compliance', created a new set of rules for the server. Up till then, I had 4 rules, which I made up when there were a handful of users. By now, we needed more, also to give the moderators something to moderate by.
That evening, I was interviewed by Joanna Stern, a senior Tech columnist for the Wall Street Journal. She happened to have created her Mastodon account on my server. A few days later, this article was published.
Mostly stable.. let's keep it that way!
Growth was kind of stable. With the e-mail service and the tweaked Sidekiq queues, the server was running well. People complimented me on the 'snappiness' of the server. (Some other big servers were still struggling.) I was making plans to get better hardware. First, I set up a new instance (tootspace.nl) purely for testing, with the same setup as mastodon.world (but smaller). On November 11th, I used this test server to test migrating the media storage to Wasabi. That was necessary because I noticed the media taking up more and more disk space, and had calculated I could only last a few more days with my current disks. After I got this working on the test server, I migrated the media for mastodon.world to Wasabi as well. (It was 200GB back then, but grew rapidly. Today, Dec 12th, it takes up 2.2 TB.)
November 12th, I ordered a bigger server from Hetzner, the AX161 with a 32-core, 64-thread AMD EPYC CPU and 256GB RAM. It was 'delivered' 2 days later.
On November 13th, @jeroen added uptime monitoring (status.mastodon.world) so we can see (and show) how reliable the server is for the users.
Around that same time, Wasabi started having issues. Intermittently, users couldn't see or upload images. I reported this to Wasabi support; it took a day before they got it fixed.
Also the 13th, @stevo wrote a Code of Conduct, to further guide the moderators and the users in how we want the community to be a happy place for all.
Server migration and upgrade to v4
On November 14th, it was time to move mastodon.world to the new server. This was actually quite easy:
* Install docker on the new server
* Copy the Mastodon software and database to the new server using rsync
* Stop Mastodon
* Do a final rsync to get the latest changes
* Start Mastodon on the new server
* Change the DNS entry to point mastodon.world to the new server
(Left out some smaller steps, but these are the main ones)
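The main steps above can be sketched as a dry-run plan that only prints the commands instead of running them. Everything specific here is an assumption for illustration: the directory `/opt/mastodon`, the host name `new-server.example`, the use of `docker compose` rather than another way of starting the containers, and the exact rsync flags. The post only confirms that a final rsync was done after stopping Mastodon.

```python
import shlex


def migration_plan(src_dir: str, dest_host: str, dest_dir: str) -> list[tuple[str, list[str]]]:
    """Return (description, command) pairs in the order they would be run."""
    # -a preserves permissions and timestamps; --delete removes files on the
    # destination that no longer exist on the source.
    rsync = ["rsync", "-a", "--delete", f"{src_dir}/", f"{dest_host}:{dest_dir}/"]
    return [
        ("bulk copy while Mastodon is still running", rsync),
        ("stop Mastodon on the old server", ["docker", "compose", "down"]),
        ("final sync to pick up the latest changes", rsync),
        ("start Mastodon on the new server",
         ["ssh", dest_host, f"cd {shlex.quote(dest_dir)} && docker compose up -d"]),
    ]


# Hypothetical paths and host name; the DNS change is a manual step afterwards.
plan = migration_plan("/opt/mastodon", "new-server.example", "/opt/mastodon")
for description, command in plan:
    print(f"# {description}")
    print(shlex.join(command))
```

Running the bulk copy first keeps the downtime short: only the second, much smaller sync has to happen while Mastodon is stopped, so the database files are copied in a consistent state.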
This resulted in around 20 minutes of downtime, most of that being the result of a typo I made in the Nginx configuration.
On the new server, Mastodon ran mega fast. Sidekiq tasks are executed very quickly because of the fast CPU, so queues stay low.
The next day, I upgraded mastodon.world to Mastodon v4.0.2, after first testing it on tootspace.nl. The upgrade took less than a minute of downtime. Version 4 was long awaited and brought some key features (editing posts, following hashtags, etc.).
On the 16th, I did some tweaking on OS-parameters (open files) because Nginx started to give errors.
November 18th started with 30,000 users. But then Elon shook up the Twitter community again. Apparently he fired some staff (a lot, actually), and at a certain point even closed Twitter offices so staff couldn't get in anymore. This caused an even larger exodus of Twitter users, who were becoming concerned that Twitter would fall down (which I don't think it will). Thousands of new users registered every hour. Within 12 hours, the user count doubled, to 62,000. I then closed registrations because I started seeing some database connection issues again.
So, finally I started to look into the database configuration (I am a database admin, so perhaps should/could have done that sooner :-) ). What I also use at work when dealing with too many connections to a Postgres database is PgBouncer. So I installed it on the test server and, after some configuring, got it working. The next day I installed it on mastodon.world. Now that this bottleneck was solved, we were ready to accept more registrations.
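The idea behind a connection pooler can be shown with a toy model: many clients share a small, fixed set of server connections and wait their turn when all are busy. This is only a sketch of the concept, not how PgBouncer is implemented; the class name, the pool size of 20 and the 800 clients are illustrative assumptions. In PgBouncer itself, the corresponding knobs are settings such as `max_client_conn`, `default_pool_size` and `pool_mode`.

```python
import queue
import threading


class TinyPool:
    """Toy pool: a fixed set of 'server connections' shared by many clients."""

    def __init__(self, size: int) -> None:
        self._free: queue.Queue = queue.Queue()
        for i in range(size):
            self._free.put(f"server-conn-{i}")
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    def run(self, work):
        conn = self._free.get()  # blocks until a server connection is free
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            return work(conn)
        finally:
            with self._lock:
                self.in_use -= 1
            self._free.put(conn)  # hand the connection back for the next client


def simulate(clients: int, pool_size: int) -> tuple[int, int]:
    """Run `clients` threads through the pool; return (completed, peak in use)."""
    pool = TinyPool(pool_size)
    done = []

    def client(n: int) -> None:
        pool.run(lambda conn: done.append((n, conn)))

    threads = [threading.Thread(target=client, args=(n,)) for n in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(done), pool.peak_in_use


completed, peak = simulate(clients=800, pool_size=20)
print(f"{completed} clients served, never more than {peak} server connections in use")
```

All 800 clients get served, but the database side never sees more than the pool size, which is why a pooler lets the Postgres connection limit stay low.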
Because of the growth, we started getting more and more reports on content and users that violate our rules. So, between 18 and 22 November, we added 3 more moderators to the team:
* @KTR (for helping moderate Spanish posts)
* @Sodfabasha (for helping moderate Arabic posts)
* @Paul
On November 21st, we reached 100,000 users. Growth was slowly tapering off, the system was stable, and the moderators were doing an excellent job!
I was getting more and more e-mails on firstname.lastname@example.org, so I installed a ticketing system, enabling me to have some of the moderators help me reply to the mails.
November 22nd, I was interviewed by The Verge, resulting in this article.
November was over, but on December 3rd, something happened which I would like to add to this blog.
Some users reported seeing 'weird accounts' spamming the Federated timeline of our server.
They were accounts with names like
@<random string>@<random string>.activitypub-troll.cf.
The only thing they did was post messages that each mentioned 2 other accounts from the same troll network.
When such a post arrives at your server (because someone boosts it or follows it), the server will check the post and look up the mentioned accounts. On each of those accounts, it will again find a post mentioning 2 accounts. And so on. On our server, 20,000 of such accounts were already known. Luckily, the server could handle that extra load. Other large servers couldn't, and were unreachable.
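A quick calculation shows how fast this kind of chain grows. The sketch below assumes the simplest case: every post mentions exactly 2 accounts that the server has not seen before, so the number of known accounts doubles with each hop. Real behaviour depends on how and when a server resolves mentions, so treat the numbers as an illustration only.

```python
def total_accounts(depth: int, mentions_per_post: int = 2) -> int:
    """Accounts known after following mentions `depth` hops from one post."""
    return sum(mentions_per_post ** level for level in range(depth + 1))


def hops_to_reach(target: int, mentions_per_post: int = 2) -> int:
    """Smallest number of hops after which at least `target` accounts are known."""
    depth = 0
    while total_accounts(depth, mentions_per_post) < target:
        depth += 1
    return depth


for depth in (5, 10, 14):
    print(f"after {depth:>2} hops: {total_accounts(depth):>6} accounts")

print("hops needed for 20,000 accounts:", hops_to_reach(20_000))
```

Under these assumptions, a single boosted post is enough to pull in tens of thousands of accounts after only about a dozen hops, each of which costs the server lookups and background jobs.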
When I found out what was happening, I blocked the activitypub-troll.cf domain (which took some time because of the number of subdomains known to our server). That solved the issue.
I then posted this information to Mastodon, mentioning #mastoadmin, so that other admins could take action. I also notified other admins via Matrix channels. My post was boosted 3700 times, so it reached many admins.
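Because every troll account lived on its own random subdomain, the block has to match on the parent domain rather than on exact host names. The function below is a minimal sketch of that kind of suffix matching; it is not Mastodon's actual implementation, and the function name and example host names are made up for illustration.

```python
def is_blocked(host: str, blocked_domains: set[str]) -> bool:
    """True if `host` is a blocked domain or a subdomain of one."""
    labels = host.lower().rstrip(".").split(".")
    # Check the host itself and every parent domain: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


blocked = {"activitypub-troll.cf"}

for host in ("x7f3a.activitypub-troll.cf", "activitypub-troll.cf",
             "notactivitypub-troll.cf", "mastodon.world"):
    print(f"{host}: {'blocked' if is_blocked(host, blocked) else 'allowed'}")
```

Matching on whole labels (split on dots) instead of a plain string suffix avoids accidentally blocking an unrelated domain that merely ends in the same characters.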
This has been a very stressful, but also fun and educational, month. Apart from the few bad comments you get, I got many friendly messages and compliments on running this server. I met (mostly online) some new people. I am really blown away by how the moderators are doing their job, like professionals (although this is all volunteer work!). I have had 3 interviews (last week also for the Süddeutsche Zeitung) and an interview with a student from the University of Alberta.
I did get some remarks from people that think a Mastodon server shouldn't host this many accounts. Although I agree, there was no alternative place for these new users to go to, during the peak moments. Large servers closed registrations or were down because of the influx. Many smaller servers couldn't be found because they are not on the Joinmastodon serverlist (mind you, many new users sign up via the Mastodon app, which uses this serverlist when signing up). This needs attention for the future growth of Mastodon.
There are also some follow-up tasks and questions. How do I want to scale for possible future growth? Do I want growth at all? How do I make sure there is high-availability, without performance being degraded? And so on.
In my next blog, I will talk about the cost of running this server.