Today is a big day for Forem Systems. A very big day!
We have announced forem.dev to the world and we get to see our current Forem Systems Stack (FSS) in action with real traffic. Heck yes!
I am pretty excited to see how the system performs and breaks. Yes, I said breaks. We expect this launch to be a bumpy ride to start out; we can't improve our systems without letting them break.
Right now, we only have one server configured behind an AWS NLB. Yes, just one server. It is in an AWS Auto Scaling group, so we could scale out quickly by adding more servers if we wanted. That's how most popular services keep up with large amounts of traffic.
Scaling out is expensive. It also adds complexity to the stack, and it's not something that everyone can afford to do when running a smaller community site. We are not designing FSS just for the large, high-traffic communities with very large budgets for cloud services. We want FSS to be performant on smaller deployments for the little communities too. That's why we want to see how it does on a smaller, less expensive cloud configuration.
The server we have right now is an m5a.large instance with 2 vCPUs and 8 GiB of RAM. It costs ~$62.78 USD a month and runs Fedora CoreOS as its Linux distro. All of the services are containerized and run as a non-root user via Podman.
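Fedora CoreOS leans on systemd for managing containers, so a service in this kind of setup might look something like the unit below. This is a sketch under assumptions only: the unit name, image, ports, and user are illustrative, not our actual files.

```ini
# /etc/systemd/system/forem.service -- illustrative sketch, not our real unit.
[Unit]
Description=Forem app container (rootless Podman)
After=network-online.target
Wants=network-online.target

[Service]
# Run as the unprivileged default Fedora CoreOS user.
User=core
ExecStartPre=-/usr/bin/podman rm -f forem-app
ExecStart=/usr/bin/podman run --name forem-app -p 3000:3000 forem/forem:latest
ExecStop=/usr/bin/podman stop forem-app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Running the container under a plain user like this keeps the whole stack non-root, which is one of the nice properties of Podman over a root-owned daemon.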
The instance is running Traefik on ports 80 and 443 to handle TLS termination with automatic certs from Let's Encrypt, HTTP-to-HTTPS redirects, and security headers. Traefik then sends traffic internally to OpenResty on port 9090, which is Nginx extended with LuaJIT.
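For the curious, the pieces just described map onto Traefik v2 config roughly like this. This is a sketch, not our production config: the email, file paths, and resolver/middleware names are placeholders.

```yaml
# --- static configuration (traefik.yml) ---
entryPoints:
  web:
    address: ":80"
    http:
      redirections:              # the HTTP -> HTTPS rewrite
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:                        # automatic certs from Let's Encrypt
      email: you@example.com
      storage: /etc/traefik/acme.json
      httpChallenge:
        entryPoint: web

# --- dynamic configuration (routing + security headers) ---
http:
  routers:
    forem:
      rule: "Host(`forem.dev`)"
      entryPoints: ["websecure"]
      tls:
        certResolver: letsencrypt
      middlewares: ["secure-headers"]
      service: openresty
  middlewares:
    secure-headers:
      headers:                   # a few example security headers
        stsSeconds: 31536000
        frameDeny: true
        contentTypeNosniff: true
  services:
    openresty:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:9090"   # hand off internally to OpenResty
```

In a real deployment the static and dynamic halves live in separate files; they are shown together here just to keep the sketch in one place.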
OpenResty acts as a reverse proxy to the Forem backend app, which is running on port 3000, and as a proxy cache for images and HTML, but we are intentionally turning the caching layer off. We want to see how things do without it, and we can turn it on quickly if things go sideways.
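The proxy layer might look something like the fragment below. This is a sketch, not our actual config: the paths and cache parameters are illustrative, and the cache directives are commented out to mirror the "intentionally off" state described above.

```nginx
# Fragment of an OpenResty/Nginx http block -- illustrative only.
# proxy_cache_path /var/cache/openresty levels=1:2
#                  keys_zone=forem_cache:10m max_size=1g inactive=60m;

server {
    listen 9090;

    location / {
        # proxy_cache forem_cache;   # flip this on if launch goes sideways
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://127.0.0.1:3000;   # the Forem backend app
    }
}
```

Turning the cache on is then a matter of uncommenting two directives and reloading, which is what makes "turn it on quickly" realistic.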
We are also working on some intelligent caching in the app so we can cache and purge based on events happening within the Forem app. We will be using OpenResty to run some Lua code that purges cached assets server-side. We'll make a new post detailing how that works once it is ready.
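To make the idea concrete: this is not our finished implementation, just a sketch of the shape it could take. An internal, app-only endpoint receives a purge request when an event (say, an article update) fires; the URI pattern and everything inside the Lua block are made up for illustration.

```nginx
# Illustrative purge endpoint -- not the real implementation.
location ~ ^/purge(/.*) {
    allow 127.0.0.1;        # only the app on this box may purge
    deny all;
    content_by_lua_block {
        -- Real purge logic would compute the cache key for the asset
        -- and invalidate its cached entry here.
        ngx.say("purged " .. ngx.var[1])
    }
}
```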
If you are unfamiliar with the "pets vs. cattle" phrase, give this a quick read.
The server itself is rather disposable. We are storing all of the important data on AWS managed services, which is detailed below. This approach lets us work on the internal setup and automation of the FSS on the server without worrying about losing data. We can relaunch it quickly with an updated configuration, and that lets us iterate towards a better system.
That said, in these early days, we'll still have some downtime. We have not made any effort to make this highly available yet. With the current setup, any kind of maintenance or adjustment will most likely cause downtime. We will figure out how to keep that to a minimum as we go. Of course, long-term, ensuring that Forem is highly available is a key priority.
We want the FSS to be flexible enough to let Forem self-hosters run a community as cheaply as possible while keeping the community's data safe. This means blurring the lines between treating your server like a pet vs. treating it like cattle. Pets are cheap. A ranch full of cattle is not. We want to support the hackers that want to run Forem on their own. Have an old machine in the basement and want to host a small private Forem just for your family? We want to make that possible.
OK, let's talk about datastores: the sources of truth for your community's data.
For the relational database, we are using PostgreSQL 11 on AWS RDS with a db.t2.small instance, which has 1 vCPU and 2 GiB of RAM. It costs ~$24.82 USD a month. I expect we might have to scale this up in the future, but I bet once we get caching sorted out we can keep costs low on RDS.
All images, podcasts, and other uploaded assets are going to be stored in S3 as the source of truth. We are looking into adding support for manipulating these assets (resizing and optimizing) without using a third-party service, and caching them in OpenResty. Once we have that sorted out, I am sure you will see a post with the details.
Oh yeah, the server is also running Sidekiq with two workers for background job processing, plus Redis and Elasticsearch. We could split Redis and Elasticsearch out onto AWS managed services and move Sidekiq to its own EC2 instance, but that costs more money, and who likes spending more money?
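"Two workers" can mean either two Sidekiq processes or one process with a concurrency of two; assuming the latter, a minimal Sidekiq config would look something like this. The queue names are guesses for illustration, not Forem's actual queue layout.

```yaml
# config/sidekiq.yml -- illustrative sketch.
:concurrency: 2
:queues:
  - default
  - mailers
```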
The FSS also has a local PostgreSQL 11 service, currently turned off, in case we ever want to forgo AWS RDS for the database; it stays off because we don't want to lose data if something goes wrong. For all intents and purposes, we are designing the FSS so you could run everything on one server if you wanted. Let's call that Forem in a Box: everything you need to run your own community. I wouldn't do that for a community you really care about just yet, though.
To cherry-pick a few cards from my project board, here is a taste of the things that need work:
- Deployments are manual, clunky, and prone to downtime.
- Initial server provisioning is error prone and not a "push button receive new Forem" experience without some babysitting.
- Logging is less than ideal for running many Forems at scale. It doesn't use any FOSS; it relies on a third-party service that is, IMO, very expensive.
- Monitoring is not a thing outside of "uhhhhh I think forem.dev is down" being posted on our Slack.
- No FOSS backup solution outside of paying AWS for S3 storage and RDS backups.
- Caching and image processing are a work in progress and should be sorted out soon.
This is all OK for now. We are going to make it better and you can follow along on the #systems tag. I will do my best to post updates as I ~break~ improve things.