Today is a big day for Forem Systems. A very big day!
We have announced forem.dev to the world and we get to see our current Forem Systems Stack (FSS) in action with real traffic. Heck yes!
I am pretty excited to see how the system performs and breaks. Yes, I said breaks. We expect this launch to be a bumpy ride to start out; we can't improve our systems without letting them break.
So what is the FSS right now?
Right now, we only have one server configured behind an AWS NLB. Yes, just one server. It is in an AWS Auto Scaling group, so we could scale out quickly by adding more servers if we wanted. That's how most popular services keep up with large amounts of traffic.
Scaling out is expensive. It also adds complexity to the stack, and it's not something everyone can afford when running a smaller community site. We are not designing the FSS just for large, high-traffic communities with big budgets for cloud services. We want the FSS to be performant on smaller deployments for the little communities too. That's why we want to see how it does on a smaller, less expensive cloud configuration.
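For a rough idea of what that one-server-behind-an-NLB shape looks like in Terraform terms (we manage the AWS pieces with Terraform, more on that below), here is a minimal sketch. This is not our actual code: every name, AMI, and network ID is a placeholder, and only the 443 listener is shown since port 80 looks the same.

```hcl
# Sketch only: one m5a.large in an Auto Scaling group behind a Network Load Balancer.
resource "aws_lb" "forem" {
  name               = "forem-nlb"
  load_balancer_type = "network"
  subnets            = ["subnet-0123456789abcdef0"] # placeholder
}

resource "aws_lb_target_group" "https" {
  name     = "forem-https"
  port     = 443
  protocol = "TCP" # the NLB passes TCP straight through; TLS is terminated on the instance
  vpc_id   = "vpc-0123456789abcdef0" # placeholder
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.forem.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.https.arn
  }
}

resource "aws_launch_template" "forem" {
  name_prefix   = "forem-"
  image_id      = "ami-0123456789abcdef0" # placeholder Fedora CoreOS AMI
  instance_type = "m5a.large"
}

resource "aws_autoscaling_group" "forem" {
  min_size            = 1
  max_size            = 3 # headroom to scale out if we ever need it
  desired_capacity    = 1 # yes, just one server
  vpc_zone_identifier = ["subnet-0123456789abcdef0"] # placeholder
  target_group_arns   = [aws_lb_target_group.https.arn]

  launch_template {
    id      = aws_launch_template.forem.id
    version = "$Latest"
  }
}
```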
The Server Setup
The server we have right now is an m5a.large instance type with 2 vCPUs and 8 GiB of RAM. It costs ~$62.78 USD a month and it is running Fedora CoreOS as the Linux distro. All of the services are containerized and running as a non-root user via Podman.
The instance is running Traefik on ports 80 and 443 to handle TLS termination with automatic TLS certs from Let's Encrypt, HTTP-to-HTTPS redirects, and security headers. Traefik then sends traffic internally to Openresty on port 9090, which is Nginx extended with LuaJIT.
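To make that concrete, here is a rough Traefik v2 sketch of that edge, assuming a file provider for the dynamic pieces. The domain, email, and paths are placeholders, not our actual config:

```yaml
# traefik.yml (static configuration) -- sketch only
entryPoints:
  web:
    address: ":80"
    http:
      redirections:        # the HTTP-to-HTTPS redirect
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@example.com          # placeholder
      storage: /etc/traefik/acme.json
      httpChallenge:
        entryPoint: web

providers:
  file:
    filename: /etc/traefik/dynamic.yml

---
# dynamic.yml (routers, middlewares, services) -- sketch only
http:
  routers:
    forem:
      rule: "Host(`forem.example.com`)" # placeholder host
      entryPoints: ["websecure"]
      tls:
        certResolver: letsencrypt
      middlewares: ["secure-headers"]
      service: openresty

  middlewares:
    secure-headers:
      headers:
        stsSeconds: 31536000
        frameDeny: true
        contentTypeNosniff: true
        browserXssFilter: true

  services:
    openresty:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:9090"
```

The appeal of keeping Traefik at the edge is exactly this: it owns the cert lifecycle, redirect, and header plumbing, and everything behind it is plain HTTP on localhost.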
Caching Forem
Openresty acts as a reverse proxy to the Forem backend app, which runs on port 3000, and as a proxy cache for images and HTML. For now, we are intentionally leaving the caching layer turned off. We want to see how things do without it, and we can turn it on quickly if things go sideways.
We are also working on some intelligent caching from the app so we can cache and purge based on events happening within the Forem app. We will be using Openresty to run some Lua code that purges cached assets server-side. We'll make a new post detailing how that works once it is ready.
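In the meantime, here is a minimal Openresty sketch of the general idea, assuming the layout described above (app on 3000, Openresty on 9090). The cache path, zone name, and purge endpoint are placeholders, not the actual FSS config:

```nginx
# Sketch only: reverse proxy + proxy cache, with a Lua purge endpoint.
proxy_cache_path /var/cache/openresty levels=1:2 keys_zone=forem_cache:10m
                 max_size=1g inactive=60m;

server {
    listen 9090;

    location / {
        proxy_cache       forem_cache;
        proxy_cache_key   $scheme$host$request_uri;
        proxy_cache_valid 200 301 10m;
        add_header        X-Cache-Status $upstream_cache_status;
        proxy_set_header  Host $host;
        proxy_pass        http://127.0.0.1:3000;
    }

    # One way to purge a single cached object with a bit of Lua: recompute the
    # cache file path (md5 of the cache key, split per levels=1:2) and delete it.
    location = /purge {
        allow 127.0.0.1;
        deny  all;

        content_by_lua_block {
            -- the key must match proxy_cache_key, e.g. "https" .. host .. uri
            local key = ngx.var.arg_key
            if not key then
                return ngx.exit(ngx.HTTP_BAD_REQUEST)
            end
            local hash = ngx.md5(key)
            local path = "/var/cache/openresty/"
                .. hash:sub(-1) .. "/" .. hash:sub(-3, -2) .. "/" .. hash
            if os.remove(path) then
                ngx.say("purged ", key)
            else
                ngx.status = ngx.HTTP_NOT_FOUND
                ngx.say("not in cache")
            end
        }
    }
}
```

Deleting the cache file out from under Nginx like this is blunt (the in-memory keys zone entry lingers until it expires), so the real implementation will likely be smarter; this just shows the shape of it.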
Servers: Pets or Cattle?
If you are unfamiliar with the Pets or Cattle phrase, give this a quick read.
The server itself is rather disposable. We are storing all of the important data on AWS managed services, which are detailed below. This approach lets us work on the internal setup and automation of the FSS on the server without worrying about losing data. We can relaunch it with an updated configuration quickly, and that lets us iterate towards a better system.
That said, in these early days, we'll still have some downtime. We have not made any effort to make this highly available. With the current setup, any kind of maintenance or adjustment will most likely cause downtime. We will figure out how to make that as minimal as possible as we go. Of course, long-term, ensuring that Forem is highly available is a key priority.
We want the FSS to be flexible enough to let Forem self-hosters run a community as cheaply as possible while keeping the community's data safe. This means blurring the lines between treating your server like a pet vs. cattle. A pet is cheap. A ranch full of cattle is not. We want to support the hackers who want to run Forem on their own. Have an old machine in the basement and want to host a small private Forem just for your family? We want to make that possible.
Datastores: PostgreSQL and S3
OK, let's talk about datastores: the sources of truth for your community's data.
For the relational database, we are using PostgreSQL 11 on AWS RDS with a db.t2.small instance, which has 1 vCPU and 2 GiB of RAM. It costs ~$24.82 USD a month. I expect we might have to scale this up in the future, but I bet once we get caching sorted out we can keep costs low on RDS.
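For reference, that RDS instance boils down to a few lines of Terraform. The sizes below come from the description above; everything else (names, storage, credentials) is a placeholder, not our real code:

```hcl
# Sketch only: the small RDS PostgreSQL instance described above.
resource "aws_db_instance" "forem" {
  identifier              = "forem-db"
  engine                  = "postgres"
  engine_version          = "11"           # PostgreSQL 11
  instance_class          = "db.t2.small"  # 1 vCPU / 2 GiB RAM, ~$25 a month
  allocated_storage       = 20             # placeholder size in GB
  username                = "forem"
  password                = var.db_password # keep this out of the repo
  backup_retention_period = 7               # lean on RDS automated backups
  skip_final_snapshot     = false
}

variable "db_password" {
  type = string
}
```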
All images, podcasts, and other uploaded assets are going to be stored in S3 as the source of truth. We are looking into adding support for manipulating these assets (resizing and optimizing) without using a third-party service, and caching them in Openresty. Once we have that sorted out, I am sure you will see a post with the details.
But wait... there is more!
Oh yeah, the server is also running Sidekiq with two workers for background job processing, plus Redis and Elasticsearch. We could split Redis and Elasticsearch out onto AWS managed services and move Sidekiq to its own EC2 instance, but that costs more money, and who likes spending more money?
The FSS also ships with a local PostgreSQL 11 service, currently turned off, in case we ever want to forgo AWS RDS for the database. We are not using it here because we don't want to lose data if something goes wrong. For all intents and purposes, we are designing the FSS so you could run everything on one server if you wanted. Let's call that Forem in a Box: everything you need to run your own community. I wouldn't do that for a community you really care about just yet, though.
Gluing it all together
All of the AWS infrastructure above is managed with Terraform and some Ansible. Almost all of the server configuration is done with Ignition and systemd unit files.
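To give a feel for that last piece, here is a hypothetical Butane/FCC snippet of the kind that fcct transpiles into the Ignition JSON Fedora CoreOS consumes on first boot. It is a stand-in, not our actual config, and for brevity it runs the container from a plain system unit rather than rootless the way the real FSS does:

```yaml
# Sketch only: Butane config -> Ignition, writing a systemd unit that runs a Podman container.
variant: fcos
version: 1.1.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA...placeholder...
systemd:
  units:
    - name: openresty.service
      enabled: true
      contents: |
        [Unit]
        Description=OpenResty reverse proxy container
        Wants=network-online.target
        After=network-online.target

        [Service]
        ExecStartPre=-/usr/bin/podman rm -f openresty
        ExecStart=/usr/bin/podman run --rm --name openresty \
          -p 9090:9090 \
          -v /etc/openresty:/etc/nginx/conf.d:ro,z \
          docker.io/openresty/openresty:latest
        ExecStop=/usr/bin/podman stop openresty
        Restart=always

        [Install]
        WantedBy=multi-user.target
```

The actual host config presumably looks similar for each of the services mentioned above; this just shows the shape of one unit.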
Things that are totally broken or missing right now
To cherry-pick some cards from my project board, here is a taste of the things that need work:
- Deployments are manual, clunky, and prone to downtime.
- Initial server provisioning is error-prone and not a "push button, receive new Forem" experience without some babysitting.
- Logging is less than ideal for running many Forems at scale. It doesn't use any FOSS; it relies on a third-party service that is, IMO, very expensive.
- Monitoring is not a thing outside of "uhhhhh I think forem.dev is down" being posted on our Slack.
- No FOSS backup solution outside of paying AWS for S3 storage and RDS backups.
- Caching and image processing are a work in progress and should be sorted out soon.
This is all OK for now. We are going to make it better and you can follow along on the #systems tag. I will do my best to post updates as I ~break~ improve things.
Comments
I have a question: why do we need two proxy servers? Could we do everything only with Traefik or only with Openresty?
❤️ the local PostgreSQL is an awesome touch! I just had this idea: it could be a local read-only replica of the AWS DB if the network is fast enough. Does that make sense?
I sense Hooli and Silicon Valley vibes here :D
I feel like I've been called out here :P
The short answer is that Traefik doesn't have a mature caching solution. They just added caching support six days ago and it doesn't support cache purging.
We could make Nginx do TLS termination, HTTP-to-HTTPS redirection, and security headers easily, but that would mean managing the Let's Encrypt certs with something like certbot. Traefik handles the Let's Encrypt cert lifecycle pretty well and it was pretty easy to configure.
The bigger-picture answer is that I want the FSS to be flexible with each component in the stack. If we were running N Forems in a SaaS-like setup, it would be ideal to pull Traefik and Nginx (or HAProxy, Varnish, or Envoy) off of the FSS and run them as their own cluster above each Forem deployment.
As for a read only PostgreSQL replica... I don't have the mental fortitude for that kind of yak shave just yet. ;)
ahaha that's definitely for the future future. We can make PostgreSQL do magic things with foreign data wrappers
Holy smokes! I was wondering how this would all play out. So for Forems like mine, no need for Fastly, Cloudinary, and other external services?
Yep! The idea is we want to make Forem run on FOSS as a first-class experience and allow for external third-party services if needed. Fastly and Cloudinary are fantastic services, but not every Forem needs them. Also, not every Forem operator can afford them.
Ooo great, so will we be getting documentation on how to spin this up, or is it for Forem Cloud?
When it is ready we will open source the FSS and have all the documentation to go with it.
I will be waiting for that day, and the sooner, the lighter I become
jdoss++ for not being silly, sending it
Another beer. Another server. Still gonna send it.