As Forem grows and adapts to support different communities, we will have to shift towards building a platform that is inclusive to people of different backgrounds, particularly folks who speak different languages.
To that end, something we've been exploring from a technical standpoint is internationalization (often referred to in the community as i18n). There are many technical challenges that come with supporting different languages (caching and SEO being two of the biggest) and several different approaches that we could take.
So, I'm curious: what kinds of features would you like to see in terms of supporting the internationalization of Forem? Since we're still figuring out how and what to prioritize, I want to hear your thoughts!
Top comments (9)
Moving an internal comment here, which might lack some context, but for discussion, makes sense to chat here.
I think we should start with the "shell" i18n concerns and not seek to do any automatic content translation, mostly because I think that's a task handled pretty decently by the browser. If we wanted an optional "translate all content" or even just "translate this content" button, that would be more of an end-user thing than a site-structure thing.
So if we started with "shell" i18n and added the browser preference header as a variant in all cache keys, I think that's the place to go. This at least gives us the option to let Forem creators start thinking in terms of a default language other than English.
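To make the "header as a variant in cache keys" idea concrete, here is a minimal Python sketch (not Forem code). It assumes the raw Accept-Language header is first reduced to one of a small set of supported shell languages, so the number of cache variants stays bounded. The SUPPORTED tuple, the DEFAULT value, and both function names are hypothetical.

```python
# Hypothetical set of languages the "shell" has been translated into.
SUPPORTED = ("en", "es", "fr")
DEFAULT = "en"


def preferred_language(accept_language, supported=SUPPORTED, default=DEFAULT):
    """Pick the best supported language from an Accept-Language header value."""
    candidates = []
    for position, part in enumerate((accept_language or "").split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        # Only the primary language subtag is used here ("es-CO" -> "es").
        primary = tag.strip().lower().split("-")[0]
        candidates.append((-quality, position, primary))
    # Highest quality first; ties keep the order given in the header.
    for negative_quality, _, primary in sorted(candidates):
        if negative_quality < 0 and primary in supported:
            return primary
    return default


def cache_key(path, accept_language):
    """Build a cache key that varies by the normalized shell language."""
    return f"{preferred_language(accept_language)}:{path}"
```

With these assumptions, a request for /ben/post with the header "es-CO,es;q=0.9,en;q=0.8" would be cached under "es:/ben/post", while any unsupported preference falls back to the default shell.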
In terms of the language subfolders and SEO: we could make dev.to/es/ always use the Spanish shell and only display Spanish articles. Root dev.to/ can show any language, but the shell defaults to the Forem admin's settings (English in our case).
Any individual article would be accessible at its usual path, dev.to/ben/..., but if it's a Spanish post, the canonical URL could be dev.to/es/ben/..., which would then also use the Spanish shell, for the robots to use.
I don't think we need this entirely figured out right now, but if we're going to use language subdirectories, we need to reserve those directory words right now so that Forems don't start filling up with usernames and orgs with those names. I think the first step should be getting the i18n shell MVP worked out. I think most of the Rails i18n functionality should be usable for us, as long as we make it compatible with how we're reading headers for caching and so on.
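A rough Python sketch of the two ideas in this comment: reserving language subdirectory words so they cannot be claimed as usernames, and deriving a language-prefixed canonical path. The reserved set, the function names, and the choice to leave default-language posts unprefixed are all assumptions for illustration, not anything Forem has decided.

```python
# Hypothetical set of path segments reserved for language subdirectories.
RESERVED_LANGUAGE_SEGMENTS = frozenset({"en", "es", "fr", "de", "it"})


def is_username_available(username, taken=()):
    """Reject usernames that would collide with a language subdirectory."""
    name = username.strip().lower()
    return name not in RESERVED_LANGUAGE_SEGMENTS and name not in taken


def canonical_path(article_path, article_language, default_language="en"):
    """Return the canonical path for an article written in article_language.

    Assumption: posts in the site's default language keep their usual path,
    and posts in any other language get a language prefix.
    """
    if article_language == default_language:
        return article_path
    return f"/{article_language}{article_path}"
```

Under these assumptions, canonical_path("/ben/my-post", "es") gives "/es/ben/my-post", and a signup for the username "es" would be refused.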
One additional aspect, related to subdirs.
Let's keep in mind that locales are made up of two parts: a language code and a country code. A Spanish article written by a person from Colombia is not going to be exactly the same as one written by a person from Spain, and vice versa. French in Canada is not exactly the same as French in France or Senegal. So we might want to take countries into account as well.
Not sure if we should differentiate those or not, maybe not from the start, but we should take into account that the full locale is de-AT, de-CH, de-DE, not just de, and some products simply read the browser locale and "route" the user to the strings translated for that specific locale.
A possibility would be to have it-it in the URL from the start and serve the same Italian to both it-IT and it-CH, so that when we have a reason to translate to it-CH, that won't change.
I guess this decision has to be made for the caching keys as well, ahead of time.
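One way to picture "serve the same Italian to both it-IT and it-CH until a regional translation exists" is a fallback chain. The Python sketch below is only an illustration: the function names and the catalog layout are hypothetical, and it only handles simple language-REGION tags (no script subtags such as zh-Hant-TW).

```python
def fallback_chain(locale, default="en"):
    """Return the locales to try, most specific first.

    Simplification: only "language" or "language-REGION" tags are handled.
    """
    parts = locale.replace("_", "-").split("-")
    language = parts[0].lower()
    chain = []
    if len(parts) > 1 and parts[1]:
        chain.append(f"{language}-{parts[1].upper()}")
    chain.append(language)
    if default not in chain:
        chain.append(default)
    return chain


def lookup(key, locale, catalogs):
    """Find a translated string, walking the fallback chain.

    catalogs is a hypothetical mapping of locale -> {key: string}.
    The key itself is returned when no catalog contains it.
    """
    for candidate in fallback_chain(locale):
        strings = catalogs.get(candidate, {})
        if key in strings:
            return strings[key]
    return key
```

With this shape, a URL or cache key can carry the full locale (for example it-CH) from day one, while the strings actually served come from the plain Italian catalog until an it-CH catalog is added.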
I think if country and sub-region is something we want to support, it's common to do something like /us/en (Ikea is one example). We definitely should make a call on this upfront.
It seems a lot of the "large" companies go as far as changing the domain itself (e.g. adidas.de, airbnb.de, airbnb.com.co). That approach has advantages, including SEO and code implementation. That will be much tougher for Forem since we're talking about infinite communities and domains to manage.
@ben that will be a stretch at the moment. We can do that with Fastly, but Nginx doesn't have the same functionality set up for that approach yet (using keys, vary headers to look up tags/keys in cache, etc.). Nginx only has the power to handle paths. Are you suggesting we wait until we get there with Nginx? That would be a future cycle to get Nginx to that level.
If not, it's easier, with regard to caching, not to mess with Vary and cache keys. By using just the subfolder approach, /es/, Nginx/Lua can rewrite a request at the edge based on settings (cookie or headers). This approach also avoids having to do a separate implementation for Fastly.
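The rewrite logic itself is small. Here it is sketched in Python rather than Lua, purely to show the decision being made at the edge. The SUPPORTED set, the function name, and the rule that a cookie wins over a header are assumptions, not a description of the actual Nginx setup.

```python
# Hypothetical set of non-default languages that have their own subfolder.
SUPPORTED = frozenset({"es", "fr"})


def rewrite_path(path, cookie_language=None, header_language=None):
    """Return the path the edge should serve for this request.

    path is expected to start with "/". A path that already has a language
    prefix is left alone; otherwise the cookie setting is tried first, then
    the header setting (an assumed priority order).
    """
    first_segment = path.lstrip("/").split("/", 1)[0].lower()
    if first_segment in SUPPORTED:
        return path
    for language in (cookie_language, header_language):
        if language and language.lower() in SUPPORTED:
            return f"/{language.lower()}{path}"
    return path
```

Because the language ends up in the path, the cache can key on the URL alone, which is the point of avoiding Vary and custom cache keys.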
I'll have to look more into service workers and our setup. Won't doing something like this change that whole approach since the shell content would be dynamic with i18n? The content could change anytime based on settings and not just on deploys. Don't we bust that service worker cache on deploys by Heroku slug?
I have no idea what the issues behind caching and SEO are.
So, the first "feature" I'd like to see is abstraction of all user-facing strings and connecting that to an online system that provides the UI translators need to do a proper translation. I'm not sure what Rails uses/requires, but you probably have a gettext-centered process that does all the extraction from the source files.
Once the strings are extracted, they are fed to hungry volunteer translators on online systems.
A typical example from the i18n/l10n world is Weblate (among others), where, for example, po4a is handled:
hosted.weblate.org/projects/po4a/
(Weblate also hosts the LibreOffice localization)
The underlying architecture is probably handled by Rails, but since some mention locale names down in the thread I think it would be nice not to tinker too much with what Rails offers, all while trying to follow the W3C recommendations. Rails has been around long enough that they're probably not doing anything weird there.
I'd like to see right-to-left text support!
Ahhh, me too! This seems like a great thing to work towards
Apart from the most obvious i18n feature, a UI that is translated into other languages, I'd like users to be able to set up language-specific versions of their names, like Facebook does. This only makes sense when a user can switch the UI language on the fly, so that one person can use the UI in English and another in Spanish. The English user will see my name as "Serge", whereas the user of the Spanish UI sees "Sergio".
For the IQA, being able to easily access English and a user's native language is important. A lot of the sport analysis and think pieces are led and written by English speakers, but smaller teams/organizations will communicate in their native language. What might be helpful is the ability to filter by language. I guess this could be done with tags? But that might be messy in the long run.