Forem Creators and Builders 🌱

Discussion on: Forem: Open for Internationalization

Daniel Uber

I think this is the same problem described here: any non-ASCII characters (including all Russian words) are omitted when generating URL slugs for posts.

A few suggestions were to use Punycode (via the addressable library, replacing non-ASCII text with an encoded representation) or to percent-encode the text to produce ASCII URLs, the way Wikipedia does, with a URL like ru.wikipedia.org/wiki/%D0%97%D0%B0... That displays fine in the browser but could seem a little ungainly when written out in ASCII.
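To make the two suggestions above concrete, here is a rough sketch in Python rather than Ruby. It uses the standard library's `punycode` codec and `urllib.parse.quote` as stand-ins, not the addressable library, and the title is a made-up example, so treat it as an illustration of the two encodings rather than what Forem would actually do.

```python
from urllib.parse import quote, unquote

# Hypothetical post title, with the space already turned into a hyphen.
title = "Привет-мир"

# Percent-encoding: every UTF-8 byte of a non-ASCII character becomes %XX.
percent_slug = quote(title)

# Punycode: the non-ASCII characters are packed into an ASCII suffix.
puny_slug = title.encode("punycode").decode("ascii")

print(percent_slug)
print(puny_slug)

# Both forms are pure ASCII and both can be decoded back to the title.
print(unquote(percent_slug) == title)
print(puny_slug.encode("ascii").decode("punycode") == title)
```

Both results are valid in a URL, but neither is readable as a word once it is written out in ASCII.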

Jamie Gaskins

@djuber The slug exists for SEO reasons. Going by this table, I'm not sure Punycode would help achieve that goal:

Punycode table from the linked Wikipedia page

@ben @varhal According to the URI spec, URIs can only contain 7-bit ASCII characters. The application strips all non-ASCII characters to avoid %-encoding them as bytes, which would make the URL several times as long (each non-ASCII character is 2-4 UTF-8 bytes, and each byte becomes three ASCII characters) and impossible to read, while (from what I can tell) also not contributing to SEO.
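A small Python sketch of the trade-off described here, under stated assumptions: `strip_non_ascii_slug` is a hypothetical helper that approximates "drop everything outside ASCII, then hyphenate", and is not the application's actual slug code.

```python
import re
from urllib.parse import quote


def strip_non_ascii_slug(title: str) -> str:
    """Hypothetical slug helper: discard non-ASCII characters, then hyphenate."""
    ascii_only = title.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


# A mixed title keeps only its ASCII words.
print(strip_non_ascii_slug("Hello мир"))

# A fully Cyrillic title leaves nothing to build a slug from.
print(repr(strip_non_ascii_slug("Привет мир")))

# Percent-encoding keeps the text, but 3 Cyrillic letters
# (2 UTF-8 bytes each, 3 ASCII characters per byte) take 18 characters.
print(len(quote("мир")))
```

Stripping loses the whole title for non-Latin scripts, while percent-encoding preserves it at the cost of a much longer, unreadable URL.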

As @varhal mentioned elsewhere in this thread, transliteration to Roman characters is a good idea. We might be able to achieve that through this library.
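For a sense of what transliteration would buy us, here is a minimal Python sketch. The mapping table is hypothetical and deliberately incomplete (just enough letters for the example), and it does not represent the linked library; a real transliteration library would cover full alphabets and multi-letter cases.

```python
import re

# Hypothetical, deliberately incomplete Cyrillic-to-Latin table.
CYRILLIC_TO_LATIN = {
    "п": "p", "р": "r", "и": "i", "в": "v",
    "е": "e", "т": "t", "м": "m",
}


def transliterated_slug(title: str) -> str:
    """Romanize known letters first, then build an ASCII-only slug."""
    lowered = title.lower()
    romanized = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in lowered)
    # Anything still outside ASCII (letters missing from the table) is dropped.
    ascii_only = romanized.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")


print(transliterated_slug("Привет мир"))
```

The slug stays within ASCII, so nothing needs %-encoding, and it is still recognizable to a reader of the original language.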

Varhal • Edited

I support this idea. I hope the developers will listen to it 👍. @ben what do you think about it?

Jamie Gaskins

@varhal Can you check out this PR?

Varhal • Edited

Yes, that's what I need. Thanks!