Forem Creators and Builders

తెలుగు language

9comindia
・1 min read

The URL omits the non-English part of the title:

If the title of a post is not in English, the URL generated for that post omits the non-English part of the title.

I don't think it's a technical limitation but an intended design.

Could someone please clarify why Forem is not inclusive of all languages (just kidding)? Thanks.

Discussion (9)

Daniel Uber

I don't think it's a technical limitation but an intended design.

I can't answer questions about intent, but I can share the mechanism that's doing this.

The call to String#parameterize in Article#title_to_slug (which delegates to the parameterize method in Active Support's Inflector module) removes any characters that can't be transliterated to ASCII. Latin-based scripts have accented characters replaced with the base character, Cyrillic and Greek scripts possibly do the same, and many Asian languages have their words removed completely during this process.

It's even more dramatic if the article title were only తెలుగు: you'd end up with -79o instead of language-79o as the post's URL slug.
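To make the mechanism concrete, here is a minimal Ruby sketch of the ASCII-only filtering step. It is a simplified stand-in, not Active Support's actual code: ascii_only_slug is a hypothetical helper, it skips the transliteration pass (so accented Latin characters are dropped rather than mapped to their base letters), and it leaves out the random suffix that Forem appends.

```ruby
# Hypothetical, simplified stand-in for the filtering that happens
# after transliteration; NOT the real ActiveSupport implementation.
def ascii_only_slug(title)
  title.gsub(/[^a-z0-9\-_]+/i, "-") # every run of other characters becomes "-"
       .gsub(/-{2,}/, "-")          # collapse repeated separators
       .gsub(/\A-|-\z/, "")         # trim a leading or trailing separator
       .downcase
end

puts ascii_only_slug("తెలుగు language")  # language
puts ascii_only_slug("తెలుగు").inspect   # ""
```

With a suffix such as -79o appended, those two results correspond to language-79o and -79o.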

9comindia Author • Edited

Thank you, @djuber.

Found at apidock.com/rails/v6.1.3.1/ActiveS...

If the optional parameter locale is specified, the word will be parameterized as a word of that language.

Does that mean we can pass the optional locale parameter when calling the parameterize method and get a non-ASCII URL slug for the post title?

Daniel Uber • Edited

Maybe? It doesn't look to be automatic in the way the documentation suggests, and the choice of locales is limited by the backend (my machine is using an I18n::Backend::Simple backend and loading locale data from translations in gems):

I18n.locale
=> :en

# try the easiest thing that could possibly work, ask for Telugu as a locale:
 I18n.locale = :te
I18n::InvalidLocale: :te is not a valid locale
from /home/djuber/src/forem/vendor/bundle/ruby/3.0.0/gems/i18n-1.8.10/lib/i18n.rb:343:in `enforce_available_locales!'

I don't see any Indian languages on my machine (via I18n.available_locales). However, this list comes from the translations of installed gems, and I'm not certain whether or how the note about using the locale applies:

[33] pry(main)> I18n.locale=:ua
=> :ua                         

# English still works with the locale set:
[34] pry(main)> "Any thing you like.".parameterize
=> "any-thing-you-like"

# Ukrainian doesn't work, even with Ukrainian locale:
[36] pry(main)> "Пароль успішно встановлено. Ви успішно увійшли.".parameterize
=> ""     

I think there's no magic bullet here, but if you were to experiment with the slug format, you might try reimplementing parameterize without the removal of non-ASCII characters, keeping the whitespace treatment, and then CGI.escape the result to get a string like

"%E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language-790" which would display correctly as long as your browser had script support.

That might be outside the scope of short-term extensions or have side effects I'm not aware of, but it may work if you try changing the code in a local instance.
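A rough sketch of that experiment, using only Ruby's standard library. unicode_slug is a hypothetical helper, not anything in Forem, and the choice to keep letters, combining marks, digits and underscores is an assumption about what such a slug should allow. The random suffix Forem appends is omitted.

```ruby
require "cgi"

# Hypothetical helper: similar in spirit to parameterize, but it keeps
# non-ASCII letters instead of removing them.
def unicode_slug(title)
  title.downcase
       .gsub(/[^\p{L}\p{M}\p{N}_]+/, "-") # whitespace/punctuation runs become "-"
       .gsub(/\A-+|-+\z/, "")             # trim separators at both ends
end

slug = unicode_slug("తెలుగు language")
puts slug             # తెలుగు-language
puts CGI.escape(slug) # %E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language
```

Keeping \p{M} in the character class matters for scripts like Telugu, where vowel signs are combining marks; dropping them would mangle the word.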

9comindia Author

hi @djuber

from: github.com/sporkmonger/addressable

uri = Addressable::URI.parse("http://www.詹姆斯.com/")
uri.normalize
#=> #<Addressable::URI:0xc9a4c8 URI:http://www.xn--8ws00zhy3a.com/>

Would the Addressable gem be relevant here, instead of calling the parameterize method in title_to_slug?

Daniel Uber • Edited

I think Addressable is doing something very different from what you want?

Is xn--8ws00zhy3a here related at all to the input 詹姆斯, or more legible? It looks like this is a way to encode for transfer (as ASCII), rather than to encode for human visibility. I might be misunderstanding it, particularly if there's some automatic display translation in the browser, but while it does preserve the information from the title, it doesn't look like it does so in any human-meaningful way.

If you wanted to try that, for title strings you probably just want the to_ascii method and not URI.parse (you're not building a URI without making even larger changes):

Addressable::IDNA.to_ascii('తెలుగు language')
=> "xn-- language-zhy2gyguhb5e" 

You could try it out, but I don't think this fits what you're asking for: a URL slug that contains the post title. Judging by that output, you would still need to handle stripping the spaces, probably before applying the Punycode translation.

9comindia Author • Edited

Is xn--8ws00zhy3a here related at all to the input 詹姆斯, or more legible?

Not just legible: if I click on the link xn--8ws00zhy3a.com, the URL of the page that opens is shown in Chinese characters.
[image: chinese url]

This may not be exactly what I initially mentioned, but I'm looking for something that could improve SEO for non-English Forem instances.

Addressable::IDNA.to_ascii('తెలుగు language')
=> "xn-- language-zhy2gyguhb5e"

This is not resolving to the Telugu characters in the URL.
[image: telugu url expected]

9comindia Author • Edited

from: github.com/sporkmonger/addressable

uri = Addressable::URI.parse("http://www.詹姆斯.com/")
uri.normalize
#=> #<Addressable::URI:0xc9a4c8 URI:http://www.xn--8ws00zhy3a.com/>

We seem to be already using the Addressable gem in Forem, but the above kind of implementation was either missed or removed.

After posting this, I went through the previous articles and learned that Forem had multi-language functionality earlier, but it was deprecated.

I can understand the logic behind that decision, but enabling this URL part now would help non-English Forems achieve better SEO.

I request you to please give it a thought. Thanks.

Christina Gorton

Hey @9comindia. Thanks for bringing this up.
i18n work is something that has been on hold in the past for various reasons, but we are slowly starting to work on it again.
I agree this isn't ideal, and we are working on a plan to make it better.
For example, this PR was just added to Forem to help with non-ASCII strings in tags.

Akhil Naidu

This comment is out of context, but I'm eager to share this.

My mother tongue is Telugu <3