Forem Creators and Builders 🌱

9comindia

తెలుగు language

URL is omitting the non-English title:

If the title of a post is not in English, the URL generated for that post omits the non-English part of the title.

I don't think it's a technical limitation but an intended design.

Could someone please clarify why Forem is not inclusive of all languages (just kidding)? Thanks.

Top comments (10)

Daniel Uber

I don't think it's a technical limitation but an intended design.

I can't answer questions about intent, but I can share the mechanism that's doing this.

The call to String#parameterize in Article#title_to_slug (which in turn calls the parameterize method in Active Support's Inflector module) removes any characters that can't be transliterated to ASCII. Latin-based scripts have accented characters replaced with the base character, Cyrillic and Greek scripts possibly get the same treatment, and many Asian languages have their words removed completely during this process.

It's even more dramatic if the article title were only తెలుగు: you'd end up with -79o instead of language-79o as the post's URL slug.
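
To make the mechanism concrete, here is a minimal, dependency-free sketch of an ASCII-only slug step. It is only an approximation for illustration, not Active Support's actual parameterize code: the helper name ascii_only_slug is hypothetical, the accent handling via Unicode decomposition is a stand-in for real transliteration, and the suffix is passed in by hand rather than generated.

```ruby
# Rough approximation of an ASCII-only slug step.
# NOT ActiveSupport's implementation; `ascii_only_slug` is a hypothetical helper.
def ascii_only_slug(title, suffix)
  base = title
    .unicode_normalize(:nfkd)        # split accented Latin letters into base letter + combining mark
    .gsub(/\p{Mn}/, "")              # drop non-spacing combining marks, so "é" becomes "e"
    .gsub(/[^a-zA-Z0-9\-_]+/, "-")   # every remaining run of non-ASCII-slug characters becomes "-"
    .gsub(/-{2,}/, "-")              # collapse repeated separators
    .gsub(/\A-|-\z/, "")             # trim a leading or trailing separator
    .downcase
  "#{base}-#{suffix}"
end

puts ascii_only_slug("Café déjà vu", "79o")     # prints cafe-deja-vu-79o
puts ascii_only_slug("తెలుగు language", "79o")   # prints language-79o
puts ascii_only_slug("తెలుగు", "79o")            # prints -79o
```

The Telugu word has no ASCII equivalent in this scheme, so the whole word collapses into a separator that is then trimmed away, leaving only the suffix.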

9comindia Author • Edited on

Thank you, @djuber.

Found at apidock.com/rails/v6.1.3.1/ActiveS...

If the optional parameter locale is specified, the word will be parameterized as a word of that language.

Does that mean we can pass the optional locale parameter when calling the "parameterize" method and get a non-ASCII URL slug for the post title?

Daniel Uber • Edited on

Maybe? It doesn't look to be automatic in the way the documentation suggests, and the choice of locales is limited by the backend (my machine is using an I18n::Backends::Simple backend and loading locale data from translations in gems):

I18n.locale
=> :en

# try the easiest thing that could possibly work, ask for Telugu as a locale:
 I18n.locale = :te
I18n::InvalidLocale: :te is not a valid locale
from /home/djuber/src/forem/vendor/bundle/ruby/3.0.0/gems/i18n-1.8.10/lib/i18n.rb:343:in `enforce_available_locales!'

I don't see any Indian languages (on my machine, via I18n.available_locales) - however this is coming from the translations of installed gems, and I'm not certain whether and how the note about using the locale applies:

[33] pry(main)> I18n.locale=:ua
=> :ua                         

# English still works with the locale set:
[34] pry(main)> "Any thing you like.".parameterize
=> "any-thing-you-like"

# Ukrainian doesn't work, even with Ukrainian locale:
[36] pry(main)> "Пароль успішно встановлено. Ви успішно увійшли.".parameterize
=> ""     

I think there's no magic bullet here, but if you wanted to experiment with the slug format, you might try reimplementing parameterize without the removal of non-ASCII characters, keeping the whitespace treatment, and CGI.escape the result to get a string like

"%E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language-790" which would display correctly as long as your browser had script support.

That might be out of scope for a short-term change, or have side effects I'm not aware of, but it may work if you try changing the code in a local instance.
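
The experiment described above could be sketched roughly as follows. This is a guess at one possible shape, not Forem's code: unicode_slug is a hypothetical helper, the rule "keep letters, marks, and digits from any script" is an assumption about what "without the removal of non-ASCII characters" should mean, and the suffix is passed in by hand. Only CGI.escape and CGI.unescape come from Ruby's standard library.

```ruby
require "cgi"

# Hypothetical helper: builds a slug that keeps non-ASCII letters instead of stripping them.
def unicode_slug(title, suffix)
  base = title
    .unicode_normalize(:nfc)             # use one canonical form so equal titles give equal slugs
    .downcase
    .gsub(/[^\p{L}\p{M}\p{N}]+/, "-")    # whitespace and punctuation become "-"; letters of any script stay
    .gsub(/\A-|-\z/, "")                 # trim a leading or trailing separator
  "#{base}-#{suffix}"
end

slug    = unicode_slug("తెలుగు language", "79o")
escaped = CGI.escape(slug)               # percent-encode the UTF-8 bytes for use in a URL path

puts slug      # prints తెలుగు-language-79o
puts escaped   # prints %E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language-79o
puts CGI.unescape(escaped) == slug       # prints true
```

The percent-encoded form is what travels over the wire; whether the address bar shows the readable Telugu text depends on the browser. Routing, uniqueness checks, and anything else that assumes ASCII slugs would still need to be reviewed before trying this on a real instance.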

9comindia Author

hi @djuber

from: github.com/sporkmonger/addressable

uri = Addressable::URI.parse("http://www.詹姆斯.com/")
uri.normalize
#=> #<Addressable::URI:0xc9a4c8 URI:http://www.xn--8ws00zhy3a.com/>

Would the Addressable gem be relevant here, instead of calling the parameterize method in title_to_slug?

Daniel Uber • Edited on

I think Addressable is doing something very different than what you want?

Is xn--8ws00zhy3a here related at all to the input 詹姆斯, or more legible? It looks like this is a way to encode for transfer (as ASCII) rather than for human readability. I might be misunderstanding it, particularly if the browser automatically translates it for display, but while it does preserve the information from the title, it doesn't appear to do so in any human-meaningful way.

If you wanted to try that - for title strings you probably just want the to_ascii method and not the URI.parse (you're not building a URI without making even larger changes)

Addressable::IDNA.to_ascii('తెలుగు language')
=> "xn-- language-zhy2gyguhb5e" 

You could try it out, but I don't think this fits what you're asking about: a URL slug that contains the post title. Judging by that output, you would still need to strip the spaces, probably before applying the Punycode translation.

9comindia Author • Edited on

Is xn--8ws00zhy3a here related at all to the input 詹姆斯, or more legible?

Not just legible: if I click on the link xn--8ws00zhy3a.com, the URL of the page that opens shows the Chinese characters.
[Screenshot: Chinese URL]

It may not be exactly what I initially asked about, but I'm looking for something that could improve SEO for non-English Forem instances.

Addressable::IDNA.to_ascii('తెలుగు language')
=> "xn-- language-zhy2gyguhb5e"

This does not resolve to the Telugu characters in the URL.
[Screenshot: expected Telugu URL]

Daniel Uber

It looks like the "sterile" library will do transliteration:

Transliterate article titles to slugs #15051

What type of PR is this? (check all applicable)

  • [ ] Refactor
  • [x] Feature
  • [ ] Bug Fix
  • [ ] Optimization
  • [ ] Documentation Update

Description

Article titles written in languages with non-Roman alphabets (Cyrillic, Greek, Hebrew, etc) have the entire title stripped from the slug.

Related Tickets & Documents

QA Instructions, Screenshots, Recordings

  • Publish an article with a title that is completely in a non-Roman alphabet
  • The URL for the article should contain those characters transliterated to ASCII

UI accessibility concerns?

Ideally, nothing should change on this front.

Added/updated tests?

  • [x] Yes
  • [ ] No, and this is why: please replace this line with details on why tests have not been included
  • [ ] I need help with writing tests

[Forem core team only] How will this change be communicated?

Will this PR introduce a change that impacts Forem members or creators, the development process, or any of our internal teams? If so, please note how you will share this change with the people who need to know about it.

  • [ ] I've updated the Developer Docs or Storybook (for Crayons components)
  • [ ] This PR changes the Forem platform and our documentation needs to be updated. I have filled out the Changes Requested issue template so Community Success can help update the Admin Docs appropriately.
  • [ ] I've updated the README or added inline documentation
  • [ ] I've added an entry to CHANGELOG.md
  • [x] I will share this change in a Changelog or in a forem.dev post
  • [ ] I will share this change internally with the appropriate teams
  • [ ] I'm not sure how best to communicate this change and need help
  • [ ] This change does not need to be communicated, and this is why not: please replace this line with details on why this change doesn't need to be shared

View on GitHub: https://github.com/forem/forem/pull/15051


This PR might fix things for cases like yours.

Christina Gorton

Hey @9comindia. Thanks for bringing this up.
i18n work has been on hold in the past for various reasons, but we are slowly starting to work on it again.
I agree this isn't ideal, and we are working on a plan to make it better.
For example, this PR was just added to Forem to help with non-ASCII strings in tags.

9comindia Author • Edited on

from: github.com/sporkmonger/addressable

uri = Addressable::URI.parse("http://www.詹姆斯.com/")
uri.normalize
#=> #<Addressable::URI:0xc9a4c8 URI:http://www.xn--8ws00zhy3a.com/>

We already seem to be using the Addressable gem in Forem, but an implementation like the one above was either missed or removed.

After posting this, I went through earlier articles and learned that Forem previously had multi-language functionality, which was later deprecated.

I can understand the logic behind that decision, but enabling this for the URL now would help non-English Forems get better SEO.

Please give it some thought. Thanks.

Akhil Naidu

This comment was out of context, but I'm eager to share this.

My mother tongue was Telugu <3