Maybe? It doesn't look to be automatic in the way the documentation suggests, and the choice of locales is limited by the backend (my machine is using an I18n::Backend::Simple backend and loading locale data from translations in gems):
    I18n.locale
    => :en
    # try the easiest thing that could possibly work, ask for Telugu as a locale:
    I18n.locale = :te
    I18n::InvalidLocale: :te is not a valid locale
    from /home/djuber/src/forem/vendor/bundle/ruby/3.0.0/gems/i18n-1.8.10/lib/i18n.rb:343:in `enforce_available_locales!'
I don't see any Indian languages on my machine (via I18n.available_locales). However, that list comes from the translations bundled with installed gems, and I'm not certain whether or how the note about using the locale applies:
    [33] pry(main)> I18n.locale = :ua
    => :ua
    # English still works with the locale set:
    [34] pry(main)> "Any thing you like.".parameterize
    => "any-thing-you-like"
    # Ukrainian doesn't work, even with Ukrainian locale:
    [36] pry(main)> "Пароль успішно встановлено. Ви успішно увійшли.".parameterize
    => ""
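The empty string is what you'd expect from any ASCII-only filter. Here is a rough stand-in for that filtering step, assuming the slug keeps only a-z, 0-9, "-" and "_". The method name ascii_only_slug is made up for illustration and this is not ActiveSupport's source:

```ruby
# Simplified approximation of an ASCII-only slug filter (hypothetical helper,
# not the real parameterize implementation).
def ascii_only_slug(text)
  text
    .gsub(/[^a-z0-9\-_]+/i, "-") # every run of other characters becomes one hyphen
    .gsub(/\A-+|-+\z/, "")       # drop leading and trailing hyphens
    .downcase
end

puts ascii_only_slug("Any thing you like.")                  # any-thing-you-like
puts ascii_only_slug("Пароль успішно встановлено.").inspect  # ""
```

A title with no ASCII letters or digits collapses into a single hyphen, which is then stripped, so nothing is left for the slug.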
I don't think there's a magic bullet here, but if you want to experiment with the slug format, you might try reimplementing parameterize without the removal of non-ASCII characters, keeping the whitespace handling, and then CGI.escape the result to get a string like "%E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language-790", which would display correctly as long as your browser supports the script.
That might be outside the scope of a short-term change, or have side effects I'm not aware of, but it may work if you try changing the code in a local instance.
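A minimal sketch of that experiment, using only the standard library. The method name unicode_slug and the numeric suffix argument are made up for illustration; this is not Forem's title_to_slug, just one way the idea could look:

```ruby
require "cgi"

# Hypothetical helper: keep letters, combining marks and digits from any
# script, turn everything else into hyphens, then percent-encode the result.
def unicode_slug(title, suffix = nil)
  slug = title.downcase
              .gsub(/[^\p{L}\p{M}\p{N}]+/, "-") # whitespace and punctuation become hyphens
              .gsub(/\A-+|-+\z/, "")            # trim hyphens at both ends
  slug = "#{slug}-#{suffix}" if suffix
  CGI.escape(slug)
end

puts unicode_slug("తెలుగు language", 790)
# %E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language-790
```

Keeping \p{M} matters for scripts like Telugu, where vowel signs are combining marks rather than letters. Dropping them would mangle the title.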
Hi @djuber,
Would the Addressable gem (github.com/sporkmonger/addressable) be relevant here, instead of calling the parameterize method in title_to_slug?
I think Addressable is doing something very different than what you want?
Is xn--8ws00zhy3a here related at all to the input 詹姆斯, or more legible? It looks like this is a way to encode for transfer (as ASCII), rather than to encode for human visibility. I might be misunderstanding it, particularly if there's some automatic display translation in the browser, and while it does preserve the information from the title, it doesn't look like it does that in any humanly meaningful way.
If you wanted to try that, for title strings you probably just want the to_ascii method and not URI.parse (you're not building a URI without making even larger changes).
You could try it out, but I don't think this fits what you're asking about: a URL slug that contains the post title. You would still need to handle stripping the spaces, probably before applying the punycode translation.
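To show why the output is opaque, here is a hand-rolled sketch of the Punycode encoding (RFC 3492) that sits behind the xn-- prefix. In practice you would call a library such as Addressable rather than this; the method names below are made up for illustration, and the sketch handles a single label with no spaces:

```ruby
BASE = 36
TMIN = 1
TMAX = 26
SKEW = 38
DAMP = 700
INITIAL_BIAS = 72
INITIAL_N = 128

# Bias adaptation from RFC 3492, section 6.1.
def adapt(delta, numpoints, first)
  delta = first ? delta / DAMP : delta / 2
  delta += delta / numpoints
  k = 0
  while delta > ((BASE - TMIN) * TMAX) / 2
    delta /= BASE - TMIN
    k += BASE
  end
  k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
end

# Digit values 0..25 map to a..z, 26..35 map to 0..9.
def punycode_digit(d)
  d < 26 ? (97 + d).chr : (22 + d).chr
end

def punycode_encode(input)
  cps = input.codepoints
  n = INITIAL_N
  delta = 0
  bias = INITIAL_BIAS
  output = +""
  cps.each { |c| output << c.chr if c < 128 } # ASCII characters are copied first
  h = b = output.length
  output << "-" if b > 0
  while h < cps.length
    m = cps.select { |c| c >= n }.min # next code point to insert
    delta += (m - n) * (h + 1)
    n = m
    cps.each do |c|
      delta += 1 if c < n
      next unless c == n
      q = delta
      k = BASE
      loop do # emit delta as a variable-length integer
        t = [[k - bias, TMIN].max, TMAX].min
        break if q < t
        output << punycode_digit(t + (q - t) % (BASE - t))
        q = (q - t) / (BASE - t)
        k += BASE
      end
      output << punycode_digit(q)
      bias = adapt(delta, h + 1, h == b)
      delta = 0
      h += 1
    end
    delta += 1
    n += 1
  end
  output
end

puts "xn--" + punycode_encode("詹姆斯") # xn--8ws00zhy3a
```

The digits encode code point deltas and insertion positions, so none of the original characters survive in readable form. That fits the "encode for transfer" reading above.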
Not just legible: if I click on the link xn--8ws00zhy3a.com, the URL that opens is shown with the Chinese characters.
It may not be exactly what I initially mentioned, but I'm looking for something that could improve SEO for non-English Forem instances.
This is not resolving to the Telugu characters in the URL.
It looks like the "sterile" library will do transliteration:
Transliterate article titles to slugs #15051 (https://github.com/forem/forem/pull/15051)
Description
Article titles written in languages with non-Roman alphabets (Cyrillic, Greek, Hebrew, etc.) have the entire title stripped from the slug.
QA Instructions, Screenshots, Recordings
The URL for the article should contain those characters transliterated to ASCII.
UI accessibility concerns?
Ideally, nothing should change on this front.
Added/updated tests?
[x] Yes
This PR might fix things for cases like yours.
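To give a feel for what a transliterated slug looks like, here is a toy sketch. The mapping table is hypothetical and deliberately tiny (just enough letters for the sample title), and the method name is made up. It is neither the PR's code nor the sterile gem's API, and a real library covers far more characters:

```ruby
# Hypothetical, incomplete Cyrillic-to-Latin table for illustration only.
CYRILLIC_TO_LATIN = {
  "п" => "p", "а" => "a", "р" => "r", "о" => "o", "л" => "l", "ь" => "",
  "у" => "u", "с" => "s", "і" => "i", "ш" => "sh", "н" => "n"
}.freeze

def transliterated_slug(title)
  # Swap each known character for its Latin counterpart, leave the rest alone.
  ascii = title.downcase.each_char.map { |ch| CYRILLIC_TO_LATIN.fetch(ch, ch) }.join
  # Then apply the usual ASCII-only slug cleanup.
  ascii.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts transliterated_slug("Пароль успішно") # parol-uspishno
```

Unlike percent-encoding or punycode, the result stays readable in any browser, at the cost of losing the original script.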