URL is omitting the non-English title:
If the title of a post is not in English, the URL generated for that post omits the non-English part of the title.
I don't think it's a technical limitation but an intended design.
Could someone please clarify why Forem is not inclusive of all languages (just kidding)? Thanks.
Top comments (10)
I can't answer questions about intent, but I can share the mechanism that's doing this.
The call to String#parameterize (which calls another parameterize method in Active Support's Inflector module) in Article#title_to_slug is removing any characters that can't be transliterated to ASCII (Latin-based scripts have accented characters replaced with the base character, possibly Cyrillic and Greek scripts do the same, many Asian languages have their words completely removed during this process).
It's even more dramatic if the article title were only తెలుగు: you'd end up with -79o instead of language-79o as the post's URL slug.
thank you @djuber.
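For anyone who wants to see the stripping effect without a Rails console, here is a minimal stdlib-only sketch. The helper name ascii_only_slug is hypothetical and this is not Active Support's implementation; it only approximates the observable behaviour described above (accents folded to base letters, every other non-ASCII character dropped). The "79o" suffix is simply copied from the example in this thread.

```ruby
# Hypothetical approximation of an ASCII-only slug step.
# NOT ActiveSupport's parameterize; it only mimics the effect described above.
def ascii_only_slug(title)
  ascii = title.unicode_normalize(:nfkd) # split "é" into "e" + combining accent
               .gsub(/\p{Mn}/, "")       # drop non-spacing combining marks
               .gsub(/[^[:ascii:]]/, "?") # anything still non-ASCII becomes "?"
  ascii.downcase
       .gsub(/[^a-z0-9_-]+/, "-")        # runs of other characters become one "-"
       .gsub(/\A-+|-+\z/, "")            # trim leading/trailing separators
end

puts ascii_only_slug("Café au lait")        # cafe-au-lait
puts ascii_only_slug("తెలుగు language")      # language
puts "#{ascii_only_slug('తెలుగు')}-79o"      # -79o
```

The last line shows why a title written only in Telugu ends up with nothing but the suffix.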
Found at apidock.com/rails/v6.1.3.1/ActiveS...
Does that mean we could pass the optional locale parameter when calling the "parameterize" method and get a non-ASCII URL slug for the post title?
Maybe? It doesn't look to be automatic in the way the documentation suggests, and the choice of locales is limited by the backend (my machine is using an I18n::Backends::Simple backend and loading locale data from translations in gems):
I don't see any Indian languages (on my machine, via I18n.available_locales). However, this list comes from the translations of installed gems, and I'm not certain whether and how the note about using the locale applies.
I think there's no magic bullet here, but if you were to experiment with the slug format, you might try reimplementing parameterize without the removal of non-ASCII characters, keeping the whitespace treatment, and CGI.escape the result to get a string like "%E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language-790", which would display correctly as long as your browser had script support.
That might be outside of the short term extensions or have side effects I'm not aware of, but that may work if you tried changing the code in a local instance.
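A rough sketch of that experiment, using only the Ruby standard library. The method name unicode_slug is hypothetical and none of this is Forem's code; it assumes "keeping the whitespace treatment" means collapsing runs of non-letter, non-digit characters into a single hyphen, and it leaves out the random suffix that Forem appends.

```ruby
require "cgi"

# Hypothetical slug helper: keeps letters, combining marks and digits from any
# script, and collapses everything else into a single "-".
def unicode_slug(title)
  title.unicode_normalize(:nfc)
       .downcase
       .gsub(/[^\p{L}\p{M}\p{N}]+/, "-") # whitespace/punctuation runs become "-"
       .gsub(/\A-+|-+\z/, "")            # trim leading/trailing separators
end

slug = unicode_slug("తెలుగు language")
puts slug             # తెలుగు-language
puts CGI.escape(slug) # %E0%B0%A4%E0%B1%86%E0%B0%B2%E0%B1%81%E0%B0%97%E0%B1%81-language
```

Keeping \p{M} matters for scripts like Telugu, where vowel signs are combining marks; dropping them would mangle the word even though the base consonants survive.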
hi @djuber
from: github.com/sporkmonger/addressable
Would the Addressable gem be relevant here, as a replacement for the parameterize call in title_to_slug?
I think Addressable is doing something very different than what you want?
Is xn--8ws00zhy3a here related at all to the input 詹姆斯, or more legible? It looks like this is a way to encode for transfer (as ASCII), rather than to encode for human visibility. I might be misunderstanding it, particularly if there's some automatic display translation in the browser, and while it does preserve the information from the title it doesn't look like it does that in any human-meaningful way.
If you wanted to try that: for title strings you probably just want the to_ascii method and not the URI.parse (you're not building a URI without making even larger changes).
You could try it out, but I don't think this fits what you're asking about, a URL slug that contains the post title. You would still need to handle stripping the spaces, looking at that output, probably before applying punycode translation.
It's not just legible: if I click on the link xn--8ws00zhy3a.com, the URL that opens is shown with the Chinese characters.
This may not be exactly what I initially described, but I'm looking for something that could improve SEO for non-English Forem instances.
This is not resolving to the Telugu characters in the URL.
It looks like the "sterile" library will do transliteration.
Transliterate article titles to slugs #15051
Description
Article titles written in languages with non-Roman alphabets (Cyrillic, Greek, Hebrew, etc) have the entire title stripped from the slug.
UI accessibility concerns?
Ideally, nothing should change on this front.
This PR might fix things for cases like yours.
Hey @9comindia . Thanks for bringing this up.
i18n work has been on hold in the past for various reasons, but we are slowly starting to work on it again.
I agree this isn't ideal and we are working on a plan to make it better.
For example, this PR was just added to Forem to help with non-ASCII strings in tags.
from: github.com/sporkmonger/addressable
We already seem to be using the Addressable gem in Forem, but the kind of implementation shown above was either missed or removed.
After posting this, I went through the previous articles and learned that Forem had multi-language functionality earlier, but it was deprecated.
I can understand the logic behind that decision, but enabling this part of the URL now would help non-English Forems achieve better SEO.
I request you to please give it a thought. Thanks.
This comment was out of context, but I'm eager to share this.