Forem Creators and Builders

Discussion on: Changelog: Better Support for Your Language in Tags

Ella Ang (she/her/elle)

Paging @9comindia and @yheuhtozr because you've both made some really great suggestions regarding i18n in the past.

We're making some small steps, and we hope you'll find them helpful.

yheuhtozr • Edited

Thank you for letting me know! This seems like a great step towards a multilingual site. I'll definitely take time to test it this weekend, but at a quick glance, the current implementation ([[:alnum:]]) looks reasonably solid 👍, apart from some nitpick-level adjustments needed to cope with the real world:

unicode.org/reports/tr31/#Specific...

  • add U+00B7: multiple Iberian languages require it for some words
  • add U+05F3, U+05F4: Hebrew requires them for some words
  • add U+0F0B: no Tibetan multi-syllable word can be spelled without this
  • add U+200C, U+200D: most Indian & some Arabic languages require them for some words
  • add U+30FB: Japanese requires it for some words
  • some languages need hyphens and apostrophes as a part of spelling, but I have no idea how much their speakers think it legible without those signs
On the other hand, despite OP saying "[w]e still won't permit emoji or symbols in tag names, only words and letters", I can come up with some emojis which escape it (it could be tricky to fix, so I'll leave it up to you whether to close this hole).
/[[:alnum:]]+/ === "0️⃣" # true

Sorry that this test sample is invalid.

Daniel Uber Author • Edited

Thanks for testing this out, and the additional information about missing character support - I knew the "middle dot" is used in Catalan to separate some double consonants (like in "col·laborar"), and I wouldn't be surprised if there were other Iberian languages using it in a similar manner.

Initially I was trying to test support the same way you demonstrated and misled myself.

/[[:alnum:]]/ === "Test™"
=> true
/\A[[:alnum:]]+\z/ === "Test™"
=> false

We do currently exclude "col·laborar" as a tag name (I tested and verified that support for your recommended joiner/modifier characters is lacking). That doesn't look either necessary or intentional, so I'll open a separate issue to add that support.
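
To make the difference between the two tests concrete, here is a minimal Ruby sketch. The constant names are made up for illustration and are not Forem's actual validation code; the point is only that an unanchored pattern succeeds as soon as it finds any alnum run, while the anchored one has to cover the whole string.

```ruby
# Illustrative only: these constants are hypothetical, not Forem's real code.
UNANCHORED = /[[:alnum:]]+/
ANCHORED   = /\A[[:alnum:]]+\z/

# Unanchored: true as soon as any run of alnum characters is found.
UNANCHORED.match?("Test™")            # => true
"Test™"[UNANCHORED]                   # => "Test" (only the part that matched)

# Anchored: every character from start to end must be alnum.
ANCHORED.match?("Test™")              # => false
ANCHORED.match?("col\u00B7laborar")   # => false, U+00B7 is not in [[:alnum:]]
ANCHORED.match?("collaborar")         # => true
```

Looking at the matched substring (the second line) is a quick way to notice when a test is only matching part of the input.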

I would generally prefer to err on the side of safely accepting too much rather than being more restrictive than necessary. I'm not sure whether it's important to restrict the keycaps or variant modifiers you showed in the "key 0" example.

yheuhtozr • Edited

Thank you for the extensive test. AFAIK, for European languages it will be sufficient if middle dots are allowed anywhere except word-initially.
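
As a rough sketch of that "anywhere except word-initial" rule in Ruby (the constant name is hypothetical, and this covers only U+00B7, not the other characters listed earlier):

```ruby
# Hypothetical pattern: the first character must be alnum; after that,
# alnum characters and the middle dot (U+00B7) are both accepted.
MIDDLE_DOT_TAG = /\A[[:alnum:]][[:alnum:]\u00B7]*\z/

MIDDLE_DOT_TAG.match?("col\u00B7laborar")  # => true
MIDDLE_DOT_TAG.match?("\u00B7col")         # => false, dot is word-initial
MIDDLE_DOT_TAG.match?("Test™")             # => false, ™ is still rejected
```

Note that this sketch still accepts a trailing dot or several dots in a row; it only rules out the word-initial case.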

Unfortunately I think your test cases do not cover the whole string (they return true just by picking up any alnum character inside it):
/\A[[:alnum:]]+\z/ === "This\u200dis a string col·laborar" # false
/\A[[:alnum:]]+\z/ === "0\u200d\u200c\u0f0b\u30fb\u00b7·\u05f3" # false

And sorry for a late edit in my previous comment:

some languages need hyphens and apostrophes as a part of spelling, but I have no idea how much their speakers think it legible without those signs

These include Irish and Hokkien. What do you think about it?

Daniel Uber Author

Yes - thanks for keeping me honest (my initial tests were invalid - I've since edited the reply).

I opened an issue (github.com/forem/forem/issues/14745) to clarify the expected changes. Unless we encounter technical reasons to prohibit those characters, there shouldn't be a reason not to extend the valid set of characters.

I think enforcing the "medial joins must be between characters" rule strictly is probably harder to get right than it's worth. I imagine permitting "joiner" characters anywhere in an otherwise valid string would be permissive and slightly wrong, but it also removes the blocking validation imposed by the :alnum: class.
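
A minimal sketch of what that permissive approach could look like in Ruby, assuming the code points suggested earlier in this thread are simply added to the character class. The constant name is hypothetical and this is not the change tracked in the issue:

```ruby
# Hypothetical permissive pattern: alnum plus the joiner/modifier code points
# suggested in this thread, allowed at any position in the string.
#   U+00B7 middle dot, U+05F3/U+05F4 Hebrew geresh/gershayim,
#   U+0F0B Tibetan tsheg, U+200C/U+200D zero-width (non-)joiner,
#   U+30FB katakana middle dot
PERMISSIVE_TAG = /\A[[:alnum:]\u00B7\u05F3\u05F4\u0F0B\u200C\u200D\u30FB]+\z/

PERMISSIVE_TAG.match?("col\u00B7laborar")  # => true
PERMISSIVE_TAG.match?("Test™")             # => false
PERMISSIVE_TAG.match?("a b")               # => false, spaces are not allowed
PERMISSIVE_TAG.match?("\u00B7")            # => true (the "slightly wrong" part)
```

The last line shows the trade-off: a string made only of joiners passes, which is the looseness accepted in exchange for a simpler rule.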

Regarding hyphens - I feel like there's probably some tension between spelling requirements (English also has words which should be spelled with hyphens, like brother-in-law or twenty-two) and an assumption that tags are "simple" and not composed of phrases. I acknowledge the validity of the language support requirement, but I think I'll need to discuss internally with our product team to determine how much that opens the possibility of "really-long-tags-made-of-full-sentences" in a way that's not desired. I don't have a personal opinion about that, but it might be a surprising or unwanted change to an existing and intentional restriction.