Forem Creators and Builders 🌱

Discussion on: Changelog: Better Support for Your Language in Tags

Daniel Uber • Edited

Thanks for testing this out, and the additional information about missing character support - I knew the "middle dot" is used in Catalan to separate some double consonants (like in "col·laborar"), and I wouldn't be surprised if there were other Iberian languages using it in a similar manner.

Initially I was trying to test support the same way you demonstrated and misled myself.

/[[:alnum:]]/ === "Test™"
=> true
/\A[[:alnum:]]+\z/ === "Test™"
=> false

We do currently exclude "col·laborar" as a tag name (I tested and verified that the joiner/modifier character support you recommended is lacking). That doesn't look either necessary or intentional, so I'll move it into a separate issue to add that support.

I would generally prefer to err on the side of accepting too much rather than being more restrictive than necessary. I'm not sure whether it's important to restrict the keycaps or variant modifiers you showed in the "key 0" example.

yheuhtozr • Edited

Thank you for the extensive test. At least AFAIK, for European languages it'll be sufficient if middle dots can come anywhere other than word-initial.
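As a minimal sketch of that "anywhere other than word-initial" rule, assuming the tag is otherwise checked against the [[:alnum:]] class as in the tests in this thread: the constant and method names below are hypothetical and are not taken from the Forem codebase.

```ruby
# Hypothetical pattern, not Forem's actual tag validation.
# The first character must be alphanumeric; every later character may be
# alphanumeric or a middle dot (U+00B7).
MIDDLE_DOT_TAG = /\A[[:alnum:]][[:alnum:]\u00B7]*\z/

# Returns true when the whole name satisfies the sketch rule above.
def middle_dot_tag?(name)
  MIDDLE_DOT_TAG.match?(name)
end

middle_dot_tag?("col\u00B7laborar") # accepted: the dot is word-medial
middle_dot_tag?("\u00B7laborar")    # rejected: the dot is word-initial
```

Note that this sketch also accepts a trailing middle dot, since the rule as stated only excludes the word-initial position.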

Unfortunately, I think your test cases do not cover the whole strings (they return true as soon as any alnum character is found inside). Anchored to the whole string, they fail:
/\A[[:alnum:]]+\z/ === "This\u200dis a string col·laborar" # false
/\A[[:alnum:]]+\z/ === "0\u200d\u200c\u0f0b\u30fb\u00b7·\u05f3" # false

And sorry for a late edit in my previous comment:

some languages need hyphens and apostrophes as a part of spelling, but I have no idea how much their speakers think it legible without those signs

These include Irish and Hokkien. What do you think about it?

Daniel Uber

Yes - thanks for keeping me honest (my initial tests were invalid - I've since edited the reply).

I opened an issue, github.com/forem/forem/issues/14745, to clarify the changes expected. Unless we encounter technical reasons to prohibit those characters, there shouldn't be a reason not to extend the valid set of characters.

I think enforcing the "medial joins must be between characters" rule strictly is probably harder to get right than it's worth. I imagine permitting "joiner" characters anywhere in an otherwise valid string will be permissive and slightly wrong, but it also removes the blocking validation imposed by the :alnum: class.
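To make the trade-off concrete, here is a rough sketch of both options. The constant names are hypothetical, the joiner list is simply the set of characters from the test string earlier in this thread, and neither pattern is the actual Forem validation.

```ruby
# Hypothetical patterns, not Forem's actual tag validation.
# Joiner characters taken from the test string earlier in this thread:
# U+00B7 middle dot, U+200C ZWNJ, U+200D ZWJ, U+0F0B Tibetan tsheg,
# U+30FB katakana middle dot, U+05F3 Hebrew geresh.

# Permissive: joiners may appear anywhere, even leading, trailing, or doubled.
PERMISSIVE_TAG = /\A[[:alnum:]\u00B7\u200C\u200D\u0F0B\u30FB\u05F3]+\z/

# Strict: each joiner must sit between two runs of alphanumeric characters.
STRICT_MEDIAL_TAG =
  /\A[[:alnum:]]+(?:[\u00B7\u200C\u200D\u0F0B\u30FB\u05F3][[:alnum:]]+)*\z/

word = "col\u00B7laborar"
PERMISSIVE_TAG.match?(word)    # accepted
STRICT_MEDIAL_TAG.match?(word) # accepted

dots_only = "\u00B7\u00B7"
PERMISSIVE_TAG.match?(dots_only)    # accepted: the "slightly wrong" case
STRICT_MEDIAL_TAG.match?(dots_only) # rejected
```

The strict version is still a single regex for one joiner at a time, but it would also reject sequences of two joiners in a row, which some scripts may legitimately need. That is part of why the permissive form looks like the safer default.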

Regarding hyphens - I feel like there's probably some tension between spelling requirements (English also has words which should be spelled with hyphens, like brother-in-law or twenty-two) and an assumption that tags are "simple" and not composed of phrases. I acknowledge the validity of the language support requirement. I think I'll need to discuss this internally with our product team to determine how much it opens the possibility of "really-long-tags-made-of-full-sentences" in a way that's not desired. I don't have a personal opinion about that, but it might be a surprising or unwanted change to an existing and intentional restriction.