The Weighted Query algorithm's initial query does not ultimately lead to opinions on how posts are ordered within that set. The ordering is effectively "default ordering".
But there is space for us to order this final data, and this week's work focused on that.
Because we are testing something pretty new, we decided to try a variety of ordering strategies rather than home in on one too early. This week we wound up testing eight different versions of "final order".
We tested the "original" variant with 51% of results, and the other 7 with 7% each (7 × 7 = 49). Spending 49% of the yield here was based on our confidence that this was very much low-hanging fruit, so we would not be hurting existing members by exposing them to untested versions. We had good confidence that most of these would be an improvement.
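To make the 51/7 split concrete, here is a minimal sketch of how such an allocation could be done. It is not the code DEV runs: the hashing approach, the per-user assignment, the `assign_variant` function, the experiment name, and the `variant_1` … `variant_7` placeholder names are all assumptions for illustration.

```python
import hashlib

# Hypothetical allocation: 51% to the incumbent, 7% to each of 7 challengers.
# The challenger names are placeholders, not the real variant names.
VARIANTS = [("original", 51)] + [(f"variant_{i}", 7) for i in range(1, 8)]


def assign_variant(user_id, experiment="final_order"):
    """Deterministically map a user to a variant using a 0-99 bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket for this user and experiment

    upper = 0
    for name, percent in VARIANTS:
        upper += percent
        if bucket < upper:
            return name
    raise ValueError("variant percentages must sum to 100")


if __name__ == "__main__":
    for user_id in (1, 2, 3):
        print(user_id, assign_variant(user_id))
```

Hashing the user id, rather than rolling a die on every request, keeps each member in the same variant for the length of the test, which is what lets return-visit scenarios be attributed to a single ordering.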
Since we had a lot of variables at play, we are going to display the data by ranking for each experiment. We first note where the incumbent finished, and then which variants finished third, second, and first in each experiment.
Here are the eight ordering possibilities:
| Scenario | Incumbent ranking | 3rd place | 2nd place | Likely winner |
|---|---|---|---|---|
| Creates a comment. | 7th | incumbent | random_weighted_to_score | last_comment_at |
| Creates comments on at least 4 different days within a week. | 3rd | random_weighted_to_score | random_weighted_to_comment_score | random_weighted_to_last_comment_at |
| Views pages on at least 4 different days within a week. | 6th | score | random | random_weighted_to_score |
| Views pages on at least 4 different hours within a day. | 7th | random | last_comment_at | random_weighted_to_score |
| Views pages on at least 9 different days within 2 weeks. | 6th | score | random_weighted_to_comment_score | random_weighted_to_score |
| Views pages on at least 12 different hours within five days. | 3rd | incumbent | score | random_weighted_to_score |
The clear winner here is random_weighted_to_score. This ordering algorithm shuffled the results, so that the same post does not stick to the top all day, but weighted the shuffle approximately by each post's "score".
Score is a calculated field which mostly correlates to total positive reactions minus any negative moderator actions.
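A score-weighted shuffle of this kind can be sketched in a few lines. This is only an illustration of the idea, not the Forem implementation: the function below, the dict-shaped posts, and the `floor` parameter (which keeps zero or negative scores from getting a zero weight) are assumptions. It uses the Efraimidis-Spirakis trick of giving each item a random key `u ** (1 / weight)` and sorting by that key.

```python
import random


def random_weighted_to_score(posts, rng=None, floor=1.0):
    """Return posts in a random order biased toward higher scores.

    Assumes each post is a dict with a numeric "score" key.
    """
    rng = rng or random.Random()

    def sort_key(post):
        # Clamp negative scores to 0 and add a floor so every weight is positive.
        weight = max(post["score"], 0) + floor
        # Larger weights push the key toward 1, so the post tends to sort earlier.
        return rng.random() ** (1.0 / weight)

    return sorted(posts, key=sort_key, reverse=True)


if __name__ == "__main__":
    feed = [
        {"id": "a", "score": 100},
        {"id": "b", "score": 10},
        {"id": "c", "score": 1},
    ]
    print([post["id"] for post in random_weighted_to_score(feed)])
```

With this weighting the highest-scored post lands first most of the time but not every time, which matches the goal of keeping strong posts near the top without pinning one there all day.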
This gives us evidence that the best signal-to-noise scenario here is one where we let votes of popularity indicate which posts should float to the top most often. Importantly, our weighted query algorithm does not directly use score, so this value is used only to order the posts that the initial query returned.
This was a predictable outcome: Basically top posts should show up near the top of the feed on average if we want people to come back to a useful destination. But if we blindly keep the top post there for too long, it will not be as useful.
It is worth noting that random_weighted_to_last_comment_at had a bit of downtime due to not appropriately handling edge cases, and it still finished well in most tests. Even if we believe that ...last_comment_at was penalized, random_weighted_to_score was still the clear winner, but this should inform some future tests which might build on this initial "low-hanging fruit" outcome.
I am very excited to declare a winner here, and have the results impact all users. I think this was a big step in the process. The next proposed test is an adjustment in how we assign tag preference weights.
I believe these tests are picking up steam, and we will be able to eye some refactors which make them more flexible for all Forems. DEV is the only Forem big enough to confidently glean information on these A/B tests, but these results will begin clearly impacting all Forems for the better.