Feed test experiment 4 results: "Final ordering" helps close the loop on feed quality.

#feedalgorithm #product #growth

Background

The Weighted Query algorithm's initial query selects a set of posts, but it has no opinion on how posts are ordered within that set. The ordering is effectively "default ordering".

But there is space for us to order this final data, and this week's work focused on that.

Because we are testing something pretty new, we decided to try a variety of ordering strategies rather than home in too narrowly. This week we wound up testing eight different versions of "final order".

We tested the "original" variant with 51% of results, and the other 7 with 7% each (7 x 7% = 49%). Committing 49% of results to new variants was based on our confidence that this was very much low-hanging fruit, so we would not be hurting existing members by exposing them to untested versions. We had good confidence that most of these would be an improvement.
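The split described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the experiment: the `assign_variant` function, the `VARIANT_WEIGHTS` table, and the idea of seeding by user id so each member stays in one variant are all assumptions made for the example.

```python
import random

# Weights mirror the split described above: 51% incumbent, 7% for each challenger.
# The dictionary itself is a hypothetical structure for this sketch.
VARIANT_WEIGHTS = {
    "original": 51,
    "score": 7,
    "comment_score": 7,
    "last_comment_at": 7,
    "random": 7,
    "random_weighted_to_score": 7,
    "random_weighted_to_comment_score": 7,
    "random_weighted_to_last_comment_at": 7,
}


def assign_variant(user_id: int) -> str:
    """Pick a variant for a user; the same user_id always gets the same variant."""
    # Assumption: seeding by user id keeps the assignment sticky across requests.
    rng = random.Random(user_id)
    names = list(VARIANT_WEIGHTS)
    weights = list(VARIANT_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]
```

Calling `assign_variant(42)` repeatedly returns the same variant name each time, and across many user ids roughly half of them land on "original".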

Results

Since we had a lot of variables at play, we are going to display the data by ranking for each experiment. We first note where the incumbent finished, and then what the 3rd, 2nd, and 1st place finishers were for each experiment.

Here are the eight ordering possibilities: the incumbent ("original", default ordering) plus these seven:

score
comment_score
last_comment_at
random
random_weighted_to_score
random_weighted_to_comment_score
random_weighted_to_last_comment_at

Scenario	Incumbent ranking	3rd Place	2nd Place	Likely Winner
Creates a comment.	7th	incumbent	random_weighted_to_score	last_comment_at
Creates comments on at least 4 different days within a week.	3rd	random_weighted_to_score	random_weighted_to_comment_score	random_weighted_to_last_comment_at
Views pages on at least 4 different days within a week.	6th	score	random	random_weighted_to_score
Views pages on at least 4 different hours within a day.	7th	random	last_comment_at	random_weighted_to_score
Views pages on at least 9 different days within 2 weeks.	6th	score	random_weighted_to_comment_score	random_weighted_to_score
Views pages on at least 12 different hours within five days.	3rd	incumbent	score	random_weighted_to_score

Conjecture

The clear winner here is random_weighted_to_score. This is an ordering algorithm which shuffles the results (so that the same post does not stick to the top all day) but weights the shuffle approximately by each post's "score".

Score is a calculated field which mostly correlates to total positive reactions minus any negative moderator actions.
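To make the idea of a score-weighted shuffle concrete, here is a minimal Python sketch. It is not the Forem implementation: it uses one well-known technique for weighted random ordering (each item draws a random key `u ** (1 / weight)` and items are sorted by key, descending), and the function name, the dictionary-shaped posts, and the `+ 1` floor on weights are assumptions made for the example.

```python
import random


def random_weighted_order(posts, weight_key="score", rng=None):
    """Return posts in a random order where higher-weight posts tend to come first."""
    rng = rng or random.Random()

    def sort_key(post):
        # Assumption: clamp negative scores to 0 and add 1 so every post
        # keeps some chance of appearing near the top.
        weight = max(post[weight_key], 0) + 1
        # Larger weights push the random key closer to 1, so the post sorts earlier.
        return rng.random() ** (1.0 / weight)

    return sorted(posts, key=sort_key, reverse=True)


# Hypothetical posts for illustration only.
posts = [
    {"id": 1, "score": 50},
    {"id": 2, "score": 5},
    {"id": 3, "score": 0},
]
ordered = random_weighted_order(posts)
```

With this scheme the highest-scoring post lands on top most of the time but not every time, which matches the behavior described above. Passing a different `weight_key` would give the analogous comment-score variant. Weighting by last_comment_at would first need the timestamp converted into a numeric weight, which this sketch does not attempt.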

This gives us evidence that the best signal-to-noise outcome here comes from letting popularity votes indicate which posts should float to the top most often. Importantly, our weighted query algorithm does not directly use score, so this value is used only for ordering within the set the query initially returned.

This was a predictable outcome: top posts should show up near the top of the feed on average if we want people to come back to a useful destination. But if we blindly keep the same top post there for too long, the feed will not be as useful.

It is worth noting that random_weighted_to_last_comment_at had a bit of downtime because it did not handle edge cases appropriately, and it still finished well in most tests. Even if we believe that ...last_comment_at was penalized, random_weighted_to_score was still the clear winner, but this should inform some future tests which might build on this initial "low hanging fruit" outcome.

Next steps

I am very excited to declare a winner here, and have the results impact all users. I think this was a big step in the process. The next proposed test is an adjustment in how we assign tag preference weights.

I believe these tests are picking up steam, and we will be able to eye some refactors which make them more flexible for all Forems. DEV is the only Forem big enough to confidently glean information from these A/B tests, but these results will begin clearly impacting all Forems for the better.

Top comments (1)

Lee • Jan 25 '22

Great news Ben! Thanks for the constant updates on this