We are seeking to ensure the Forem feed produces relevant results in a healthy and transparent way. DEV, the biggest Forem, is the only space where we are actively testing. Getting this right matters to the DEV Community because the feed is what helps separate signal from noise, and it has traditionally not been the strongest part of the DEV experience.
These are the results of one week of testing. We always budget two weeks for a test, but as soon as the results are clear, we take that as a win against time and move on to the next experiment. There will come a time when every test needs at least two weeks to produce results, but in this "beginner gains" phase, we can wrap things up more quickly.
| Scenario | Incumbent Conversion | Challenger Conversion | Likely Winner | Probability of Winner |
|---|---|---|---|---|
| Creates a comment. | 4.6% | 4.96% | Challenger | 88.07% |
| Creates comments on at least 4 different days within a week. | 0.32% | 0.23% | Incumbent | 85.23% |
| Views pages on at least 4 different days within a week. | 21.84% | 23.57% | Challenger | 99.73% |
| Views pages on at least 4 different hours within a day. | 11.6% | 12.49% | Challenger | 96.76% |
| Views pages on at least 9 different days within 2 weeks. | 9.18% | 9.35% | Challenger | 66.18% |
| Views pages on at least 12 different hours within five days. | 2.04% | 2.09% | Challenger | 61.12% |
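The post does not say how the "Probability of Winner" column is computed. One common approach for a two-arm conversion test, sketched here purely as an assumption, is to give each arm's conversion rate a Beta posterior and estimate how often a draw from the challenger's posterior beats a draw from the incumbent's. The function name `probability_challenger_wins` and the participant counts below are hypothetical; the table reports only conversion rates, not sample sizes.

```python
import random


def probability_challenger_wins(
    inc_conversions: int,
    inc_participants: int,
    chal_conversions: int,
    chal_participants: int,
    draws: int = 50_000,
    seed: int = 0,
) -> float:
    """Hypothetical Monte Carlo estimate of P(challenger rate > incumbent rate).

    Each arm's rate gets a Beta(1 + conversions, 1 + non-conversions) posterior,
    i.e. a uniform Beta(1, 1) prior updated with the observed counts.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        inc_rate = rng.betavariate(1 + inc_conversions, 1 + inc_participants - inc_conversions)
        chal_rate = rng.betavariate(1 + chal_conversions, 1 + chal_participants - chal_conversions)
        if chal_rate > inc_rate:
            wins += 1
    return wins / draws


# Hypothetical counts for illustration only; the post does not report sample sizes.
p = probability_challenger_wins(460, 10_000, 496, 10_000)
likely_winner = "Challenger" if p >= 0.5 else "Incumbent"
print(likely_winner, f"{max(p, 1 - p):.2%}")
```

Reporting `max(p, 1 - p)` alongside the likely winner appears to match how the table pairs each winner with its probability: when the challenger's probability falls below 50%, the incumbent becomes the likely winner.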
We measure "two weeks back" on a rolling basis, so participants enter the experiment with varying amounts of earlier activity (back-days) already inside the window. This is why, after only one week, we already have winners on the two-week goals. For all intents and purposes, this is fine, because we are still measuring which tweak leads to more conversions.
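To make the rolling window concrete, here is a minimal sketch, assuming a goal counts distinct calendar days (or clock hours) with activity inside a trailing window that ends at the evaluation time. The helper `meets_distinct_goal` and the example timestamps are hypothetical, not Forem's implementation.

```python
from datetime import datetime, timedelta


def meets_distinct_goal(
    timestamps: list[datetime],
    now: datetime,
    window: timedelta,
    min_distinct: int,
    bucket: str = "day",
) -> bool:
    """Hypothetical check: is there activity in at least `min_distinct`
    distinct calendar days (bucket="day") or clock hours (bucket="hour")
    inside the trailing window (now - window, now]?"""
    if bucket not in ("day", "hour"):
        raise ValueError(f"unsupported bucket: {bucket!r}")
    start = now - window
    seen = set()
    for ts in timestamps:
        if start < ts <= now:  # rolling window: measured back from `now`
            seen.add(ts.date() if bucket == "day" else (ts.date(), ts.hour))
    return len(seen) >= min_distinct


# Hypothetical participant: enrolled 7 days ago, with page views on 5 days
# before enrollment and 4 days after it.
now = datetime(2021, 1, 15, 12, 0)
enrolled_at = now - timedelta(days=7)
views = [now - timedelta(days=d) for d in (0, 1, 2, 3, 8, 9, 10, 11, 12)]

two_weeks = timedelta(days=14)
print(meets_distinct_goal(views, now, two_weeks, 9))  # True: window reaches back before enrollment
after_enrollment = [ts for ts in views if ts >= enrolled_at]
print(meets_distinct_goal(after_enrollment, now, two_weeks, 9))  # False: only 4 days
```

Only four of the nine active days fall after enrollment, yet the participant converts on the "9 different days within 2 weeks" goal because the window reaches back before enrollment. Hour-based goals work the same way with `bucket="hour"`, assuming "within a day" means a trailing 24-hour window.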
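With that said, the challenger is the likely winner on five of the six goals, though its leads on the two-week and five-day page-view goals (66.18% and 61.12%) are not yet conclusive. The incumbent wins only on repeat comment creation (commenting on at least 4 different days within a week). That goal is very important, and we will aim to address this weakness in the challenger with the upcoming adjustment.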
All else being equal, we hope the challenger wins because it represents progress. In this case, a challenger win would let us move on from an old algorithm to something with much more long-term promise.
The next test has been proposed and is pending review. It refactors the feed toward greater use of our newer query approach, and it tests the hypothesis that giving comment count slightly more weight than this challenger does will encourage healthier congregation on the platform.
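As a toy illustration of what "a bit more comment count weight" could mean, the sketch below assumes a simple linear score over reactions and comments. Everything here is hypothetical: the `Article` fields, the weights, and the `score` and `rank` functions are invented for illustration and do not describe the actual feed query.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    reactions: int
    comments: int


def score(article: Article, comment_weight: float, reaction_weight: float = 1.0) -> float:
    """Hypothetical linear relevance score; not Forem's actual feed query."""
    return reaction_weight * article.reactions + comment_weight * article.comments


def rank(articles: list[Article], comment_weight: float) -> list[str]:
    # Highest score first; sorted() is stable, so ties keep input order.
    ordered = sorted(articles, key=lambda a: score(a, comment_weight), reverse=True)
    return [a.title for a in ordered]


articles = [
    Article("Popular but quiet", reactions=50, comments=2),
    Article("Active discussion", reactions=30, comments=12),
]
print(rank(articles, comment_weight=1.0))  # lower comment weight (hypothetical)
print(rank(articles, comment_weight=3.0))  # higher comment weight (hypothetical)
```

With the lower weight, the post with more reactions ranks first; raising the comment weight moves the post with an active discussion above it, which is the kind of shift the next test is meant to measure.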
No single test claims that one thing is the key to a better feed, and this one isn't simply claiming that more comments == good. But we do need to make ongoing corrections to separate signal from noise within communities.
As we move everyone on to a new approach, it will be important to iterate quickly.