Forem Creators and Builders

Cover image for Diving into Dev's Relevancy Feed Builder
Jeremy Friesen for Forem Core Team

Posted on • Originally published at takeonrules.com on

Diving into Dev's Relevancy Feed Builder

SQL Composition by Configuration

In we added an alternate mechanism for building the relevancy feed for DEV and other instances of the Forem code base. After a bit of configuration, that became the default means of producing the relevancy feed.

Between then and now, we’ve been experimenting with the “levers” of the relevancy feed to try to bring a more relevant feed experience.

On , we merged a significant refinement to how we build the relevancy feed. By design, this new mechanism produces the same query as the prior mechanism.

This new approach, allows us to more easily experiment with the relevancy feed; and capture, record, analyze, and annotate the results of those experiments.

And here’s the best part: because this is open source, you can see what experiments we’re running and how we configure those levers.

Data Modeling Forem’s Feed Builder

Below is a rough sketch of the ERD I would use were we putting all of this in the database.

The conceptual ERD detailed in the PlantUML Textual Version of ERD

PlantUML Textual Version of ERD
@startuml
!theme amiga
entity experiment {
* id
--
* label
* start date
* end date
}

entity variant_experiment {
* experiment_id
--
* variant_id
* probability of using this variant
}

entity variant {
* id
--
* label
* order_by_lever_id
}

entity variant_relevancy_lever {
* variant_id
* relevancy_lever_id
--
* config (e.g. relevancy score range)
}

entity order_by_lever {
* id
--
* order_by_fragment
}

entity relevancy_lever {
* id
--
* label
* user_required
* select_fragment
* joins_fragments
* group_by_fragment
}

experiment ||–|{ variant_experiment
variant ||–o{ variant_experiment
relevancy_lever ||–o{ variant_relevancy_lever
variant ||–o{ variant_relevancy_lever
order_by_lever ||–o{ variant
@enduml
Enter fullscreen mode Exit fullscreen mode

We are not putting this in our database for a few reasons:

  1. The database is more opaque.
  2. The database requires more consideration regarding synchronization across Forem installations.

Instead, we’re putting the above information in our application. This allows us to test each experiment and variant.

It also means that how we build the relevancy feed is transparent. Building from the above diagram we have 6 conceptual entities:

  • experiments
  • experiment_variants
  • variants
  • order_by_levers
  • variant_relevancy_levers
  • relevancy_levers

The experiments and experiment_variants are defined in the config/field_test.yml. The variants are defined in config/feed-variants; each named variant will have a corresponding JSON file in the config/feed-variants directory. Each variant in the config/field-variants directory defines the variant_relevancy_levers and order_by_lever used; see config/feed-variants/original.json for an example. The relevancy_levers and order_by_levers are defined in the Articles::Feeds module.

Assembling Forem’s Relevancy Feed

Below is a conceptual sequence diagram of how we build the relevancy feed:

The conceptual sequence diagram detailed in the PlantUML Textual Version of Conceptual Feed Query Builder Sequence

PlantUML Textual Version of Conceptual Feed Query Builder Sequence
@startuml
!theme amiga
participant “GET relevancy feed” as GetRequest
participant “AbExperiment.get” as AbExperiment
participant “Articles::Feeds::\nVariantQuery.build_for” as VariantQuery
participant “Articles::Feeds::\nVariantAssembler.call” as Assembler
participant “/config/feed-variants/*.json” as VariantConfig
participant “Articles::Feeds\n.lever_catalog” as LeverCatalog
GetRequest –> AbExperiment : with :user
GetRequest <– AbExperiment : :variant
GetRequest –> VariantQuery : with :user, :variant
VariantQuery –> Assembler : with :variant
Assembler –> VariantConfig : with :variant
Assembler <– VariantConfig : :variant_config
Assembler –> LeverCatalog : with :variant_config
Assembler <– LeverCatalog : :query_config
VariantQuery <– Assembler : :query_config
VariantQuery –> Article : with :user, :query_config
GetRequest <– Article : Article::ActiveRecord::Relation
@enduml
Enter fullscreen mode Exit fullscreen mode

When you come to the DEV homepage, we retrieve the your randomly assigned variant. With that variant, we assemble the query configuration and then perform the query to return the articles you see in your relevancy feed.

Show Me the SQL Already

Below is the evaluated SQL of the 2022-04-15 incumbent feed variant. And for those curious, the “challenger” will join the fray on April 25th; once we merge Create new feed-variant 20220422. You can also checkout the inline documentation on configuring the VariantQuery.

And now a wall of SQL

PosgreSQL Select Statement
SELECT
  "articles"."path",
  "articles"."title",
  "articles"."id",
  "articles"."published",
  "articles"."comments_count",
  "articles"."public_reactions_count",
  "articles"."cached_tag_list",
  "articles"."main_image",
  "articles"."main_image_background_hex_color",
  "articles"."updated_at",
  "articles"."slug",
  "articles"."video",
  "articles"."user_id",
  "articles"."organization_id",
  "articles"."video_source_url",
  "articles"."video_code",
  "articles"."video_thumbnail_url",
  "articles"."video_closed_caption_track_url",
  "articles"."experience_level_rating",
  "articles"."experience_level_rating_distribution",
  "articles"."cached_user",
  "articles"."cached_organization",
  "articles"."published_at",
  "articles"."crossposted_at",
  "articles"."description",
  "articles"."reading_time",
  "articles"."video_duration_in_seconds",
  "articles"."last_comment_at"
FROM
  "articles"
  INNER JOIN (
    SELECT
      articles.id,
      (
    (
      CASE (
        current_date - articles.published_at :: date
      ) WHEN 0 THEN 1.0 WHEN 1 THEN 0.99 WHEN 2 THEN 0.985 WHEN 3 THEN 0.98 WHEN 4 THEN 0.975 WHEN 5 THEN 0.97 WHEN 6 THEN 0.965 WHEN 7 THEN 0.96 WHEN 8 THEN 0.955 WHEN 9 THEN 0.95 WHEN 10 THEN 0.945 WHEN 11 THEN 0.94 WHEN 12 THEN 0.935 WHEN 13 THEN 0.93 WHEN 14 THEN 0.925 ELSE 0.9 END
    ) * (
      CASE COUNT(comments_by_followed.id) WHEN 0 THEN 0.95 WHEN 1 THEN 0.98 WHEN 2 THEN 0.99 ELSE 0.93 END
    ) * (
      CASE articles.comments_count WHEN 0 THEN 0.8 WHEN 1 THEN 0.82 WHEN 2 THEN 0.84 WHEN 3 THEN 0.86 WHEN 4 THEN 0.88 WHEN 5 THEN 0.9 WHEN 6 THEN 0.92 WHEN 7 THEN 0.94 WHEN 8 THEN 0.96 WHEN 9 THEN 0.98 ELSE 1.0 END
    ) * (
      CASE (
        CASE articles.featured WHEN true THEN 1 ELSE 0 END
      ) WHEN 1 THEN 1.0 ELSE 0.85 END
    ) * (
      CASE COUNT(followed_user.follower_id) WHEN 0 THEN 0.8 WHEN 1 THEN 1.0 ELSE 1.0 END
    ) * (
      CASE COUNT(followed_org.follower_id) WHEN 0 THEN 0.95 WHEN 1 THEN 1.0 ELSE 1.0 END
    ) * (
      CASE (
        current_date - MAX(comments.created_at):: date
      ) WHEN 0 THEN 1.0 WHEN 1 THEN 0.9988 ELSE 0.988 END
    ) * (
      CASE LEAST(
        10.0,
        SUM(followed_tags.points)
      ):: integer WHEN 0 THEN 0.7 WHEN 1 THEN 0.7303 WHEN 2 THEN 0.7606 WHEN 3 THEN 0.7909 WHEN 4 THEN 0.8212 WHEN 5 THEN 0.8515 WHEN 6 THEN 0.8818 WHEN 7 THEN 0.9121 WHEN 8 THEN 0.9424 WHEN 9 THEN 0.9727 ELSE 1.0 END
    ) * (
      CASE (
        CASE WHEN articles.privileged_users_reaction_points_sum < -10 THEN -1 WHEN articles.privileged_users_reaction_points_sum > 10 THEN 1 ELSE 0 END
      ) WHEN -1 THEN 0.2 WHEN 1 THEN 1.0 ELSE 0.95 END
    ) * (
      CASE articles.public_reactions_count WHEN 0 THEN 0.9988 WHEN 1 THEN 0.9988 WHEN 2 THEN 0.9988 WHEN 3 THEN 0.9988 ELSE 1.0 END
    )
      ) as relevancy_score
    FROM
      articles
      LEFT OUTER JOIN user_blocks ON user_blocks.blocked_id = articles.user_id
      AND user_blocks.blocked_id IS NULL
      AND user_blocks.blocker_id = : user_id
      LEFT OUTER JOIN follows AS followed_user ON articles.user_id = followed_user.followable_id
      AND followed_user.followable_type = 'User'
      AND followed_user.follower_id = : user_id
      AND followed_user.follower_type = 'User'
      LEFT OUTER JOIN comments AS comments_by_followed ON comments_by_followed.commentable_id = articles.id
      AND comments_by_followed.commentable_type = 'Article'
      AND followed_user.followable_id = comments_by_followed.user_id
      AND followed_user.followable_type = 'User'
      AND comments_by_followed.deleted = false
      AND comments_by_followed.created_at > '2022-04-08 12:22:04.501182'
      LEFT OUTER JOIN follows AS followed_org ON articles.organization_id = followed_org.followable_id
      AND followed_org.followable_type = 'Organization'
      AND followed_org.follower_id = : user_id
      AND followed_org.follower_type = 'User'
      LEFT OUTER JOIN comments ON comments.commentable_id = articles.id
      AND comments.commentable_type = 'Article'
      AND comments.deleted = false
      AND comments.created_at > '2022-04-08 12:22:04.501182'
      LEFT OUTER JOIN taggings ON taggings.taggable_id = articles.id
      AND taggable_type = 'Article'
      INNER JOIN tags ON taggings.tag_id = tags.id
      LEFT OUTER JOIN follows AS followed_tags ON tags.id = followed_tags.followable_id
      AND followed_tags.followable_type = 'ActsAsTaggableOn::Tag'
      AND followed_tags.follower_type = 'User'
      AND followed_tags.follower_id = : user_id
      AND followed_tags.explicit_points >= 0
    WHERE
      articles.published = true
      AND articles.published_at > '2022-04-08 12:22:04.501182'
      AND articles.published_at < '2022-04-23 12:22:04.501592'
    GROUP BY
      articles.id,
      articles.published_at,
      articles.comments_count,
      articles.featured,
      articles.privileged_users_reaction_points_sum,
      articles.public_reactions_count
    ORDER BY
      relevancy_score DESC,
      articles.published_at DESC
    LIMIT
      50
  ) AS article_relevancies ON articles.id = article_relevancies.id
ORDER BY
  article_relevancies.relevancy_score DESC,
  articles.published_at DESC
Enter fullscreen mode Exit fullscreen mode

I generated the above query by adding the following to our Rspec tests of each of the variant configs suite:

File.open(Rails.root.join("tmp/#{variant}.sql").to_s, "w+") do |file|
  file.puts query_call.to_sql
end

Enter fullscreen mode Exit fullscreen mode

I also removed the test specific user id, replacing it with :user_id, and tidied up the SQL.

Discussion (0)