Blog

Long-form analysis. Open-source code walkthroughs, recommendation-system internals, and the tools we build on.

Series

X For You algorithm, line by line

22 parts

  1. X For You algorithm, line by line — Part 1: Architecture & the candidate-pipeline framework

    Deep dive into xai-org/x-algorithm. Part 1 walks the system architecture and every line of the candidate-pipeline Rust crate (1,031 LOC, 10 files) — Source, Hydrator, Filter, Scorer, Selector, SideEffect, and the master orchestrator.

  2. X For You algorithm, line by line — Part 2: Thunder I (post store)

    Part 2 of the deep dive into xai-org/x-algorithm. Thunder's binary startup, Kafka bootstrap, Thrift/proto deserializers, and the in-memory PostStore — DashMap layout, TinyPost vs LightPost, retention trimming, tombstones.

  3. X For You algorithm, line by line — Part 3: Thunder II (Kafka + gRPC)

    Part 3 of the deep dive into xai-org/x-algorithm. Thunder's two Kafka listeners (legacy Thrift transformer + v2 proto sink), the catchup signal, and the InNetworkPostsService gRPC handler — semaphore-bounded backpressure, two-stage statistics, spawn_blocking for in-memory lookups.

  4. X For You algorithm, line by line — Part 4: Home-Mixer core + models

    Part 4 of the deep dive into xai-org/x-algorithm. Home-Mixer's binary entry point, QueryBuilder, gRPC service implementations (Scored Posts + For You + URT variant), and the type system — the 50-field ScoredPostsQuery and 30-field PostCandidate plus brand safety verdict computation.

  5. X For You algorithm, line by line — Part 5: Home-Mixer filters

    Part 5 of the deep dive into xai-org/x-algorithm. All 18 filters in home-mixer/filters/ — duplicates, age, self-tweet, retweet dedup, ineligible subscription, three flavors of seen-post filtering (bloom + impressed + served), muted keyword tokenizer matching, 6-relationship socialgraph filter, conversation dedup, and the 571-LOC topic taxonomy filter.

  6. X For You algorithm, line by line — Part 6: Concrete candidate pipelines

    Part 6 of the deep dive into xai-org/x-algorithm. The two pipeline declarations that wire every stage component: ForYouCandidatePipeline (5 sources + blender + 8 side effects) and PhoenixCandidatePipeline (15 query hydrators, 6 sources, 10 hydrators, 14 filters, 3 scorers, 6 post-selection hydrators, 3 post-selection filters, 6 side effects). Plus orphan-file analysis.

  7. X For You algorithm, line by line — Part 7: Candidate hydrators (part 1 of 2)

    Part 7 of the deep dive into xai-org/x-algorithm. The structural candidate hydrators: in_network, core_data, subscription, gizmoduck (composite cache key), blocked_by (non-cached), has_media (shadow-only), language_code, video_duration. The CachedHydrator pattern via Moka cache, three-way result matching, and the has_cached_posts gate.

  8. X For You algorithm, line by line — Part 8: Candidate hydrators (part 2 of 2)

    Part 8 of the deep dive into xai-org/x-algorithm. The semantic candidate hydrators: engagement counts with tweet-age-based TTLs, two-arm A/B-tested brand safety, the packed-bitset tweet_type_metrics, quote-tweet expansion with parallel I/O, two-safety-level visibility filtering, MinHash Jaccard similarity, topic taxonomy lookups, and the facepile.

  9. X For You algorithm, line by line — Part 9: Query hydrators

    Part 9 of the deep dive into xai-org/x-algorithm. The 16 query hydrators that populate ScoredPostsQuery: 4 social-graph fetchers, cached-posts Redis lookup, MinHash signature, two parallel UAS-aggregation calls, served history with fatigue logic, IP geolocation, demographics, tiered gender prediction, Grok-topics bitmaps. Plus the orphan pre-refactor files.

  10. X For You algorithm, line by line — Part 10: Scorers + Selectors

    Part 10 of the deep dive into xai-org/x-algorithm. PhoenixScorer with new-user cluster routing + egress fallback, the 290-line RankingScorer consolidating 22 feature-switch-driven weights + author-diversity exponential decay + tri-branched OON downweighting, VMRanker with DPP diversity, plus the BlenderSelector that interleaves posts/ads/prompts/who-to-follow/push-to-home.

  11. X For You algorithm, line by line — Part 11: Sources

    Part 11 of the deep dive into xai-org/x-algorithm. All 11 source implementations: Thunder for in-network, TweetMixer for legacy OON, three Phoenix retrieval variants (default, topics, MoE), CachedPostsSource bypass, and the For You-specific sources (ScoredPostsSource, AdsSource, WhoToFollowSource, PromptsSource, PushToHomeSource). Cluster resolution, dedup strategy, graceful degradation.

  12. X For You algorithm, line by line — Part 12: Side effects (part 1)

    Part 12 of the deep dive into xai-org/x-algorithm. First half of the side-effect stage: MutualFollow stats, served-history truncation, past-request-timestamps write, Kafka impressions publish, Redis post-candidate cache with zstd, cross-DC Phoenix request cache, ads-injection logging, response-stats counters, 5%-sampled reranking Kafka publish.

  13. X For You algorithm, line by line — Part 13: Side effects (part 2)

    Part 13 of the deep dive into xai-org/x-algorithm. The five heaviest side-effects: shadow-mode multi-cluster Phoenix experiments, served-candidates Kafka publish, the multi-entry served-history Manhattan write, score-distribution + retrieval-position analytics, and the 302-line client-events firehose with cross-product event generation.

  14. X For You algorithm, line by line — Part 14: Ad blending

    Part 14 of the deep dive into xai-org/x-algorithm. The home-mixer/ads/ module: SafeGapAdsBlender (preserves organic order, fills gaps), PartitionOrganicAdsBlender (sandwich pattern with brand-safety partitioning), spacing inference from ad-service positions, three-rule adjacency enforcement (BSR / handle / keyword). This is the last Rust session before the series moves to the Python Phoenix code.

  15. X For You algorithm, line by line — Part 15: Phoenix models (the ML core)

    Part 15 of the deep dive into xai-org/x-algorithm. The actual neural networks: PhoenixModel (ranking transformer with user+history+candidates in one sequence, candidate isolation, multi-action heads) and PhoenixRetrievalModel (two-tower with transformer user encoder + MLP candidate tower, L2-normalized for ANN search). Hash embeddings, multi-hot action projection, continuous MLPs, post-age bucketing.

  16. X For You algorithm, line by line — Part 16: Phoenix runners + end-to-end pipeline

    Part 16 of the deep dive into xai-org/x-algorithm. The Python runner infrastructure: ModelRunner / RetrievalModelRunner with Haiku transform setup, checkpoint loading from .npz, the unified embedding table layout, three apply functions for retrieval, and run_pipeline.py — the headline release addition that runs retrieval → ranking from exported checkpoints.

  17. X For You algorithm, line by line — Part 17: Grok transformer + tests

    Part 17 — the final Phoenix session. The Grok-1-derived transformer that powers both ranking and retrieval: candidate isolation attention mask, right-anchored RoPE positions, GQA with tanh-clamping, GeGLU feed-forward, the double-layer-norm DecoderLayer, plus the test suites that pin down the most subtle pieces.

  18. X For You algorithm, line by line — Part 18: Grox core (dispatcher, engine, generators)

    Part 18 — Grox is the LLM-driven content-understanding pipeline that produces safety labels, content categories, and multimodal embeddings consumed by the rest of the system. Three Python processes (main / dispatcher / engine) cooperate through a multiprocessing Manager. We walk through main.py, engine.py, dispatcher.py, the schedule context, and all 16 Kafka task generators.

  19. X For You algorithm, line by line — Part 19: Grox plans + data loaders

    Part 19 — Grox's plan layer is a dependency-DAG executor: each plan declares tasks and a dependency map, asyncio futures wire them up, PlanMaster runs all 9 plans in parallel per task. Data loaders cover Kafka (streaming, with prefetch + thread-pool Thrift decode), Strato (on-demand RPC), and a separate-process ASR pipeline with ffmpeg + multimodal LLM.

  20. X For You algorithm, line by line — Part 20: Grox embedder, summarizer, classifiers

    Part 20 — the ML layer of Grox. Two multimodal embedders (V2 with 5 client choices, V5 with single HTTP path), the post summarizer, and six LLM-call classifiers (spam, banger, post-safety, two-stage PTOS, reply ranker). All share the same system-prompt + User+Post + assistant-slot + parse-JSON pattern, with model-tier escalation (mini → primary → primary-critical → EAPI 4.2) for the highest-stakes calls.

  21. X For You algorithm, line by line — Part 21: Grox tasks part 1 (base + filters + classifier wrappers)

    Part 21 — the task layer where plan declarations meet concrete service calls. Base Task classes with retry/skip semantics, env-based disable rules, eligibility filters (spam-vs-reply-ranking follower-bucket split), TTL-cache rate limiters, media+ASR hydration, post reload with not-found retry, and six classifier-wrapper tasks plus the moderation-action trigger.

  22. X For You algorithm, line by line — Part 22 (finale): Grox publishing layer + series wrap-up

    The final part. The publishing layer that writes everything to Manhattan and Kafka: V2/V3/V4/V5 embedder tasks, embedding Kafka publishers, the big task_pub.py kitchen sink, the five embedding sink variants, and the safety annotations sink with bool-metadata derivation and safemodel defense-in-depth. Plus a complete series wrap-up: 22 sessions, 24,914 LOC, Rust + Python, from candidate-pipeline to home-mixer to Phoenix to Grox.

Articles

  1. How to actually get more reach on X — the playbook from 24,914 lines of leaked source code

    We just spent 22 articles walking through every line of X's leaked For You recommendation system. This is the practical playbook that fell out of it: which signals the banger classifier scores you on, which booleans kill your reach, why replies to big accounts get a different scoring path than replies to small ones, and which features the reply ranker logs about every single one of your replies.