Blog
Long-form analysis. Open-source code walkthroughs, recommendation-system internals, and the tools we build on.
Series
X For You algorithm, line by line
22 parts
- 01
X For You algorithm, line by line — Part 1: Architecture & the candidate-pipeline framework
Deep dive into xai-org/x-algorithm. Part 1 walks through the system architecture and every line of the candidate-pipeline Rust crate (1,031 LOC, 10 files): Source, Hydrator, Filter, Scorer, Selector, SideEffect, and the master orchestrator.
- 02
X For You algorithm, line by line — Part 2: Thunder I (post store)
Part 2 of the deep dive into xai-org/x-algorithm. Thunder's binary startup, Kafka bootstrap, Thrift/proto deserializers, and the in-memory PostStore — DashMap layout, TinyPost vs LightPost, retention trimming, tombstones.
- 03
X For You algorithm, line by line — Part 3: Thunder II (Kafka + gRPC)
Part 3 of the deep dive into xai-org/x-algorithm. Thunder's two Kafka listeners (legacy Thrift transformer + v2 proto sink), the catchup signal, and the InNetworkPostsService gRPC handler — semaphore-bounded backpressure, two-stage statistics, spawn_blocking for in-memory lookups.
- 04
X For You algorithm, line by line — Part 4: Home-Mixer core + models
Part 4 of the deep dive into xai-org/x-algorithm. Home-Mixer's binary entry point, QueryBuilder, gRPC service implementations (Scored Posts + For You + URT variant), and the type system: the 50-field ScoredPostsQuery and 30-field PostCandidate, plus brand-safety verdict computation.
- 05
X For You algorithm, line by line — Part 5: Home-Mixer filters
Part 5 of the deep dive into xai-org/x-algorithm. All 18 filters in home-mixer/filters/ — duplicates, age, self-tweet, retweet dedup, ineligible subscription, three flavors of seen-post filtering (bloom + impressed + served), muted keyword tokenizer matching, 6-relationship socialgraph filter, conversation dedup, and the 571-LOC topic taxonomy filter.
- 06
X For You algorithm, line by line — Part 6: Concrete candidate pipelines
Part 6 of the deep dive into xai-org/x-algorithm. The two pipeline declarations that wire every stage component: ForYouCandidatePipeline (5 sources + blender + 8 side effects) and PhoenixCandidatePipeline (15 query hydrators, 6 sources, 10 hydrators, 14 filters, 3 scorers, 6 post-selection hydrators, 3 post-selection filters, 6 side effects). Plus orphan-file analysis.
- 07
X For You algorithm, line by line — Part 7: Candidate hydrators (part 1 of 2)
Part 7 of the deep dive into xai-org/x-algorithm. The structural candidate hydrators: in_network, core_data, subscription, gizmoduck (composite cache key), blocked_by (non-cached), has_media (shadow-only), language_code, video_duration. The CachedHydrator pattern via Moka cache, three-way result matching, and the has_cached_posts gate.
- 08
X For You algorithm, line by line — Part 8: Candidate hydrators (part 2 of 2)
Part 8 of the deep dive into xai-org/x-algorithm. The semantic candidate hydrators: engagement counts with tweet-age-based TTLs, two-arm A/B-tested brand safety, the packed-bitset tweet_type_metrics, quote-tweet expansion with parallel I/O, two-safety-level visibility filtering, MinHash Jaccard similarity, topic taxonomy lookups, and the facepile.
- 09
X For You algorithm, line by line — Part 9: Query hydrators
Part 9 of the deep dive into xai-org/x-algorithm. The 16 query hydrators that populate ScoredPostsQuery: 4 social-graph fetchers, cached-posts Redis lookup, MinHash signature, two parallel UAS-aggregation calls, served history with fatigue logic, IP geolocation, demographics, tiered gender prediction, Grok-topics bitmaps. Plus the orphan pre-refactor files.
- 10
X For You algorithm, line by line — Part 10: Scorers + Selectors
Part 10 of the deep dive into xai-org/x-algorithm. PhoenixScorer with new-user cluster routing + egress fallback, the 290-line RankingScorer consolidating 22 feature-switch-driven weights + author-diversity exponential decay + three-branch out-of-network (OON) downweighting, VMRanker with DPP diversity, plus the BlenderSelector that interleaves posts/ads/prompts/who-to-follow/push-to-home.
- 11
X For You algorithm, line by line — Part 11: Sources
Part 11 of the deep dive into xai-org/x-algorithm. All 11 source implementations: Thunder for in-network, TweetMixer for legacy OON, three Phoenix retrieval variants (default, topics, MoE), CachedPostsSource bypass, and the For You-specific sources (ScoredPostsSource, AdsSource, WhoToFollowSource, PromptsSource, PushToHomeSource). Cluster resolution, dedup strategy, graceful degradation.
- 12
X For You algorithm, line by line — Part 12: Side effects (part 1)
Part 12 of the deep dive into xai-org/x-algorithm. First half of the side-effect stage: MutualFollow stats, served-history truncation, past-request-timestamps write, Kafka impressions publish, Redis post-candidate cache with zstd, cross-DC Phoenix request cache, ads-injection logging, response-stats counters, 5%-sampled reranking Kafka publish.
- 13
X For You algorithm, line by line — Part 13: Side effects (part 2)
Part 13 of the deep dive into xai-org/x-algorithm. The five heaviest side effects: shadow-mode multi-cluster Phoenix experiments, served-candidates Kafka publish, the multi-entry served-history Manhattan write, score-distribution + retrieval-position analytics, and the 302-line client-events firehose with cross-product event generation.
- 14
X For You algorithm, line by line — Part 14: Ad blending
Part 14 of the deep dive into xai-org/x-algorithm. The home-mixer/ads/ module: SafeGapAdsBlender (preserves organic order, fills gaps), PartitionOrganicAdsBlender (sandwich pattern with brand-safety partitioning), spacing inference from ad-service positions, three-rule adjacency enforcement (BSR / handle / keyword). The last session before we leave Rust for the Python Phoenix code.
- 15
X For You algorithm, line by line — Part 15: Phoenix models (the ML core)
Part 15 of the deep dive into xai-org/x-algorithm. The actual neural networks: PhoenixModel (ranking transformer with user+history+candidates in one sequence, candidate isolation, multi-action heads) and PhoenixRetrievalModel (two-tower with transformer user encoder + MLP candidate tower, L2-normalized for ANN search). Hash embeddings, multi-hot action projection, continuous MLPs, post-age bucketing.
- 16
X For You algorithm, line by line — Part 16: Phoenix runners + end-to-end pipeline
Part 16 of the deep dive into xai-org/x-algorithm. The Python runner infrastructure: ModelRunner / RetrievalModelRunner with Haiku transform setup, checkpoint loading from .npz, the unified embedding table layout, three apply functions for retrieval, and run_pipeline.py — the headline release addition that runs retrieval → ranking from exported checkpoints.
- 17
X For You algorithm, line by line — Part 17: Grok transformer + tests
Part 17 — the final Phoenix session. The Grok-1-derived transformer that powers both ranking and retrieval: candidate isolation attention mask, right-anchored RoPE positions, GQA with tanh-clamping, GeGLU feed-forward, the double-layer-norm DecoderLayer, plus the test suites that pin down the most subtle pieces.
- 18
X For You algorithm, line by line — Part 18: Grox core (dispatcher, engine, generators)
Part 18 — Grox is the LLM-driven content-understanding pipeline that produces safety labels, content categories, and multimodal embeddings consumed by the rest of the system. Three Python processes (main / dispatcher / engine) cooperate through a multiprocessing Manager. We walk through main.py, engine.py, dispatcher.py, the schedule context, and all 16 Kafka task generators.
- 19
X For You algorithm, line by line — Part 19: Grox plans + data loaders
Part 19 — Grox's plan layer is a dependency-DAG executor: each plan declares tasks and a dependency map, asyncio futures wire them up, PlanMaster runs all 9 plans in parallel per task. Data loaders cover Kafka (streaming, with prefetch + thread-pool Thrift decode), Strato (on-demand RPC), and a separate-process ASR pipeline with ffmpeg + multimodal LLM.
- 20
X For You algorithm, line by line — Part 20: Grox embedder, summarizer, classifiers
Part 20 — the ML layer of Grox. Two multimodal embedders (V2 with 5 client choices, V5 with a single HTTP path), the post summarizer, and six LLM-call classifiers (spam, banger, post-safety, two-stage PTOS, reply ranker). All share the same system-prompt + User+Post + assistant-slot + parse-JSON pattern, with model-tier escalation (mini → primary → primary-critical → EAPI 4.2) for the highest-stakes calls.
- 21
X For You algorithm, line by line — Part 21: Grox tasks part 1 (base + filters + classifier wrappers)
Part 21 — the task layer where plan declarations meet concrete service calls. Base Task classes with retry/skip semantics, env-based disable rules, eligibility filters (spam-vs-reply-ranking follower-bucket split), TTL-cache rate limiters, media+ASR hydration, post reload with not-found retry, and six classifier-wrapper tasks plus the moderation-action trigger.
- 22
X For You algorithm, line by line — Part 22 (finale): Grox publishing layer + series wrap-up
The final part. The publishing layer that writes everything to Manhattan and Kafka: V2/V3/V4/V5 embedder tasks, embedding Kafka publishers, the big task_pub.py kitchen sink, the five embedding sink variants, and the safety annotations sink with bool-metadata derivation and safemodel defense-in-depth. Plus a complete series wrap-up: 22 sessions, 24,914 LOC, Rust + Python, from candidate-pipeline to home-mixer to Phoenix to Grox.