<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Pinterest — real-time A/B monitoring as the operational discipline</title>
<link rel="stylesheet" href="framework.css">
<style>
/* Page-accent — overrides framework.css fallback */
:root{--page-accent:var(--blue);--page-accent-soft:var(--blue-soft)}
.three{display:grid;grid-template-columns:repeat(3,1fr);gap:14px;margin:14px 0}
.concept{background:#fff;border:1px solid var(--line);border-radius:12px;padding:18px 20px;box-shadow:var(--shadow);border-top:5px solid var(--page-accent)}
.concept .ab{font-family:Georgia,serif;font-size:13px;color:var(--page-accent);font-weight:700;letter-spacing:.05em;text-transform:uppercase;margin-bottom:4px}
.concept h3{margin:0 0 8px;font-size:17px;font-family:Georgia,serif}
.concept p{margin:0;font-size:13.5px;color:var(--ink-soft);line-height:1.5}
.concept p b{color:var(--ink)}
.caveat{font-size:12.5px;color:#8a6d2e;background:#fbf6e7;border:1px dashed #d8c98f;border-radius:6px;padding:7px 11px;margin-top:10px}
@media(max-width:680px){.three{grid-template-columns:1fr}}
</style>
</head>
<body>
<nav class="sitenav">
<details>
<summary>📑 Jump to</summary>
<div class="navmenu">
<div class="navgrp"><h4>Start here</h4>
<a href="index.html"><b>← Home (goal & map)</b></a>
<a href="impact-saas-companies.html">SaaS / B2B field study</a>
<a href="impact-consumer-companies.html">Consumer-tech field study</a>
<a href="methodologies-comparison.html"><b>All methods compared →</b></a>
<a href="experiment-trustworthiness.html">How 40k tests actually work →</a>
<a href="jargon.html">Jargon (glossary)</a>
</div>
<div class="navgrp"><h4>Scoring & Input modeling</h4>
<a href="rice-framework.html">RICE (Intercom)</a>
<a href="north-star-framework.html">North Star (Amplitude / Slack)</a>
</div>
<div class="navgrp"><h4>Goal-laddering / Define first</h4>
<a href="v2mom-framework.html">V2MOM (Salesforce)</a>
<a href="pyramid-of-clarity-framework.html">Pyramid of Clarity (Asana)</a>
<a href="pr-faq-framework.html">PR-FAQ / Working Backwards (Amazon)</a>
<a href="heart-framework.html">HEART (Google)</a>
<a href="dibb-framework.html">DIBB (Spotify)</a>
</div>
<div class="navgrp"><h4>Experimentation (SaaS)</h4>
<a href="microsoft-exp-framework.html">Microsoft ExP / CUPED</a>
<a href="linkedin-xlnt-framework.html">LinkedIn T-REX</a>
</div>
<div class="navgrp"><h4>Experimentation (Consumer)</h4>
<a href="netflix-experimentation.html">Netflix · ABlaze</a>
<a href="booking-experimentation.html">Booking.com</a>
<a href="airbnb-erf-framework.html">Airbnb ERF</a>
<a href="uber-xp-framework.html">Uber XP</a>
<a href="doordash-switchback-framework.html">DoorDash switchback</a>
<a href="lyft-experimentation.html">Lyft</a>
<a class="cur" href="pinterest-ab-framework.html">Pinterest</a>
</div>
<div class="navgrp"><h4>AI labs</h4>
<a href="anthropic-pm-on-ai-exponential.html">Anthropic · PM on AI exponential</a>
<a href="google-customer-zero-2026.html">Google · "Customer zero" 2026</a>
</div>
<div class="navgrp"><h4>Written discipline</h4>
<a href="stripe-shaping-framework.html">Stripe shaping</a>
</div>
</div>
</details>
</nav>
<div class="wrap">
<header class="masthead">
<p class="kicker">Methods · Deep-dive · Experimentation</p>
<h1>Pinterest — real-time A/B monitoring as the operational discipline <span class="srcyr">2017</span></h1>
<p class="sub">Pinterest's published platform story is the most ordinary on the list — and that is the lesson. <strong>You don't need a famous framework name</strong>; you need a working A/B platform with a lightweight config UI, a clean QA workflow, simplified cross-platform APIs, and a real-time monitoring dashboard so regressions get caught in minutes, not days.</p>
<p class="sub">Canonical source: <a class="cite" href="https://medium.com/pinterest-engineering/building-pinterests-a-b-testing-platform-ab4934ace9f4">"Building Pinterest's A/B testing platform" (Pinterest Engineering, February 21, 2017)</a> by Shuo Xiang, Bryant Xiao, Justin Mejorada Pier, Jooseong Kim, Chunyan Wang and the Data Engineering team. Real-time monitoring is covered in a separate follow-up post.</p>
<div class="goal"><span>Goal</span><br>Decide features by data-backed expected impact — choose by outcome, not by to-do list or opinion.</div>
</header>
<div class="eli">
<div class="lbl">🎓 8th-grade version</div>
Pinterest's A/B testing isn't fancy — that's the point. They built three boring-but-good things: (1) a simple <b>web UI</b> where anyone can set up a test without writing code, (2) one set of <b>APIs</b> that works the same on iPhone, Android, and the website, and (3) a <b>live dashboard</b> that shows how the test is going <em>right now</em>, so if something goes wrong you see it in minutes instead of two weeks later. Most companies should copy this version first — fancy techniques like <a class="j" href="jargon.html#cuped">CUPED</a> or switchback come later, only if your problem actually needs them. The transferable lesson: <em>boring + reliable + fast feedback</em> beats clever-but-rare.
</div>
<nav class="toc">
<a href="#headline">Honest headline</a>
<a href="#anatomy">What the platform does</a>
<a href="#mechanism">How decisions get made</a>
<a href="#apply">Apply to a sheet</a>
<a href="methodologies-comparison.html" style="color:var(--blue);font-weight:700">Comparison table →</a>
</nav>
<div class="finding" id="headline">
<h2>The honest headline: the "boring" experimentation platform is the most copyable one</h2>
<p>Microsoft has CUPED. DoorDash has switchback. Pinterest has… a competent A/B testing platform with a real-time dashboard. <b>That's not a weakness</b> — it's exactly the playbook a smaller team can copy. Lightweight config, standard analyses, fast monitoring. No bespoke statistics required.</p>
<p>The published distinguishing feature: <b>real-time regression monitoring</b>. When an experiment goes off the rails, you see it within minutes, not at the end of a 2-week run. That changes how brave you can be about ramping: a fast ramp is far less risky when a bad variant is pulled within minutes rather than left running for days.</p>
</div>
<!-- ANATOMY -->
<h2 class="sec" id="anatomy">What the Pinterest platform actually does</h2>
<div class="three">
<div class="concept">
<div class="ab">Config UI</div>
<h3>Lightweight experiment setup</h3>
<p>A browser UI lets engineers and PMs configure experiments without writing code or learning a <a class="j" href="jargon.html#dsl">DSL</a>. Cost-to-launch is what determines experiment volume, and Pinterest optimised it hard.</p>
</div>
<div class="concept">
<div class="ab">Cross-platform</div>
<h3>Simplified APIs across iOS / Android / web</h3>
<p>One experiment can span all surfaces with consistent metric definitions. Removes the "Android version of this test is reading differently" failure mode.</p>
</div>
<div class="concept">
<div class="ab">Real-time</div>
<h3>Real-time monitoring dashboard</h3>
<p>Live metrics during the experiment so regressions are caught in <b>minutes, not days</b>. The single biggest operational difference from a "wait two weeks then check" workflow.</p>
</div>
</div>
<div class="src">Sources: <a class="cite" href="https://medium.com/pinterest-engineering/building-pinterests-a-b-testing-platform-ab4934ace9f4">Xiang, Xiao, Mejorada Pier, Kim, Wang & Data Engineering — "Building Pinterest's A/B testing platform" (Pinterest Engineering, February 21, 2017)</a> · <a class="cite" href="https://medium.com/pinterest-engineering/scalable-a-b-experiments-at-pinterest-1e28ddb7d22">"Scalable A/B experiments at Pinterest"</a> · <a class="cite" href="https://medium.com/pinterest-engineering/monitoring-a-b-experiments-in-real-time-5cd3ee611c1">"Monitoring A/B experiments in real-time"</a>.</div>
<div class="caveat">Sourcing caveat: Pinterest's published material focuses on the <em>platform</em>, not on the decision-making process around it. We have less visibility into their prioritisation discipline than for (say) Microsoft or Booking. The platform is well-documented; the meeting-room rituals are not.</div>
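<p>The cross-platform card above depends on every client putting a given user in the same group. One common way to get that is to make assignment a pure function of user ID and experiment name, so iOS, Android and web compute identical answers without coordinating. A minimal sketch, assuming hash-based bucketing: the function names, the SHA-256 choice and the 10,000-bucket granularity are our illustrative assumptions, not Pinterest's published API.</p>

```python
import hashlib
from typing import Dict, Optional

NUM_BUCKETS = 10_000  # hypothetical granularity: each bucket is 0.01% of users


def bucket_for(user_id: str, experiment: str) -> int:
    """Map (user, experiment) to a stable bucket in [0, NUM_BUCKETS).

    The same inputs give the same bucket on any client, and salting with the
    experiment name decorrelates assignments across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign_group(user_id: str, experiment: str,
                 allocation: Dict[str, float]) -> Optional[str]:
    """Return the user's group, or None if they fall outside the experiment.

    `allocation` maps group name -> share of all traffic, e.g.
    {"control": 0.05, "treatment": 0.05} enrolls 10% of users.
    """
    bucket = bucket_for(user_id, experiment)
    upper = 0
    for group, share in allocation.items():
        upper += round(share * NUM_BUCKETS)  # cumulative bucket boundary
        if bucket < upper:
            return group
    return None  # not enrolled: sees the default experience


print(assign_group("user-42", "feed_ranking_v2", {"control": 0.05, "treatment": 0.05}))
```

<p>Because the bucket depends only on its two inputs, a logged-in user who switches from phone to web stays in the same arm, and salting with the experiment name keeps one test's treatment group from lining up with another's.</p>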
<!-- MECHANISM -->
<h2 class="sec" id="mechanism">How a Pinterest-style decision gets made</h2>
<p class="secsub">The mechanism inferred from the platform posts: <em>experimentation is so cheap that running one is the default, and real-time monitoring keeps the cost of failure low</em>.</p>
<div class="step"><div class="num">1</div><div><h3>Configure the experiment in the UI</h3><p>Hypothesis, OEC, guardrails, allocation %. Done in a browser; no code change to launch.</p></div></div>
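<p>Step 1 treats an experiment as data, not code. A minimal sketch of the record such a form might produce and the checks it could run before enabling launch; the class, field names and validation rules are our assumptions, not Pinterest's schema.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentConfig:
    """Hypothetical schema for what the step-1 form captures."""
    name: str
    hypothesis: str
    oec: str  # the single agreed success metric
    # guardrail metric -> largest tolerated relative regression (0.05 = 5%)
    guardrails: Dict[str, float] = field(default_factory=dict)
    # group name -> share of total traffic
    allocation: Dict[str, float] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return launch-blocking problems; an empty list means ready to launch."""
        problems = []
        if not self.hypothesis.strip():
            problems.append("hypothesis is empty")
        if not self.oec:
            problems.append("no OEC chosen")
        if not self.guardrails:
            problems.append("no guardrails, so the live dashboard has nothing to alarm on")
        if "control" not in self.allocation:
            problems.append("no control group")
        if any(share <= 0 for share in self.allocation.values()):
            problems.append("every group needs a positive traffic share")
        total = sum(self.allocation.values())
        if total > 1.0 + 1e-9:
            problems.append(f"allocation sums to {total:.0%}, more than all traffic")
        return problems


cfg = ExperimentConfig(
    name="feed_ranking_v2",
    hypothesis="New ranking raises saves per session",
    oec="saves_per_session",
    guardrails={"p95_latency_ms": 0.05, "hide_rate": 0.02},
    allocation={"control": 0.05, "treatment": 0.05},
)
print(cfg.validate())  # → []
```

<p>Refusing to launch without guardrails is the choice that matters most here: the live dashboard in step 2 can only alarm on metrics someone declared up front.</p>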
<div class="step"><div class="num">2</div><div><h3>Ship behind a flag, watch the real-time dashboard</h3><p>Metrics update live, so if anything spikes in the wrong direction you see it within minutes.</p></div></div>
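<p>What "see it in minutes" can mean mechanically: on every dashboard refresh, compare each guardrail's treatment and control means, and alarm when the regression is both larger than the tolerated threshold and large relative to its standard error. A minimal sketch assuming per-arm running sums and a two-sample z-statistic; the names and the <code>z_crit=3.0</code> cutoff are our assumptions, and this is not presented as Pinterest's actual alerting logic.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Streaming aggregates for one metric in one arm, updated per event."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.total_sq += x * x

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def var(self) -> float:
        # sample variance from the running sums, clamped against rounding error
        return max(0.0, (self.total_sq - self.n * self.mean ** 2) / (self.n - 1))


def guardrail_alarm(control: RunningStats, treatment: RunningStats,
                    max_rel_regression: float, higher_is_worse: bool = True,
                    z_crit: float = 3.0) -> bool:
    """True when treatment is worse than control by more than the tolerated
    relative amount AND the gap is large relative to its standard error."""
    if control.n < 2 or treatment.n < 2:
        return False  # not enough data to say anything yet
    diff = treatment.mean - control.mean
    if not higher_is_worse:
        diff = -diff  # orient so that a positive diff always means "worse"
    exceeds = diff > max_rel_regression * abs(control.mean)
    se = math.sqrt(control.var / control.n + treatment.var / treatment.n)
    if se == 0:
        return exceeds
    return exceeds and diff / se > z_crit


# Hypothetical latencies (ms): treatment runs about 120 ms slower.
control, treatment = RunningStats(), RunningStats()
for _ in range(100):
    for ms in (290, 300, 310):
        control.add(ms)
        treatment.add(ms + 120)
print(guardrail_alarm(control, treatment, max_rel_regression=0.05))  # → True
```

<p>The cutoff sits well above the usual 1.96 because the check repeats on every refresh, and repeatedly testing at a fixed threshold inflates false alarms. Treat it as a tripwire that prompts a human look; a production system would use a properly sequential test.</p>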
<div class="step"><div class="num">3</div><div><h3>Ramp through allocations</h3><p>Standard staged-rollout pattern. Each step waits on the dashboard to clear.</p></div></div>
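<p>Step 3's "each step waits on the dashboard to clear" can be written as a tiny state machine. A sketch under assumptions: the 1% / 5% / 20% / 50% schedule, the 30-minute soak and the function name are hypothetical, chosen only to make the gating explicit.</p>

```python
RAMP_STEPS = [0.01, 0.05, 0.20, 0.50]  # hypothetical treatment shares per stage
MIN_SOAK_MINUTES = 30                  # hypothetical clean time required per stage


def next_allocation(current: float, minutes_at_step: int, alarm: bool) -> float:
    """Treatment share for the next interval of a staged rollout.

    Any guardrail alarm pulls the variant to 0.0 (and 0.0 is never ramped
    again); otherwise hold until the stage has soaked, then advance one stage.
    The final stage holds: the ship/kill call is made on the OEC.
    """
    if alarm:
        return 0.0
    if current in RAMP_STEPS and minutes_at_step >= MIN_SOAK_MINUTES:
        i = RAMP_STEPS.index(current)
        if i + 1 < len(RAMP_STEPS):
            return RAMP_STEPS[i + 1]
    return current


# Clean soak at 1% advances to 5%; an alarm at 20% pulls the variant.
print(next_allocation(0.01, 30, alarm=False))  # → 0.05
print(next_allocation(0.20, 10, alarm=True))   # → 0.0
```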
<div class="step"><div class="num">4</div><div><h3>Read the OEC at significance; ship, kill, or iterate</h3><p>Standard OEC discipline; no exotic statistics required.</p></div></div>
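<p>For a rate-style OEC, "read at significance" can be as plain as a two-proportion z-test. A minimal sketch, assuming the OEC is a per-user rate (for instance, the share of users who saved at least one Pin, a hypothetical metric here) and a two-sided test; this is a textbook test, not a description of Pinterest's analysis pipeline.</p>

```python
import math


def two_proportion_test(conv_c: int, n_c: int, conv_t: int, n_t: int):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (relative_lift, z, p_value). The standard error uses the pooled
    rate, i.e. it is computed under the null hypothesis of no difference.
    """
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return (p_t - p_c) / p_c, z, p_value


# Hypothetical: 10.0% vs 10.4% of users saved at least one Pin, 100k users per arm.
lift, z, p = two_proportion_test(10_000, 100_000, 10_400, 100_000)
print(f"lift {lift:+.1%}, z = {z:.2f}, p = {p:.4f}")
```

<p>A significant lift only answers whether the OEC moved; the ship call still needs clean guardrails.</p>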
<!-- APPLY TO A SHEET -->
<h2 class="sec" id="apply">Apply to a feature sheet</h2>
<p class="secsub">Pinterest's shape is the "boring" experiment ledger — standard A/B columns + a <em>real-time monitoring</em> column. The defining feature isn't an exotic design; it's that regressions get caught in <em>minutes</em> via live dashboards, so risky variants can be pulled before users notice. This is the realistic template for most teams.</p>
<div class="note" style="background:var(--teal-soft);border-left-color:var(--teal)"><b>Try it Monday morning (30 minutes).</b> Pick the most recent experiment your team ran. Ask: <em>how long after a guardrail breached would we have noticed?</em> Hours? A day? End of sprint? That latency is the cost you pay for not having Pinterest-style real-time monitoring. You don't need a dashboard platform to fix it — even a basic live query of latency / errors / unsubscribe rate running during a ramp closes most of the gap. The lesson isn't "build a dashboard"; it's "the time-to-detection of a bad variant is a number your team should know and shorten."</div>
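<p>The latency in that question can be estimated before building anything. If a breach starts at a random moment and only shows up at the next check, the expected wait is about half the check interval plus however late the data arrives. A back-of-envelope sketch; the intervals, lag and traffic figures are hypothetical, not measurements.</p>

```python
def expected_detection_minutes(check_interval_min: float, pipeline_lag_min: float) -> float:
    """Expected minutes from breach onset to alarm, assuming the breach starts
    uniformly at random between checks and data lands pipeline_lag_min late."""
    return check_interval_min / 2 + pipeline_lag_min


def users_exposed(detection_min: float, treatment_users_per_min: float) -> float:
    """Rough count of treatment users who hit the bad variant before it is pulled."""
    return detection_min * treatment_users_per_min


# Hypothetical: a daily batch report vs a query re-run every 5 minutes,
# with 200 new treatment users per minute.
for label, interval, lag in [("daily batch", 24 * 60, 120), ("5-minute query", 5, 2.5)]:
    t = expected_detection_minutes(interval, lag)
    print(f"{label}: ~{t:.0f} min to detect, ~{users_exposed(t, 200):,.0f} users exposed")
# daily batch: ~840 min to detect, ~168,000 users exposed
# 5-minute query: ~5 min to detect, ~1,000 users exposed
```

<p>The second number is the one worth putting in front of the team: detection latency multiplied by traffic is roughly how many people meet a broken variant.</p>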
<div class="note" style="background:var(--blue-soft);border-left-color:var(--blue);font-size:13.5px"><b>Quick glossary for the columns below.</b> <b>OEC</b> = <a class="j" href="jargon.html#oec">Overall Evaluation Criterion</a> (the agreed success metric). <b>Live dashboard</b> = the monitoring view that flags guardrail breaches during the experiment, not after. <b>Saves / session</b> = a Pinterest-specific OEC (saves are the platform's primary engagement action). <b>Hide rate</b> = how often users hide a piece of content — a quality guardrail for ranking changes. <b>SRM</b> = <a class="j" href="jargon.html#srm">Sample-Ratio Mismatch</a> (auto-flagged broken randomization).</div>
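<p>The SRM flag in the glossary is typically a chi-square goodness-of-fit test of the observed arm sizes against the configured split. A minimal sketch for two arms; the function names and the <code>alpha=0.001</code> cutoff are our assumptions (a strict cutoff keeps a check that runs on every experiment from crying wolf). With one degree of freedom the chi-square tail probability has a closed form via <code>math.erfc</code>, so no stats library is needed.</p>

```python
import math


def srm_p_value(n_control: int, n_treatment: int, expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split."""
    total = n_control + n_treatment
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(n_control: int, n_treatment: int, expected_control_share: float = 0.5,
            alpha: float = 0.001) -> bool:
    """True when the observed split is too lopsided to blame on chance."""
    return srm_p_value(n_control, n_treatment, expected_control_share) < alpha


print(has_srm(500_000, 500_900))  # → False (a 900-user gap: plausible noise)
print(has_srm(500_000, 505_000))  # → True  (a 1% gap on ~1M users is not)
```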
<h3 style="font-family:Georgia,serif;font-size:18px;margin:18px 0 8px">Worked example — an experiment ledger snapshot (Pinterest-style)</h3>
<p style="font-size:13.5px;color:var(--ink-soft);margin:0 0 12px">Eight tests. OECs reflect Pinterest's published priorities — save rate, session length, retention. Numbers illustrative.</p>
<div style="overflow-x:auto;margin:14px 0">
<table style="border-collapse:collapse;width:100%;font-size:13px;background:#fff;border:1px solid var(--line);border-radius:10px;overflow:hidden">
<thead><tr style="background:var(--ink);color:#f3efe6;font-size:11.5px;letter-spacing:.05em;text-transform:uppercase"><th style="padding:9px 10px;text-align:left">Feature</th><th style="padding:9px 10px;text-align:left">OEC</th><th style="padding:9px 10px;text-align:left">Guardrails</th><th style="padding:9px 10px;text-align:left">N/arm</th><th style="padding:9px 10px;text-align:left">Live dashboard</th><th style="padding:9px 10px;text-align:left">Result</th><th style="padding:9px 10px;text-align:left">Decision</th></tr></thead>
<tbody>
<tr style="background:#e6ecf6"><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">New feed ranking algo</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Saves / session</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Latency, hide rate</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">2M</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Clean</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">+2.1% sig</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--blue);font-weight:700">Ship</td></tr>
<tr style="background:#e6ecf6"><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">AI alt-text generation</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Accessibility tickets ↓</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Engagement</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">500k</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Clean</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">−40% tickets, engagement flat</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--blue);font-weight:700">Ship</td></tr>
<tr style="background:#e6ecf6"><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">Onboarding interest picker</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)"><a class="j" href="jargon.html#dn-activation">D7</a> retention</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Onboarding-abandon</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">300k</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Clean</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">D7 +1.4pp, abandon OK</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--blue);font-weight:700">Ship</td></tr>
<tr style="background:#e6ecf6"><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">Lens visual search</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Saves / session</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Search latency</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">800k</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Clean</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">+0.3% sig</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--blue);font-weight:700">Ship</td></tr>
<tr><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">New shopping-pin UI</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Shop clicks / session</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Spam reports</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">400k</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Clean</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Flat</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--gold);font-weight:700">Iterate</td></tr>
<tr><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">Notification re-engagement</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Sessions / wk</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Unsubscribes</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">1M</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Clean</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">+1% sessions BUT unsubs +8%</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--gold);font-weight:700">Iterate — softer frequency</td></tr>
<tr><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-weight:600">Bigger pin cards</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Saves / session</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Page latency</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">300k</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">⚠ latency +120ms</td><td style="padding:9px 10px;border-bottom:1px solid var(--line)">Saves +0.6% BUT latency breach</td><td style="padding:9px 10px;border-bottom:1px solid var(--line);font-family:Georgia,serif;color:var(--accent);font-weight:700">Kill</td></tr>
<tr><td style="padding:9px 10px;font-weight:600">Auto-play video pins</td><td style="padding:9px 10px">Saves / session</td><td style="padding:9px 10px">Battery complaints</td><td style="padding:9px 10px">200k</td><td style="padding:9px 10px">⚠ battery complaints +20%</td><td style="padding:9px 10px">Saves +0.4% BUT battery complaints +20%</td><td style="padding:9px 10px;font-family:Georgia,serif;color:var(--accent);font-weight:700">Kill (caught in real time)</td></tr>
</tbody>
</table>
</div>
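<p>The N/arm column is where a power calculation would sit. A minimal sketch of the standard normal-approximation formula for a difference in means, n = 2(z<sub>1−α/2</sub> + z<sub>1−β</sub>)²σ²/δ², assuming a two-sided 5% test at 80% power and equal arms; the baseline mean and standard deviation below are hypothetical, not Pinterest figures.</p>

```python
import math
from statistics import NormalDist


def n_per_arm(baseline_mean: float, std_dev: float, relative_lift: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per arm needed to detect `relative_lift` on a mean metric with a
    two-sided test (normal approximation, equal arm sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    delta = baseline_mean * relative_lift           # absolute effect to detect
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * std_dev ** 2 / delta ** 2)


# Hypothetical saves/session: mean 0.5, standard deviation 1.5.
for lift in (0.04, 0.02, 0.01):
    print(f"{lift:.0%} lift -> {n_per_arm(0.5, 1.5, lift):,} users per arm")
```

<p>Halving the detectable lift roughly quadruples the required N, which is why small OEC wins need large arms.</p>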
<div class="note" style="background:var(--accent-soft);border-left-color:var(--accent)"><b>The most important reading skill on this page.</b> Look at the last two rows, "Bigger pin cards" and "Auto-play video pins." Both have OECs that <em>moved positive</em> (+0.6% and +0.4% saves). Both got killed. Why? The <b>live dashboard column</b> shows ⚠: guardrail alarms that fired during the ramp, not at the end. Without real-time monitoring, both features would likely have shipped, and the latency and battery problems would have surfaced in user complaints over the following weeks. The Pinterest leverage isn't the OEC math; it's <em>shrinking the time between "this is wrong" and "we know it's wrong"</em> from a week to a few minutes.</div>
<div class="note"><b>Decision rule.</b> Ship when the OEC moves significantly in the right direction and every guardrail holds. Iterate when the OEC is flat, or when it improves while a guardrail drifts the wrong way without tripping an alarm (the notification row). Kill when a guardrail breaches on the live dashboard, however good the OEC looks. What makes Pinterest's shape work is the <em>live-dashboard column</em>: a bad variant doesn't run for a week before someone notices, because the dashboard spikes within minutes and the variant is pulled. For most teams this is the realistic starting template: a competent platform, live monitoring and standard A/B analysis beat waiting to build CUPED.</div>
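<p>One way to encode the ledger's Ship / Iterate / Kill verdicts as a function: any live guardrail alarm or a significant OEC loss kills, a significant OEC win with fully clean guardrails ships, and everything else iterates. A sketch under our own assumptions: the input shape, the 0.05 cutoff and the <code>DRIFT</code> status (a guardrail that worsened without tripping an alarm, as in the notification row) are illustrative labels, not Pinterest terms.</p>

```python
from enum import Enum
from typing import Dict


class Guardrail(Enum):
    CLEAN = "clean"   # no meaningful movement
    DRIFT = "drift"   # worse, but under the alarm threshold (hypothetical label)
    ALARM = "alarm"   # breached on the live dashboard


def verdict(oec_lift: float, oec_p_value: float,
            guardrails: Dict[str, Guardrail], alpha: float = 0.05) -> str:
    """Map one ledger row's results to Ship / Iterate / Kill."""
    significant = oec_p_value < alpha
    if any(g is Guardrail.ALARM for g in guardrails.values()):
        return "Kill"      # a live breach overrides any OEC win
    if significant and oec_lift < 0:
        return "Kill"      # the OEC itself got worse
    if significant and oec_lift > 0 and all(g is Guardrail.CLEAN for g in guardrails.values()):
        return "Ship"
    return "Iterate"       # flat OEC, or a win that costs a guardrail


# Hypothetical inputs shaped like three rows of the ledger.
print(verdict(0.021, 0.001, {"latency": Guardrail.CLEAN, "hide_rate": Guardrail.CLEAN}))  # → Ship
print(verdict(0.010, 0.010, {"unsubscribes": Guardrail.DRIFT}))                          # → Iterate
print(verdict(0.006, 0.010, {"page_latency": Guardrail.ALARM}))                          # → Kill
```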
<div class="note"><b>Why the "boring" platform is the realistic model for most teams.</b> Microsoft's CUPED, DoorDash's switchback and Netflix's causal ML are all real, but most of them are overkill for a company under 1,000 employees. The Pinterest shape (a competent platform, real-time monitoring and standard A/B analysis) is what a team realistically <em>builds first</em>. Get this right, then layer on the exotic techniques if the problem demands them.</div>
<footer>
Companion to <a href="impact-consumer-companies.html#measure">← Consumer case studies · Measure don't estimate</a> · <a href="methodologies-comparison.html">All methods compared</a> · siblings: <a href="microsoft-exp-framework.html">Microsoft ExP</a> (the same shape at extreme scale)<br>
<b>Grounded in</b> <a href="https://medium.com/pinterest-engineering/building-pinterests-a-b-testing-platform-ab4934ace9f4">"Building Pinterest's A/B testing platform" (Pinterest Engineering, February 21, 2017, by Shuo Xiang, Bryant Xiao, Justin Mejorada Pier, Jooseong Kim, Chunyan Wang and the Data Engineering team)</a>, the <a href="https://medium.com/pinterest-engineering/scalable-a-b-experiments-at-pinterest-1e28ddb7d22"><em>Scalable A/B experiments at Pinterest</em></a> follow-up, and the <a href="https://medium.com/pinterest-engineering/monitoring-a-b-experiments-in-real-time-5cd3ee611c1"><em>Monitoring A/B experiments in real-time</em></a> post. <b>Verbatim from Pinterest:</b> the platform components (lightweight config UI, QA workflow, simplified cross-platform APIs) and the real-time monitoring framing. <b>Sourcing caveat:</b> Pinterest's published material focuses on the <em>platform</em>, not on the decision-making meeting rituals around it — we have less visibility into their prioritisation discipline than for (say) Microsoft or Booking. <b>Added by us, not in Pinterest's posts:</b> the 8-row worked-example ledger, the Ship/Iterate/Kill verdict labels, the in-page glossary, and the "Try it Monday" exercise.
</footer>
</div>
</body>
</html>