<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>The North Star Framework — How It Actually Drives Priority</title>
<link rel="stylesheet" href="framework.css">
<style>
  /* Page-accent — overrides framework.css fallback */
  :root{--page-accent:var(--purple);--page-accent-soft:var(--purple-soft)}
  /* anatomy diagram */
  .anatomy{background:#fff;border:1px solid var(--line);border-radius:14px;padding:26px;box-shadow:var(--shadow);margin:18px 0;text-align:center}
  .nsm{display:inline-block;background:var(--page-accent);color:#fff;border-radius:10px;padding:13px 24px;font-weight:700;font-size:16px}
  .nsm small{display:block;font-weight:400;font-size:12px;color:#e0d2ee;letter-spacing:.03em}
  .down{color:var(--gold);font-size:22px;margin:8px 0;font-weight:700}
  .inputs{display:flex;gap:12px;justify-content:center;flex-wrap:wrap;margin:4px 0}
  .inputs .ip{background:var(--page-accent-soft);border:1px solid #d9c7ea;border-radius:9px;padding:10px 16px;font-weight:600;font-size:14px;color:var(--page-accent)}
  .work{display:flex;gap:10px;justify-content:center;flex-wrap:wrap;margin:4px 0}
  .work .wk{background:#fff;border:1px dashed var(--line);border-radius:8px;padding:8px 13px;font-size:13px;color:var(--ink-soft)}
  .lbl-row{font-size:11.5px;letter-spacing:.12em;text-transform:uppercase;color:var(--ink-soft);font-weight:700;margin:14px 0 4px}
  /* equation */
  .eq{background:var(--gold-soft);border:1px solid #e6d8a8;border-radius:12px;padding:18px 22px;margin:18px 0;font-size:15px}
  .eq code{font-family:"SF Mono",ui-monospace,Menlo,Consolas,monospace;font-size:14px;background:#fff;border:1px solid #e6d8a8;border-radius:6px;padding:3px 8px;display:inline-block;margin:4px 0;color:var(--ink)}
  .eq b{color:var(--gold)}
  /* company cards */
  .cos{display:grid;grid-template-columns:1fr 1fr;gap:18px;margin:18px 0}
  .co{border-radius:14px;padding:22px;box-shadow:var(--shadow);border:1px solid var(--line);background:#fff;border-top:5px solid var(--page-accent)}
  .co h3{font-family:Georgia,serif;font-size:21px;margin:0 0 4px}
  .co .nsm-tag{font-size:12px;color:var(--page-accent);font-weight:700;text-transform:uppercase;letter-spacing:.05em;margin-bottom:10px}
  .co ul{margin:8px 0;padding-left:20px}
  .co li{font-size:14px;margin:6px 0}
  .co .cav{font-size:12.5px;color:var(--ink-soft);font-style:italic;border-top:1px dashed var(--line);padding-top:8px;margin-top:10px}
  @media(max-width:680px){.cos{grid-template-columns:1fr}}
</style>
</head>
<body>
<nav class="sitenav">
<details>
<summary>📑 Jump to</summary>
<div class="navmenu">
<div class="navgrp"><h4>Start here</h4>
<a href="index.html"><b>← Home (goal &amp; map)</b></a>
<a href="impact-saas-companies.html">SaaS / B2B field study</a>
<a href="impact-consumer-companies.html">Consumer-tech field study</a>
<a href="methodologies-comparison.html"><b>All methods compared →</b></a>
<a href="experiment-trustworthiness.html">How 40k tests actually work →</a>
<a href="jargon.html">Jargon (glossary)</a>
</div>
<div class="navgrp"><h4>Scoring &amp; Input modeling</h4>
<a href="rice-framework.html">RICE (Intercom)</a>
<a class="cur" href="north-star-framework.html">North Star (Amplitude / Slack)</a>
</div>
<div class="navgrp"><h4>Goal-laddering / Define first</h4>
<a href="v2mom-framework.html">V2MOM (Salesforce)</a>
<a href="pyramid-of-clarity-framework.html">Pyramid of Clarity (Asana)</a>
<a href="pr-faq-framework.html">PR-FAQ / Working Backwards (Amazon)</a>
<a href="heart-framework.html">HEART (Google)</a>
<a href="dibb-framework.html">DIBB (Spotify)</a>
</div>
<div class="navgrp"><h4>Experimentation (SaaS)</h4>
<a href="microsoft-exp-framework.html">Microsoft ExP / CUPED</a>
<a href="linkedin-xlnt-framework.html">LinkedIn T-REX</a>
</div>
<div class="navgrp"><h4>Experimentation (Consumer)</h4>
<a href="netflix-experimentation.html">Netflix · ABlaze</a>
<a href="booking-experimentation.html">Booking.com</a>
<a href="airbnb-erf-framework.html">Airbnb ERF</a>
<a href="uber-xp-framework.html">Uber XP</a>
<a href="doordash-switchback-framework.html">DoorDash switchback</a>
<a href="lyft-experimentation.html">Lyft</a>
<a href="pinterest-ab-framework.html">Pinterest</a>
</div>
<div class="navgrp"><h4>AI labs</h4>
<a href="anthropic-pm-on-ai-exponential.html">Anthropic · PM on AI exponential</a>
<a href="google-customer-zero-2026.html">Google · "Customer zero" 2026</a>
</div>
<div class="navgrp"><h4>Written discipline</h4>
<a href="stripe-shaping-framework.html">Stripe shaping</a>
</div>
</div>
</details>
</nav>

<div class="wrap">
  <header class="masthead">
    <p class="kicker">Methods · Deep-dive companion</p>
    <h1>The North Star Framework — and how it <em>actually</em> sets priority <span class="srcyr">2019</span></h1>
    <p class="sub">Slack's "2,000 messages" and Amplitude's "Weekly Learning Users" are <em>metrics</em> — not a way to score a feature. This page closes that gap: what the framework is, and the real mechanism by which it decides what to build first.</p>
    <div class="goal"><span>Goal</span><br>Decide features by data-backed expected impact — choose by outcome, not by to-do list or opinion.</div>
  </header>

  <div class="eli">
    <span class="lbl">🎓 8th-grade version</span>
    <p>Most teams build whatever ideas sound good. North Star says: pick <b>one number</b> that means "our customers are getting value" (Slack picked <em>teams sending lots of messages</em>). Then list the <b>3–5 things teams can directly do</b> to move that number — call those the "inputs." Each cycle, pick the weakest input and <em>only</em> fund ideas that touch it; everything else waits its turn.</p>
    <p>It's a <b>filter</b>, not a score. Once features are filtered in, you still rank them with RICE or a real experiment.</p>
  </div>

  <nav class="toc">
    <a href="#headline">The honest headline</a>
    <a href="#anatomy">Anatomy</a>
    <a href="#mechanism">How it picks work</a>
    <a href="#score">"Scoring" a feature</a>
    <a href="#cos">Slack vs Amplitude</a>
    <a href="#apply">Apply to a sheet</a>
  </nav>

  <div class="finding" id="headline">
    <h2>The honest headline: it is not a scoring formula</h2>
    <p>If you came expecting a number-per-feature like RICE, there isn't one. The North Star Framework <b>does not score features</b>. It does something earlier and more important: it tells you <b>which single metric you are trying to move right now</b>, and it forces every candidate feature to declare <b>which input it claims to move and by how much</b>.</p>
    <p>So "how do Slack/Amplitude decide impact?" — they don't compute an impact score. They <b>model</b> how value is created (<code style="color:#f3d9a0">North Star = f(a few inputs)</code>), pick the input with the most <b>leverage</b>, and only fund work that moves it. The actual ranking of those candidates still falls to a scoring method (RICE/ICE) or a real experiment. The framework is the <b>filter and the focus</b>, not the scoreboard.</p>
  </div>

  <!-- ANATOMY -->
  <h2 class="sec" id="anatomy">Anatomy: three layers, one causal claim</h2>
  <p class="secsub">The framework is a <strong>causal model</strong> — Cutler and others describe it as a belief map or driver diagram: a written claim about how value is produced. Read it top-down as: <em>"if we move these inputs, this North Star rises, and the business grows."</em></p>

  <div class="anatomy">
    <div class="lbl-row">North Star Metric — the one outcome of customer value</div>
    <div class="nsm">Weekly Active Teams<small>leading indicator of revenue · "one level out of reach"</small></div>
    <div class="down">▲ driven by ▲</div>
    <div class="lbl-row">Inputs — 3–5 things teams can directly move</div>
    <div class="inputs">
      <span class="ip">Breadth (how many use it)</span>
      <span class="ip">Depth (messages per team)</span>
      <span class="ip">Retention</span>
      <span class="ip">Efficiency</span>
    </div>
    <div class="down">▲ moved by ▲</div>
    <div class="lbl-row">Work — bets aimed at one input</div>
    <div class="work">
      <span class="wk">Onboarding revamp</span>
      <span class="wk">Slack integrations</span>
      <span class="wk">Search</span>
      <span class="wk">Notifications</span>
    </div>
    <p style="font-size:12.5px;color:var(--ink-soft);margin:14px 0 0">Illustrative composition of the Slack-style model. A widely cited principle (paraphrased from the Playbook): if a single feature can move the North Star directly, the North Star is too shallow — it should sit far enough out of reach that all work has to go through the inputs.</p>
  </div>
  <p style="font-size:13px;color:var(--ink-soft)">Source: <a class="cite" href="https://amplitude.com/books/north-star/about-north-star-framework">Amplitude — About the North Star Framework</a> (Archana Madhavan &amp; John Cutler, December 4, 2019) · <a class="cite" href="https://blog.doubleloop.app/the-deep-value-of-the-north-star-framework-is-not-a-north-star-metric/">DoubleLoop — the deep value is the model, not the metric</a> (Daniel Schmidt, May 29, 2023)</p>

  <!-- MECHANISM -->
  <h2 class="sec" id="mechanism">How the framework actually picks the work</h2>
  <p class="secsub">This is the part that's usually skipped. DoubleLoop's blunt line: <em>"most companies completely skip the step of making a model,"</em> which is why a North Star metric alone changes nothing. The mechanism is five steps.</p>

  <div class="step"><div class="num">1</div><div><h3>Build the model</h3><p>Write the causal map: <b>North Star = f(input₁, input₂, …)</b>. Each input is something a team can move directly. This is the step everyone skips — without it the metric is just a poster.</p></div></div>
  <div class="step"><div class="num">2</div><div><h3>Find the leverage point</h3><p>Ask which input is <b>most movable and most underperforming right now</b> — the one where a small push produces the biggest move on the North Star. That input becomes this cycle's focus, not all of them. (The Playbook frames good strategy as picking <em>where</em> to focus, not what to do — the leverage point is the <em>where</em>.)</p></div></div>
  <div class="step"><div class="num">3</div><div><h3>Set a time-boxed target on it (OKR)</h3><p>Turn the chosen input into a goal: move <em>this input</em> to <em>this value</em> by <em>this date</em>. The roadmap is now "bets aimed at that one input," not a wishlist.</p></div></div>
  <div class="step"><div class="num">4</div><div><h3>Map each candidate feature to the input it claims to move</h3><p>For every idea, the question becomes: <b>which input does this move, and by how much?</b> An idea that doesn't touch the focus input is out of scope this cycle — that is the filter doing its job.</p></div></div>
  <div class="step"><div class="num">5</div><div><h3>Ship, then track input <em>and</em> output</h3><p>Measure whether the input actually moved (did the feature work?) <b>and</b> whether the North Star followed (was the input the right lever?). Both can fail — and that feedback corrects the model for next cycle.</p></div></div>
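  <p>Steps 2 and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the Playbook: the <code>Input</code> class, the current and target values, and the weights are all hypothetical placeholders, and approximating leverage as (room to improve) × (weight) is one possible reading of "most movable and most underperforming." Only three of the four Slack-style inputs are shown, for brevity.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Input:
    name: str
    current: float  # current value of the input (hypothetical)
    target: float   # value the team believes is realistically reachable
    weight: float   # assumed modeled weight on the North Star (hypothetical)


def pick_focus_input(inputs: list[Input]) -> Input:
    """Step 2: choose the input with the largest assumed leverage.

    Leverage is approximated as (room to improve) x (weight).
    That product is an illustrative assumption, not a Playbook formula.
    """
    return max(inputs, key=lambda i: (i.target - i.current) * i.weight)


def diagnose(input_moved: bool, north_star_followed: bool) -> str:
    """Step 5: read the input and output measurements together."""
    if input_moved and north_star_followed:
        return "feature worked; input is a real lever"
    if input_moved:
        return "feature worked; input may be the wrong lever, revisit the model"
    if north_star_followed:
        return "North Star moved for another reason; feature did not move the input"
    return "feature did not move the input; lever is still untested"


# All numbers below are made up for illustration.
inputs = [
    Input("Breadth", current=0.70, target=0.75, weight=0.3),
    Input("Depth", current=0.40, target=0.55, weight=0.6),
    Input("Retention", current=0.80, target=0.85, weight=0.4),
]
focus = pick_focus_input(inputs)
print(focus.name)
print(diagnose(input_moved=True, north_star_followed=False))
```
</pre>
  <p>With these placeholder numbers, Depth has both the widest gap and the highest weight, so it becomes the focus. The second call shows the case the step warns about: the feature moved its input, but the North Star did not follow, which points at the model rather than the feature.</p>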

  <!-- SCORE -->
  <h2 class="sec" id="score">So how do you "score" a feature? You estimate its lift on the input</h2>
  <p class="secsub">The framework reframes "impact" from a vague guess about revenue into a concrete, checkable claim about <strong>one input</strong>. That's far easier to estimate — and far easier to be proven wrong about.</p>

  <div class="eq">
    Expected impact of a feature, the North Star way:<br>
    <code>impact = (projected lift to the focus input) × (that input's modeled weight on the North Star)</code><br>
    <span style="font-size:13.5px;color:var(--ink-soft)">Example: "Onboarding revamp → we believe it lifts the <b>% of teams reaching 2,000 messages</b> from 40% → 55%; that input is the strongest driver of Weekly Active Teams, so it's high-impact <em>this cycle</em>." Notice the claim is testable, scoped to one input, and time-bound.</span>
  </div>
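  <p>The equation above is simple enough to write as a function. The sketch below is only an illustration: <code>expected_impact</code> and its parameter names are invented for this page, and the weight of 0.6 is a made-up placeholder, since neither Slack nor Amplitude publishes such a number. The 40% → 55% lift comes from the example above.</p>
<pre>
```python
def expected_impact(baseline: float, projected: float, input_weight: float) -> float:
    """impact = (projected lift to the focus input) x (input's modeled weight).

    baseline and projected are values of the focus input, in percent.
    input_weight is a hypothetical number taken from the team's own model.
    """
    lift = projected - baseline  # lift in percentage points
    return lift * input_weight


# Onboarding revamp: 40% -> 55% of teams reaching 2,000 messages.
# The weight 0.6 is a placeholder, not a published figure.
onboarding = expected_impact(baseline=40, projected=55, input_weight=0.6)

# A smaller hypothetical bet on the same input: 40% -> 45%.
smaller_bet = expected_impact(baseline=40, projected=45, input_weight=0.6)

print(round(onboarding, 2), round(smaller_bet, 2))
```
</pre>
  <p>Because both bets target the same input, the weight cancels out when comparing them; it only matters when comparing bets across different inputs. That is one reason the framework fixes a single focus input per cycle.</p>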

  <div class="note"><b>The framework does not replace RICE / ICE — it feeds them.</b> The North Star model narrows 50 ideas down to the handful that move the focus input, and gives each a sharper Reach/Impact basis (lift-on-input instead of gut feel). You then rank that short list with a scoring method, or run a <em><a class="j" href="jargon.html#fake-door-test">fake-door test</a></em> (a fake button or landing page for a feature that doesn't yet exist, used to measure real interest) or controlled experiment to <em>measure</em> the lift instead of estimating it. See the <a class="cite" href="methodologies-comparison.html#stacks">three coherent stacks</a> on the comparison page for how teams pair this with RICE or experimentation.</div>

  <!-- COMPANIES -->
  <h2 class="sec" id="cos">Slack vs Amplitude — same framework, different maturity of evidence</h2>
  <p class="secsub">Both are real input-modeling cases, but be honest about the sourcing: Amplitude <em>authored</em> the method and documents it; Slack is the famous <em>activation-input</em> story told mostly by its founder, not a published per-feature process.</p>

  <div class="cos">
    <div class="co">
      <h3>Slack</h3>
      <div class="nsm-tag">North Star · Weekly Active Teams</div>
      <ul>
        <li><b>The input that defined focus:</b> a team reaching <b>2,000 messages sent</b>. Butterfield: <em>"after 2,000 messages, 93% of those customers are still using Slack today."</em></li>
        <li><b>How it set priority:</b> the metric pointed product/onboarding work at <em>getting more teams across that activation threshold faster</em> — integrations, onboarding, invites — rather than scattered feature requests.</li>
        <li>It's a <b>focus mechanism</b>: the input told them where to aim, not a number per feature.</li>
      </ul>
      <div class="cav">Caveat: the 2,000-message figure and NSM are founder / secondary accounts (First Round, Amplitude's example), not a published internal scoring sheet.</div>
      <div style="margin-top:10px;font-size:13px"><a class="cite" href="https://review.firstround.com/from-0-to-1b-slacks-founder-shares-their-epic-launch-strategy/">First Round — Butterfield interview</a></div>
    </div>
    <div class="co">
      <h3>Amplitude</h3>
      <div class="nsm-tag">North Star · Weekly Learning Users</div>
      <ul>
        <li><b>Authored the Playbook</b> — the framework's most complete public write-up (NSM + inputs + guardrails + the model).</li>
        <li><b>How it sets priority:</b> a "North Star tree" connects each initiative to the input it moves; teams pick bets at the leverage points and set <a class="j" href="jargon.html#okr">OKRs</a> on inputs.</li>
        <li>Adds <b>guardrail metrics</b> (churn, support load) so a feature that lifts an input but harms the product is caught.</li>
      </ul>
      <div class="cav">Caveat: the precise Weekly Learning Users (WLU) input names came from a search snippet — confirm them on the Playbook page itself.</div>
      <div style="margin-top:10px;font-size:13px"><a class="cite" href="https://amplitude.com/resources/north-star-playbook">Amplitude — North Star Playbook</a></div>
    </div>
  </div>
  <!-- APPLY TO A SHEET -->
  <h2 class="sec" id="apply">Apply to a feature sheet</h2>
  <p class="secsub">North Star doesn't score features — it filters them. If you adopt it, your backlog gains columns that force every idea to declare <strong>which input it claims to move</strong>. Features that don't touch this cycle's <em>focus input</em> aren't ranked low — they're out of scope, full stop.</p>

  <div class="note" style="background:var(--teal-soft);border-left-color:var(--teal)"><b style="color:var(--teal)">Try it Monday morning (30 minutes).</b> Write your team's North Star at the top of a sheet. List the 3–5 inputs that move it. Pick the one input you'd most want to lift this cycle and circle it. Now go through your last 10 backlog items and write the input each claims to move next to its title. Items that name nothing — or name a different input — are this cycle's "out of scope." That's the framework working. The hardest part is naming the inputs honestly; everything else follows.</div>

  <div class="extable">
    <table class="ex">
      <thead><tr><th>Column to add</th><th>What it captures</th><th>How you fill it</th></tr></thead>
      <tbody>
        <tr><td>Feature</td><td>Idea or backlog item</td><td>Existing backlog title</td></tr>
        <tr><td>Input metric moved</td><td>Which of the 3–5 named inputs this feature claims to lift (must be one of them)</td><td>Pick from your model's input list — or write "none" to surface the problem</td></tr>
        <tr><td>Projected lift</td><td>Concrete delta on that input, with units</td><td>e.g. "% teams reaching 2,000 messages: 40% → 55%"</td></tr>
        <tr><td>Confidence in lift</td><td>Why you believe the projection</td><td>Data (have funnel) / Analogy (similar feature shipped) / Hunch</td></tr>
        <tr><td>Touches focus input?</td><td>Is the lifted input <em>this cycle's</em> focus, or a different input?</td><td>Yes / No — derived from your current OKR</td></tr>
        <tr><td>Effort</td><td>Rough size for ranking among in-scope features</td><td>T-shirt or person-months</td></tr>
        <tr><td>In-scope this cycle?</td><td>The verdict</td><td>In (focus input + plausible lift) / Out (wrong input or no input claim)</td></tr>
      </tbody>
    </table>
  </div>

  <h3 style="font-family:Georgia,serif;font-size:18px;margin:24px 0 8px">Worked example — a backlog snapshot against a Slack-style NSM</h3>
  <p style="font-size:13.5px;color:var(--ink-soft);margin:0 0 12px">NSM = <b>Weekly Active Teams</b>. Inputs = <b>Breadth</b> (teams active), <b>Depth</b> (% teams reaching 2,000 messages), <b>Retention</b> (4-wk return), <b>Efficiency</b> (time-to-first-message). <b>This cycle's focus input = Depth.</b> Numbers illustrative.</p>

  <div class="extable">
    <table class="ex">
      <thead><tr><th>Feature</th><th>Input moved</th><th>Projected lift</th><th>Confidence</th><th>Focus input?</th><th>Effort (person-months)</th><th>Verdict</th></tr></thead>
      <tbody>
        <tr class="top"><td>Onboarding revamp</td><td>Depth</td><td>40% → 55% reach 2k msgs</td><td>Data — funnel measured</td><td>Yes</td><td>4</td><td class="score">In</td></tr>
        <tr class="top"><td>Channel templates library</td><td>Depth</td><td>+5 pp Depth</td><td>Analogy — similar OSS feature shipped</td><td>Yes</td><td>2</td><td class="score">In</td></tr>
        <tr class="top"><td>Invite-after-first-channel nudge</td><td>Depth</td><td>+3 pp Depth</td><td>Data — pilot A/B at +3.2%</td><td>Yes</td><td>1</td><td class="score">In</td></tr>
        <tr><td>Mobile push notifications</td><td>Depth (msgs read)</td><td>+6% msgs read on mobile</td><td>Hunch</td><td>Yes (Depth-adjacent)</td><td>5</td><td class="score" style="color:var(--gold)">In · low conf — pilot first</td></tr>
        <tr><td>Re-engagement email for dormant teams</td><td>Retention</td><td>+2 pp 4-wk return</td><td>Data</td><td>No (Retention)</td><td>2</td><td class="score" style="color:var(--ink-soft)">Out · queue (next cycle if focus shifts)</td></tr>
        <tr><td>Expand integrations directory</td><td>Breadth</td><td>+8% teams with ≥1 integration</td><td>Analogy</td><td>No (Breadth)</td><td>6</td><td class="score" style="color:var(--ink-soft)">Out · queue</td></tr>
        <tr><td>Faster search index</td><td>Efficiency</td><td>-200 ms p50</td><td>Data</td><td>No (Efficiency)</td><td>3</td><td class="score" style="color:var(--ink-soft)">Out · queue</td></tr>
        <tr><td>Dark mode</td><td>None claimed</td><td>—</td><td>—</td><td>No</td><td>1</td><td class="score" style="color:var(--accent)">Out · vanity</td></tr>
      </tbody>
    </table>
  </div>
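  <p>The sheet above can also be read as data. The following sketch, with hypothetical helper names <code>in_scope</code> and <code>queued</code>, shows how the same backlog re-sorts itself when the focus input changes. The feature and input names are copied from the table; the Depth-adjacent push-notification row is left out to keep the example simple.</p>
<pre>
```python
# (feature, input it claims to move) pairs from the worked example.
# None means the item names no input.
BACKLOG = [
    ("Onboarding revamp", "Depth"),
    ("Channel templates library", "Depth"),
    ("Invite-after-first-channel nudge", "Depth"),
    ("Re-engagement email for dormant teams", "Retention"),
    ("Expand integrations directory", "Breadth"),
    ("Faster search index", "Efficiency"),
    ("Dark mode", None),
]


def in_scope(backlog, focus_input):
    """Return the features whose claimed input is this cycle's focus."""
    return [feature for feature, moved in backlog if moved == focus_input]


def queued(backlog, focus_input):
    """Return features that name an input, just not this cycle's focus."""
    return [
        feature
        for feature, moved in backlog
        if moved is not None and moved != focus_input
    ]


print(in_scope(BACKLOG, "Depth"))      # this cycle
print(queued(BACKLOG, "Depth"))        # waiting for a later cycle
print(in_scope(BACKLOG, "Retention"))  # what comes in if the focus shifts
```
</pre>
  <p>Nothing about the queued items changes between cycles; only the focus argument does. Items that name no input never appear in either list.</p>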

  <div class="note" style="background:var(--accent-soft);border-left-color:var(--accent)"><b style="color:var(--accent)">The most important reading skill for this sheet.</b> <strong>Out · queue ≠ bad idea.</strong> Re-engagement, integrations, and search above are all reasonable bets — they just don't touch <em>this cycle's</em> focus input (Depth). They sit in queue, fully credited, until a future cycle's focus shifts to Retention/Breadth/Efficiency. The <em>only</em> rows that get killed outright are the ones that name no input at all (here: Dark mode). That distinction — "wrong cycle" vs "wrong idea" — is what protects the framework from feeling like a kill-list.</div>

  <div class="note"><b>Decision rule.</b> Three gates, in order: <b>(1)</b> does the feature name an input it moves? If "none," it's a vanity request — out. <b>(2)</b> Is that input <em>this cycle's focus</em>? If no, queue for a future cycle when that input becomes the focus — not a cut. <b>(3)</b> Is the projected lift plausible — backed by data, analogy, or at minimum a fake-door test plan? Only features passing all three enter the ranked build queue, where you score them with RICE/ICE or run experiments. Illustrative numbers; the framework is the filter, not the scoreboard.</div>
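  <p>The three gates can be expressed as an ordered check. This is a rough sketch rather than a prescribed implementation: the function name <code>verdict</code>, the evidence labels, and the "hold: pilot first" outcome for unsupported lifts are assumptions made for illustration, loosely following the low-confidence row in the worked example.</p>
<pre>
```python
from typing import Optional

# Evidence types accepted at gate 3 (labels are illustrative).
EVIDENCE_OK = {"data", "analogy", "fake-door plan"}


def verdict(input_moved: Optional[str], focus_input: str, evidence: str) -> str:
    """Apply the three gates in order and return a verdict label."""
    # Gate 1: the feature must name an input it moves.
    if input_moved is None:
        return "out: vanity"
    # Gate 2: the input must be this cycle's focus; otherwise queue, not cut.
    if input_moved != focus_input:
        return "out: queue"
    # Gate 3: the projected lift needs some backing.
    if evidence.lower() not in EVIDENCE_OK:
        return "hold: pilot first"
    return "in: rank with RICE/ICE or run an experiment"


print(verdict("Depth", "Depth", "Data"))
print(verdict("Retention", "Depth", "Data"))
print(verdict("Depth", "Depth", "Hunch"))
print(verdict(None, "Depth", "Hunch"))
```
</pre>
  <p>The order matters: an item with no input claim is rejected before anyone spends time debating its evidence, and a well-evidenced item on the wrong input is queued rather than ranked.</p>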

  <footer>
    Companion to <a href="impact-saas-companies.html#input">← SaaS case studies · Input Modeling</a><br>
    Grounded in: <a href="https://amplitude.com/books/north-star/about-north-star-framework">Amplitude North Star Playbook</a> (Archana Madhavan &amp; John Cutler, December 4, 2019); <a href="https://blog.doubleloop.app/the-deep-value-of-the-north-star-framework-is-not-a-north-star-metric/">DoubleLoop — "the deep value is the model, not the metric"</a> (Daniel Schmidt, May 29, 2023); <a href="https://review.firstround.com/from-0-to-1b-slacks-founder-shares-their-epic-launch-strategy/">First Round interview with Stewart Butterfield</a>. Butterfield's "after 2,000 messages, 93% of those customers are still using Slack today" is verbatim from the interview; the DoubleLoop "most companies completely skip the step of making a model" is verbatim from the essay. Phrases like "belief map / driver diagram" and "one level out of reach" are paraphrased from the Playbook's framing — the canonical source is linked above but the exact chapter wording was not re-verified during this review.
  </footer>
</div>
</body>
</html>