Moving from "Revenue" to "Controllable Inputs."
2.2.1 The Metric Stack (North Star, drivers, guardrails)

If you let teams choose metrics by convenience, they will optimize dashboards, not the business.
Every serious experiment should have a clear hierarchy:
- North Star metric
  - The thing leadership actually cares about: bookings, retained users, long‑term engagement.
  - If your experiment does not have a believable path to this metric, question why you are running it.
- Driver / surrogate / operational metrics
  - 1–3 metrics that move faster or with less noise, and are tightly coupled to the North Star.
  - These are often your primary experimental readouts.
- Guardrail metrics
  - Things you refuse to trade off recklessly: latency, crash rate, abuse, supply health, complaints, etc.
  - Explicitly define "how much pain is too much" up front.
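Writing the hierarchy and tolerances down before the test starts can be as simple as a declarative stack definition. A minimal sketch; every metric name and threshold below is illustrative, not prescriptive:

```python
# Illustrative metric stack for one experiment, agreed before launch.
# Guardrail values are the maximum tolerated relative regression
# ("how much pain is too much"), where positive delta = worse.
METRIC_STACK = {
    "north_star": "7d_retained_active_users",
    "drivers": ["time_spent_in_feed", "sessions_per_user"],
    "guardrails": {
        "p95_latency": 0.02,   # up to +2% latency regression tolerated
        "crash_rate": 0.00,    # zero tolerance for crash regressions
        "report_rate": 0.01,
    },
}
```

The point is not the data structure but that the tolerances exist in writing before anyone sees results.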
Guardrails by domain (non‑exhaustive, but useful):
- E‑commerce: page load time, checkout errors, cart abandonment, CS contacts.
- Social: content quality, report/block rates, platform stability.
- Ride‑sharing: driver availability, wait times, cancellations, safety.
- Streaming: buffering, crash rate, content availability, churn.
Staff‑level expectations:
- You do not accept tests with a random grab‑bag of 20 charts and no written success criteria.
- You push teams to define: "If we see X on the primary metric and Y on guardrails, we ship. Otherwise we do not."
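That written rule is mechanical enough to express as a small decision function. A sketch under assumed semantics (relative lifts, positive guardrail delta = regression); the names and thresholds are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Readout:
    primary_lift: float      # relative lift on the primary metric
    guardrail_deltas: dict   # guardrail name -> relative change (positive = worse)

def ship_decision(readout: Readout,
                  min_primary_lift: float,
                  guardrail_tolerance: dict) -> str:
    """'ship' only if the primary metric clears its bar AND no
    guardrail regresses beyond its pre-agreed tolerance."""
    if readout.primary_lift < min_primary_lift:
        return "no-ship: primary metric below bar"
    for name, delta in readout.guardrail_deltas.items():
        # Unlisted guardrails default to zero tolerance.
        if delta > guardrail_tolerance.get(name, 0.0):
            return f"no-ship: guardrail '{name}' beyond tolerance"
    return "ship"

# A headline win that still fails on guardrails (deltas illustrative):
readout = Readout(primary_lift=0.06,
                  guardrail_deltas={"hide_rate": 0.04, "crash_rate": 0.03})
print(ship_decision(readout, min_primary_lift=0.02,
                    guardrail_tolerance={"hide_rate": 0.01, "crash_rate": 0.0}))
# → no-ship: guardrail 'hide_rate' beyond tolerance
```

The value is that the no-ship path is a pre-agreed outcome, not a post-hoc argument.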
Case study: News feed redesign with hidden costs
A social app wants to launch a new feed layout expected to increase time spent.
- Primary metric: daily time spent in feed.
- North Star: 7‑day retained active users.
- Guardrails: hide rate, report rate, and session crash rate.
Experiment results:
- Time spent is up 6%.
- Retention is flat over the short run.
- But the hide rate is up while the report rate for low‑quality content is down, suggesting users feel worse and are quietly hiding content rather than reporting it.
- Crash rate has also increased on low‑end devices.
Outcome: you argue against a blind rollout. Instead, the team iterates on stability and content quality surfaces, then re‑tests. The eventual launch version gets a smaller time‑spent lift but does not degrade guardrails. You used guardrails to protect long‑term trust.
How to talk through this in an interview
- Mid‑level answer: Explain that you used guardrails to catch negative side effects and decided not to ship despite a headline win on time spent.
- Senior answer: Connect this to long‑term user trust: you articulate why time spent is an incomplete proxy and how the guardrails map to user experience and abuse risk.
- Staff answer: Show how this incident led you to codify a metric framework for attention‑grabbing features (North Star + driver + guardrails), influence future roadmap discussions, and raise the bar for experiments that could trade off engagement vs. trust.