Most experiments can only observe short‑term behavior, but the business usually cares about long‑term outcomes: retention, LTV, marketplace health, brand trust. Staff‑level experimentation is about engineering the bridge between the two: choosing short‑term metrics that are causally tied to long‑term value, not just correlated.
3.2.1 Designing causal proxy metrics
Work backwards from the thing leadership actually cares about:
- Consumer: 6‑month retained buyers, subscription renewal, churn.
- Marketplace: 90‑day driver/merchant retention, supply stability, profitability.
- SaaS: expansion revenue, net dollar retention, seat growth.
For each, write the behavioral story:
“For a user to be a 6‑month retained buyer, what do they do in the first 1–4 weeks?”
You are looking for necessary early behaviors, not nice‑to‑have vanity metrics.
Examples:
- 6‑month retention → returning in week 2, 3, 4; number of sessions; number of successful “core actions”.
- LTV → early basket size, diversity of use cases (e.g., ordering multiple categories, using multiple surfaces).
- Marketplace health → early earnings stability for supply, cancellation experiences for demand.
You want a mechanism, not just “this chart moves with that chart.”
Template:
Treatment → early behavioral change → structural change in relationship → long‑term outcome.
Concretely:
- “New onboarding checklist → more users complete 3 key setup steps → more users reach ‘aha’ moment → higher 3‑month retention.”
- “Better restaurant recommendations on home → more first orders from high‑quality merchants → better first‑week NPS → higher 90‑day order frequency.”
If the mechanism is fuzzy, your “proxy metric” is probably just a lagging vanity metric.
Good short‑term proxies usually have three properties:
- Proximal: They occur soon after treatment (days or weeks, not quarters).
- Mechanistic: They are directly on the causal path you described.
- Stable: Their relationship to the long‑term metric is reasonably stable across time and cohorts.
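The "stable" property can be checked mechanically. A minimal sketch, assuming per-user records of (cohort, hit_proxy, retained_long_term); all field names and numbers here are illustrative, not from a real dataset:

```python
from statistics import mean

# Hypothetical per-user records: (cohort, hit_proxy, retained_long_term).
# Values are synthetic and purely illustrative.
users = [
    ("2023-Q1", True, 1), ("2023-Q1", True, 1),
    ("2023-Q1", False, 0), ("2023-Q1", False, 1),
    ("2023-Q2", True, 1), ("2023-Q2", True, 0),
    ("2023-Q2", False, 0), ("2023-Q2", False, 0),
]

def retention_lift_by_cohort(records):
    """Difference in long-term retention between users who hit the
    proxy and users who did not, computed separately per cohort.
    A stable proxy shows a similar positive lift in every cohort."""
    lifts = {}
    for cohort in sorted({c for c, _, _ in records}):
        hit = [r for c, p, r in records if c == cohort and p]
        miss = [r for c, p, r in records if c == cohort and not p]
        lifts[cohort] = mean(hit) - mean(miss)
    return lifts
```

If the lift swings wildly between cohorts (or flips sign), the proxy's relationship to the long‑term metric is not stable enough to trust.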
Concrete examples:
- Onboarding / activation
  - Long‑term: 90‑day retained active users.
  - Proxy candidates:
    - “Completed 3 key actions within first 7 days.”
    - “Returned in week 2 and week 3.”
  - Why they work: the mechanism is “users who understand the value and build a habit early are much more likely to stay.”
- Subscriptions / paywalls
  - Long‑term: 6‑month subscriber retention, ARPU.
  - Proxy candidates:
    - “Number of paywalled articles read during the trial.”
    - “Days with any consumption in the first 14 days.”
  - Why they work: these separate engaged subscribers from those who churn as soon as friction appears.
- Marketplaces
  - Long‑term: sustainable supply, cross‑side retention.
  - Proxy candidates:
    - For drivers: “earned at least $X per online hour in first 2 weeks”, “fraction of sessions with unacceptable idle time.”
    - For customers: “experienced at most Y bad ETAs or cancellations in first N orders.”
  - Why they work: income and experience early in the lifecycle shape whether participants stick around.
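A proxy like “completed 3 key actions within first 7 days” should be a deterministic function of the raw event log, so teams can compute it identically everywhere. A minimal sketch, assuming a hypothetical event schema of (user_id, event_name, timestamp); the event names, thresholds, and data are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical event log: (user_id, event_name, timestamp).
# "core_action" and the 3-action / 7-day thresholds are assumptions.
EVENTS = [
    ("u1", "core_action", datetime(2024, 1, 1)),
    ("u1", "core_action", datetime(2024, 1, 2)),
    ("u1", "core_action", datetime(2024, 1, 5)),
    ("u2", "core_action", datetime(2024, 1, 1)),
]
SIGNUPS = {"u1": datetime(2024, 1, 1), "u2": datetime(2024, 1, 1)}

def activation_proxy(user_id, events, signups, n_actions=3, window_days=7):
    """True if the user completed >= n_actions core actions within
    window_days of signup -- the 'activated early' proxy flag."""
    start = signups[user_id]
    end = start + timedelta(days=window_days)
    count = sum(
        1 for uid, name, ts in events
        if uid == user_id and name == "core_action" and start <= ts < end
    )
    return count >= n_actions
```

With this toy data, u1 hits the proxy (3 core actions in the first week) and u2 does not. Parameterizing the thresholds makes it easy to sweep them during validation rather than hard-coding a guess.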
Bad proxies to push back on:
- Clicks on a shiny surface with no clear path to habit or revenue.
- Session length when the product is supposed to make users faster.
- Email opens when the real value is downstream purchases.
3.2.2 Validating proxies empirically
You earn the right to use a proxy by establishing its predictive relationship to the long‑term metric, and ideally evidence that the relationship is causal.
Tactics:
- Retrospective cohort analysis
  - Take past users and bin them by proxy value in the early period (e.g., “0 core actions”, “1–2”, “3+”).
  - Plot long‑term outcomes by bin (retention, LTV, churn).
  - You want strong, monotonic differences that are robust to simple controls (cohort, region, device).
- Simple causal checks
  - Use observational tools (e.g., regression, matching) to see whether the proxy still predicts long‑term outcomes after controlling for obvious confounders like tenure, acquisition channel, and geography.
  - You are not trying to publish a paper, just to check that the relationship is not entirely spurious.
- “Proxy experiments”
  - Sometimes you can run experiments whose primary goal is to move the proxy, then check long‑term follow‑up outcomes for units in those experiments.
  - If interventions that reliably move the proxy also reliably move the long‑term metric in the expected direction (measured months later), your confidence in the proxy increases.
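The retrospective cohort analysis can be sketched in a few lines. A minimal version, assuming per-user tuples of (core actions in week 1, retained at 90 days); the bin edges and data are illustrative assumptions:

```python
def proxy_bin(core_actions):
    """Bin a user's early proxy value into "0", "1-2", or "3+"."""
    if core_actions == 0:
        return "0"
    return "1-2" if core_actions <= 2 else "3+"

def retention_by_bin(users):
    """Long-term retention rate per proxy bin. For a usable proxy
    you want strong, monotonic differences across the bins."""
    bins = {}
    for actions, retained in users:
        bins.setdefault(proxy_bin(actions), []).append(retained)
    return {b: sum(v) / len(v) for b, v in bins.items()}

# Illustrative history: (core actions in week 1, retained at 90 days).
history = [
    (0, 0), (0, 0), (0, 1),
    (1, 0), (2, 1), (1, 1),
    (3, 1), (5, 1), (4, 1),
]
rates = retention_by_bin(history)
# Expect retention to rise monotonically from "0" through "3+".
```

In practice you would run this per cohort, region, and device and confirm the monotonic pattern holds in each slice before blessing the proxy.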
Staff‑level behavior: you push for at least one of these to be done once per major product surface, then stabilize on a small set of “blessed” proxies.