Short‑Term vs Long‑Term Effects

Most experiments can only observe short‑term behavior, but the business usually cares about long‑term outcomes: retention, LTV, marketplace health, brand trust. Staff‑level experimentation is about engineering the bridge between the two: choosing short‑term metrics that are causally tied to long‑term value, not just correlated.

3.2.1 Designing causal proxy metrics

Work backwards from the thing leadership actually cares about:
  • Consumer: 6‑month retained buyers, subscription renewal, churn.
  • Marketplace: 90‑day driver/merchant retention, supply stability, profitability.
  • SaaS: expansion revenue, net dollar retention, seat growth.
For each, write the behavioral story:
“For a user to be a 6‑month retained buyer, what do they do in the first 1–4 weeks?”
You are looking for necessary early behaviors, not nice‑to‑have vanity metrics.
Examples:
  • 6‑month retention → returning in week 2, 3, 4; number of sessions; number of successful “core actions”.
  • LTV → early basket size, diversity of use cases (e.g., ordering multiple categories, using multiple surfaces).
  • Marketplace health → early earnings stability for supply, cancellation experiences for demand.
You want a mechanism, not just “this chart moves with that chart.”
Template:
Treatment → early behavioral change → structural change in relationship → long‑term outcome.
Concretely:
  • “New onboarding checklist → more users complete 3 key setup steps → more users reach ‘aha’ moment → higher 3‑month retention.”
  • “Better restaurant recommendations on home → more first orders from high‑quality merchants → better first‑week NPS → higher 90‑day order frequency.”
If the mechanism is fuzzy, your “proxy metric” is probably just a lagging vanity metric.
Good short‑term proxies usually have three properties:
  1. Proximal: They occur soon after treatment (days or weeks, not quarters).
  2. Mechanistic: They are directly on the causal path you described.
  3. Stable: Their relationship to the long‑term metric is reasonably stable across time and cohorts.
Concrete examples:
  • Onboarding / activation
    • Long‑term: 90‑day retained active users.
    • Proxy candidates:
      • “Completed 3 key actions within first 7 days.”
      • “Returned in week 2 and week 3.”
    • Why they work: The mechanism is “users who understand value and build habit early are much more likely to stay.”
  • Subscriptions / paywalls
    • Long‑term: 6‑month subscriber retention, ARPU.
    • Proxy candidates:
      • “Number of paywalled articles read in trial.”
      • “Days with any consumption in first 14 days.”
    • Why they work: These capture engaged subscribers vs. those who churn as soon as friction appears.
  • Marketplaces
    • Long‑term: sustainable supply, cross‑side retention.
    • Proxy candidates:
      • For drivers: “earned at least $X per online hour in first 2 weeks”, “fraction of sessions with unacceptable idle time.”
      • For customers: “experienced at most Y bad ETAs or cancellations in first N orders.”
    • Why they work: Income and experience early in the lifecycle shape whether participants stick around.
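To make the activation example concrete, a proxy like “completed 3 key actions within first 7 days” can be computed directly from an event log. This is a minimal sketch, not a definitive implementation: the `KEY_ACTIONS` set, the event-tuple shape, and the function name are hypothetical placeholders for whatever your analytics pipeline actually provides.

```python
from datetime import datetime, timedelta

# Hypothetical set of "core actions"; in practice, these come from your
# behavioral story about what early value looks like.
KEY_ACTIONS = {"create_project", "invite_teammate", "complete_tutorial"}

def activated_within_7_days(events, signup_times, window_days=7):
    """Return {user_id: bool} for the proxy
    'completed 3 distinct key actions within the first 7 days'."""
    done = {}
    for user_id, action, ts in events:
        signup = signup_times[user_id]
        if action in KEY_ACTIONS and ts - signup <= timedelta(days=window_days):
            done.setdefault(user_id, set()).add(action)
    return {u: len(done.get(u, set())) >= 3 for u in signup_times}

# Illustrative data: u1 completes all three actions inside the window,
# u2 completes only one in time.
signups = {"u1": datetime(2024, 1, 1), "u2": datetime(2024, 1, 1)}
events = [
    ("u1", "create_project", datetime(2024, 1, 2)),
    ("u1", "invite_teammate", datetime(2024, 1, 3)),
    ("u1", "complete_tutorial", datetime(2024, 1, 5)),
    ("u2", "create_project", datetime(2024, 1, 2)),
    ("u2", "invite_teammate", datetime(2024, 1, 20)),  # outside the window
]
print(activated_within_7_days(events, signups))  # → {'u1': True, 'u2': False}
```

Note the proxy requires three distinct key actions, not three repetitions of one: the mechanism is breadth of early value, not raw activity volume.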
Bad proxies to push back on:
  • Clicks on a shiny surface with no clear path to habit or revenue.
  • Session length when the product is supposed to make users faster.
  • Email opens when the real value is downstream purchases.

3.2.2 Validating proxies empirically

You earn the right to use a proxy by establishing a strong predictive relationship to the long‑term metric and, ideally, evidence that the relationship is causal rather than merely correlational.
Tactics:
  1. Retrospective cohort analysis
      • Take past users and bin them by their proxy value in the early period (e.g., “0 core actions”, “1–2”, “3+”).
      • Plot long‑term outcomes by bin (retention, LTV, churn).
      • You want strong, monotonic differences that are robust to simple controls (cohort, region, device).
  2. Simple causal checks
      • Use observational tools (e.g., regression, matching) to see whether the proxy still predicts long‑term outcomes after controlling for obvious confounders like tenure, acquisition channel, geography.
      • You are not trying to publish a paper, just to check the relationship is not entirely spurious.
  3. “Proxy experiments”
      • Sometimes you can run experiments whose primary goal is to move the proxy, then check long‑term follow‑up for units in those experiments.
      • If interventions that reliably move the proxy also reliably move the long‑term metric in the expected direction (when measured months later), your confidence in that proxy increases.
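The first two tactics can be sketched together on retrospective data. This is a deliberately minimal illustration on made‑up rows: the field names (`core_actions`, `channel`, `retained_90d`) are hypothetical, and the crude stratification by acquisition channel stands in for the fuller regression or matching checks you would run in practice.

```python
from statistics import mean

# Hypothetical retrospective data: each past user's early proxy value
# (core actions in week 1), one obvious confounder (acquisition channel),
# and the long-term outcome (retained at 90 days).
users = [
    {"core_actions": 0, "channel": "paid",    "retained_90d": False},
    {"core_actions": 0, "channel": "organic", "retained_90d": False},
    {"core_actions": 1, "channel": "paid",    "retained_90d": False},
    {"core_actions": 2, "channel": "organic", "retained_90d": True},
    {"core_actions": 3, "channel": "paid",    "retained_90d": True},
    {"core_actions": 4, "channel": "organic", "retained_90d": True},
    {"core_actions": 5, "channel": "organic", "retained_90d": True},
    {"core_actions": 3, "channel": "paid",    "retained_90d": False},
]

def proxy_bin(n):
    return "0" if n == 0 else ("1-2" if n <= 2 else "3+")

def retention_by_bin(rows):
    """Tactic 1: mean 90-day retention per proxy bin.
    You are looking for a strong, monotonic gradient across bins."""
    out = {}
    for r in rows:
        out.setdefault(proxy_bin(r["core_actions"]), []).append(r["retained_90d"])
    return {b: mean(map(float, v)) for b, v in sorted(out.items())}

def retention_by_bin_within_channel(rows):
    """Tactic 2 (crude stratification): recompute the gradient within each
    acquisition channel. If it survives inside every stratum, the proxy is
    less likely to be an artifact of who was acquired where."""
    channels = sorted({r["channel"] for r in rows})
    return {
        c: retention_by_bin([r for r in rows if r["channel"] == c])
        for c in channels
    }

print(retention_by_bin(users))                 # → {'0': 0.0, '1-2': 0.5, '3+': 0.75}
print(retention_by_bin_within_channel(users))
```

In this toy data the gradient is monotonic both overall and within each channel, which is the pattern you want before blessing a proxy; on real data you would also check robustness across cohorts, regions, and devices as noted above.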
Staff‑level behavior: you push for at least one of these to be done once per major product surface, then stabilize on a small set of “blessed” proxies.