Most experiments can only observe short‑term behavior, but the business usually cares about long‑term outcomes: retention, LTV, marketplace health, brand trust. Staff‑level experimentation is about engineering the bridge between the two: choosing short‑term metrics that are causally tied to long‑term value, not just correlated.
3.2.1 Designing causal proxy metrics
Work backwards from the thing leadership actually cares about:
- Consumer: 6‑month retained buyers, subscription renewal, churn.
- Marketplace: 90‑day driver/merchant retention, supply stability, profitability.
- SaaS: expansion revenue, net dollar retention, seat growth.
For each, write the behavioral story:
“For a user to be a 6‑month retained buyer, what do they do in the first 1–4 weeks?”
You are looking for necessary early behaviors, not nice‑to‑have vanity metrics.
Examples:
- 6‑month retention → returning in week 2, 3, 4; number of sessions; number of successful “core actions”.
- LTV → early basket size, diversity of use cases (e.g., ordering multiple categories, using multiple surfaces).
- Marketplace health → early earnings stability for supply, cancellation experiences for demand.
You want a mechanism, not just “this chart moves with that chart.”
Template:
Treatment → early behavioral change → structural change in relationship → long‑term outcome.
Concretely:
- “New onboarding checklist → more users complete 3 key setup steps → more users reach ‘aha’ moment → higher 3‑month retention.”
- “Better restaurant recommendations on home → more first orders from high‑quality merchants → better first‑week NPS → higher 90‑day order frequency.”
If the mechanism is fuzzy, your “proxy metric” is probably just a lagging vanity metric.
Good short‑term proxies usually have three properties:
- Proximal: They occur soon after treatment (days or weeks, not quarters).
- Mechanistic: They are directly on the causal path you described.
- Stable: Their relationship to the long‑term metric is reasonably stable across time and cohorts.
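The "stable" property can be checked mechanically. A minimal sketch, assuming per-user records of (cohort, hit_proxy, retained_long_term); all field names and numbers here are illustrative, not from a real dataset:

```python
from statistics import mean

# Hypothetical per-user records: (cohort, hit_proxy, retained_long_term).
# Values are synthetic and purely illustrative.
users = [
    ("2023-Q1", True, 1), ("2023-Q1", True, 1),
    ("2023-Q1", False, 0), ("2023-Q1", False, 1),
    ("2023-Q2", True, 1), ("2023-Q2", True, 0),
    ("2023-Q2", False, 0), ("2023-Q2", False, 0),
]

def retention_lift_by_cohort(records):
    """Difference in long-term retention between users who hit the
    proxy and users who did not, computed separately per cohort.
    A stable proxy shows a similar positive lift in every cohort."""
    lifts = {}
    for cohort in sorted({c for c, _, _ in records}):
        hit = [r for c, p, r in records if c == cohort and p]
        miss = [r for c, p, r in records if c == cohort and not p]
        lifts[cohort] = mean(hit) - mean(miss)
    return lifts
```

If the lift swings wildly between cohorts (or flips sign), the proxy's relationship to the long‑term metric is not stable enough to trust.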
Concrete examples:
- Onboarding / activation
  - Long‑term: 90‑day retained active users.
  - Proxy candidates:
    - “Completed 3 key actions within first 7 days.”
    - “Returned in week 2 and week 3.”
  - Why they work: the mechanism is “users who understand the value and build a habit early are much more likely to stay.”
- Subscriptions / paywalls
  - Long‑term: 6‑month subscriber retention, ARPU.
  - Proxy candidates:
    - “Number of paywalled articles read during the trial.”
    - “Days with any consumption in the first 14 days.”
  - Why they work: these separate engaged subscribers from those who churn as soon as friction appears.
- Marketplaces
  - Long‑term: sustainable supply, cross‑side retention.
  - Proxy candidates:
    - For drivers: “earned at least $X per online hour in first 2 weeks”, “fraction of sessions with unacceptable idle time.”
    - For customers: “experienced at most Y bad ETAs or cancellations in first N orders.”
  - Why they work: income and experience early in the lifecycle shape whether participants stick around.
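A proxy like “completed 3 key actions within first 7 days” should be a deterministic function of the raw event log, so teams can compute it identically everywhere. A minimal sketch, assuming a hypothetical event schema of (user_id, event_name, timestamp); the event names, thresholds, and data are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical event log: (user_id, event_name, timestamp).
# "core_action" and the 3-action / 7-day thresholds are assumptions.
EVENTS = [
    ("u1", "core_action", datetime(2024, 1, 1)),
    ("u1", "core_action", datetime(2024, 1, 2)),
    ("u1", "core_action", datetime(2024, 1, 5)),
    ("u2", "core_action", datetime(2024, 1, 1)),
]
SIGNUPS = {"u1": datetime(2024, 1, 1), "u2": datetime(2024, 1, 1)}

def activation_proxy(user_id, events, signups, n_actions=3, window_days=7):
    """True if the user completed >= n_actions core actions within
    window_days of signup -- the 'activated early' proxy flag."""
    start = signups[user_id]
    end = start + timedelta(days=window_days)
    count = sum(
        1 for uid, name, ts in events
        if uid == user_id and name == "core_action" and start <= ts < end
    )
    return count >= n_actions
```

With this toy data, u1 hits the proxy (3 core actions in the first week) and u2 does not. Parameterizing the thresholds makes it easy to sweep them during validation rather than hard-coding a guess.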
Bad proxies to push back on:
- Clicks on a shiny surface with no clear path to habit or revenue.
- Session length when the product is supposed to make users faster.
- Email opens when the real value is downstream purchases.
3.2.2 Validating proxies empirically
You earn the right to use a proxy by establishing its predictive relationship to the long‑term metric, and ideally evidence that the relationship is causal.
Tactics:
- Retrospective cohort analysis
  - Take past users and bin them by proxy value in the early period (e.g., “0 core actions”, “1–2”, “3+”).
  - Plot long‑term outcomes by bin (retention, LTV, churn).
  - You want strong, monotonic differences that are robust to simple controls (cohort, region, device).
- Simple causal checks
  - Use observational tools (e.g., regression, matching) to see whether the proxy still predicts long‑term outcomes after controlling for obvious confounders like tenure, acquisition channel, and geography.
  - You are not trying to publish a paper, just to check that the relationship is not entirely spurious.
- “Proxy experiments”
  - Sometimes you can run experiments whose primary goal is to move the proxy, then check long‑term follow‑up outcomes for units in those experiments.
  - If interventions that reliably move the proxy also reliably move the long‑term metric in the expected direction (measured months later), your confidence in the proxy increases.
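The retrospective cohort analysis can be sketched in a few lines. A minimal version, assuming per-user tuples of (core actions in week 1, retained at 90 days); the bin edges and data are illustrative assumptions:

```python
def proxy_bin(core_actions):
    """Bin a user's early proxy value into "0", "1-2", or "3+"."""
    if core_actions == 0:
        return "0"
    return "1-2" if core_actions <= 2 else "3+"

def retention_by_bin(users):
    """Long-term retention rate per proxy bin. For a usable proxy
    you want strong, monotonic differences across the bins."""
    bins = {}
    for actions, retained in users:
        bins.setdefault(proxy_bin(actions), []).append(retained)
    return {b: sum(v) / len(v) for b, v in bins.items()}

# Illustrative history: (core actions in week 1, retained at 90 days).
history = [
    (0, 0), (0, 0), (0, 1),
    (1, 0), (2, 1), (1, 1),
    (3, 1), (5, 1), (4, 1),
]
rates = retention_by_bin(history)
# Expect retention to rise monotonically from "0" through "3+".
```

In practice you would run this per cohort, region, and device and confirm the monotonic pattern holds in each slice before blessing the proxy.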
Staff‑level behavior: you push for at least one of these to be done once per major product surface, then stabilize on a small set of “blessed” proxies.