Global holdouts and aggregate long‑term impact

So far we have talked about per‑experiment metrics and proxies. At Staff level you also need a way to answer a different question:

"Given everything we are shipping through experiments, what is the aggregate long‑term impact on the business?"

You do that by running a global holdout (also called a persistent control, platform holdout, or product‑level holdout).

What a global holdout is

You carve out a small, randomly selected fraction of traffic or users (for example, 1–5%).

That group is excluded from all product experiments and feature rollouts governed by your experimentation platform.

They experience the "old world" baseline, while the rest of the population sees the evolving stack of experiment winners and launches.

This lets you compare:

Treatment world: all other users, who remain subject to your usual experimentation and rollouts.

Holdout world: users frozen at a stable product baseline.

Over months, the gap between these two gives you an all‑in estimate of how much your experimentation and product development are really moving long‑term outcomes.
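As a rough illustration of what "the gap" means in practice, here is a minimal sketch that compares a retention rate between the two worlds and attaches a normal‑approximation confidence interval. The function name, the field names, and the user counts are all made up for illustration; a real read would also need to handle assignment date, eligibility, and metric definitions.

```python
from math import sqrt
from statistics import NormalDist


def retention_gap(retained_t, n_t, retained_h, n_h, confidence=0.95):
    """Gap in retention rate (treatment minus holdout) with a Wald interval.

    Hypothetical helper: assumes independent users and samples large
    enough for the normal approximation to be reasonable.
    """
    p_t = retained_t / n_t
    p_h = retained_h / n_h
    diff = p_t - p_h
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_t * (1 - p_t) / n_t + p_h * (1 - p_h) / n_h)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "treatment_rate": p_t,
        "holdout_rate": p_h,
        "abs_gap": diff,
        "rel_gap": diff / p_h,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


# Made-up counts for illustration only: 97% treatment world, 3% holdout.
result = retention_gap(retained_t=412_250, n_t=970_000,
                       retained_h=12_300, n_h=30_000)
print(result)
```

Note that the width of the interval is driven almost entirely by the small holdout arm, which is why holdout size matters so much for how clean the read is.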

How you use it

Typical long‑term outcomes to track on global holdout vs. treatment:

Retention curves (30/90/180‑day retained users).

Revenue / GMV / margin per user.

Engagement and habit metrics.

Marketplace health metrics (supply retention, earnings stability, cancellation rates).

Interpretation patterns:

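If experiment‑level wins look great but the global holdout gap is flat, the reported wins are probably inflated: your pipeline may be over‑fitting to local metrics, optimizing proxies that do not carry through to long‑term outcomes, or shipping noise.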

If the global holdout shows strong uplift while individual experiments look modest, your portfolio is working as intended and compounding.

Staff‑level expectation: you regularly bring global holdout readouts into roadmap and strategy reviews so leadership sees portfolio‑level ROI, not just isolated wins.

Design and governance choices

Key decisions you are expected to influence:

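Size: small enough to be politically acceptable (every held‑out user forgoes the shipped improvements), big enough to detect the cumulative effect you care about with reasonable power. Many orgs live in the 1–5% range depending on traffic.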

Scope: which products, platforms, or markets are covered. You might run separate holdouts by geo or platform when behavior differs a lot.

Duration: global holdouts are usually always‑on, with periodic deep‑dive reads (quarterly, half‑yearly).

Eligibility rules: who is in or out (for example, you may exclude VIP accounts, regulated segments, or certain enterprise contracts).
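To make the size trade‑off concrete, the sketch below estimates the smallest absolute gap in a rate metric (such as 90‑day retention) that a given holdout fraction could detect. It assumes a two‑sided z‑test on proportions with equal variance in both arms; the function name, the population of one million users, and the 40% baseline are illustrative assumptions, not recommendations.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_gap(total_users, holdout_fraction, baseline_rate,
                           alpha=0.05, power=0.8):
    """Approximate smallest absolute rate gap detectable by a two-sided z-test.

    Hypothetical helper: uses the baseline rate for the variance in both
    arms, which is a common planning simplification.
    """
    n_h = total_users * holdout_fraction
    n_t = total_users * (1 - holdout_fraction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate)
    se = sqrt(variance * (1 / n_h + 1 / n_t))
    return (z_alpha + z_power) * se


# Illustrative inputs only: 1M eligible users, 40% baseline retention.
for fraction in (0.01, 0.02, 0.05):
    mde = minimum_detectable_gap(1_000_000, fraction, baseline_rate=0.40)
    print(f"holdout {fraction:.0%}: detectable gap ~ {mde:.4f} absolute")
```

Running this over a few candidate fractions gives you a simple table to bring to the sizing conversation: larger holdouts detect smaller gaps, at the cost of withholding launches from more users.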

Your job is not just to "flip it on" but to make sure:

The holdout assignment is stable over time (users do not move in and out of the holdout between sessions, devices, or platform releases).

The definition of "baseline" is versioned and documented so reads remain interpretable.

Teams understand that no experiments are allowed to sneak into the holdout.
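One common way to get stable assignment is to hash a durable user identifier with a fixed salt, so the same user always lands in the same bucket without any lookup table. The sketch below is one possible implementation, not a description of any particular platform; the salt, the 2% fraction, and both function names are hypothetical.

```python
import hashlib

HOLDOUT_SALT = "global-holdout-v1"  # hypothetical; changing it reshuffles everyone
HOLDOUT_FRACTION = 0.02             # hypothetical 2% holdout


def in_global_holdout(user_id: str,
                      salt: str = HOLDOUT_SALT,
                      fraction: float = HOLDOUT_FRACTION) -> bool:
    """Deterministically map a user to the holdout based on a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def eligible_for_experiments(user_id: str) -> bool:
    """Guard that experiment assignment code would call before bucketing."""
    return not in_global_holdout(user_id)


# Same input always gives the same answer, so assignment is stable over time.
print(in_global_holdout("user-12345"), in_global_holdout("user-12345"))
```

Treating the salt as part of the versioned baseline definition keeps reads interpretable: if the salt or fraction changes, the holdout population changes, and that should be documented as a new holdout.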

Relationship to per‑experiment analysis

A global holdout does not replace individual experiments. Instead, it:

Acts as a sanity check: "Does the sum of our supposed wins show up against a clean external baseline?"

Catches interactions between experiments and features that are hard to see locally (for example, two "wins" that cannibalize each other).

Helps calibrate your proxy metrics: if the portfolio is "winning" on proxies but the global holdout shows no long‑term gain, your proxy framework needs work.
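The sanity check can be turned into a simple number: compound the lifts that individual experiments claimed, then ask what share of that shows up in the holdout gap. The sketch below assumes relative lifts combine multiplicatively and independently, which is a strong simplification; the function names and the example lifts are hypothetical.

```python
from math import prod


def expected_portfolio_lift(experiment_lifts):
    """Compound relative lifts, assuming they combine multiplicatively
    and independently (a simplifying assumption, rarely exactly true)."""
    return prod(1 + lift for lift in experiment_lifts) - 1


def realization_ratio(experiment_lifts, observed_holdout_lift):
    """Share of the claimed portfolio lift that the holdout gap confirms."""
    expected = expected_portfolio_lift(experiment_lifts)
    if expected == 0:
        raise ValueError("expected portfolio lift is zero; ratio is undefined")
    return observed_holdout_lift / expected


# Made-up numbers: five shipped experiments and an observed holdout gap.
claimed = [0.02, 0.015, 0.01, 0.03, 0.005]
ratio = realization_ratio(claimed, observed_holdout_lift=0.04)
print(f"claimed: {expected_portfolio_lift(claimed):.3f}, realized share: {ratio:.2f}")
```

A ratio well below 1 is a prompt to look for inflated estimates, cannibalizing features, or proxies that do not hold up; it is a diagnostic, not a precise measurement, since both the numerator and denominator carry uncertainty.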

As a Staff‑level DS, you should be able to talk about both layers:

The micro‑view: "This experiment moved our activation proxy by +5%, which we believe maps to +2–3 points on 90‑day retention."

The macro‑view: "Over the last 2 quarters, the treatment world is +X% vs. the global holdout on 6‑month retained users and +Y% on contribution margin, so our experimentation program as a whole is paying off."