This chapter is written from the perspective of a Staff Data Scientist who has shipped real products and broken a few along the way. The goal is not to teach you how to click buttons in an A/B testing tool. It is to shape how you think: what to test, what not to test, and how to turn noisy data into confident product decisions.
If you treat experimentation as a box‑checking exercise, you will block velocity and still ship bad decisions. If you treat it as a rigorous but pragmatic decision engine, you become the person product and engineering call before they bet the roadmap.
Staff‑level experiment review checklist
Use this when you are asked to sign off on a design or interpret results:
- Is the problem real and important?
  - If the experiment "wins", will anyone actually care?
- Are metrics and success criteria sharp?
  - Primary, secondary, and guardrail metrics written down.
  - MDE grounded in economics, not vibes.
- Is the design appropriate?
  - Randomization unit and exposure logic make sense.
  - Network/interference and marketplace effects considered.
- Are power and duration realistic?
  - Enough traffic and time to hit the stated MDE.
  - Seasonality and external events accounted for.
- Is the analysis plan sane?
  - Right tests, variance reduction, SRM checks.
  - A clear plan for multiple testing, if relevant.
- Are risk and rollback plans explicit?
  - Guardrails monitored in near real time.
  - Clear conditions for aborting or pausing.
- Will we learn something even if the result is null?
  - If the only acceptable outcome is "treatment wins", it is not an experiment; it is a performance.
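Two of the checklist items above can be made concrete with a few lines of code: sizing an experiment against a stated MDE, and running a sample-ratio-mismatch (SRM) check before trusting any result. The sketch below is illustrative, not tied to any particular experimentation platform; the function names and default parameters are my own, it uses a standard normal approximation for a two-proportion test, and it assumes Python 3.8+ for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift of
    mde_abs over a baseline conversion rate p_base (two-sided z-test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    p_treat = p_base + mde_abs
    # Sum of Bernoulli variances under baseline and treatment rates.
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde_abs ** 2)

def srm_pvalue(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square test (1 df) for sample-ratio mismatch: are the observed
    arm sizes consistent with the intended traffic split?"""
    total = n_control + n_treatment
    exp_c = total * expected_control_share
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # For 1 df, chi-square is a squared standard normal, so
    # P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
```

For example, detecting a 1-point absolute lift on a 10% baseline at the usual alpha=0.05 and 80% power requires on the order of 15k users per arm; and a 5100/4900 split on an intended 50/50 allocation already yields an SRM p-value below 0.05, which should pause the readout before any metric is interpreted.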
If you cannot answer these questions confidently, your job is not to "be helpful" and let the experiment proceed. Your job is to push back until the design is worthy of the traffic, users, and time it will consume.