Measurement & Attribution
How geo holdouts, PSA tests, matched markets, and ghost ads let you measure the actual incremental lift of a marketing channel — separating causation from correlation. The methods, the design considerations, and how to fit them into a measurement program.
Most marketing attribution, whether last click, multi-touch attribution (MTA), or even marketing mix modeling (MMM), is correlational. It tells you which channels were present when conversions happened. It doesn't tell you which channels caused the conversions.
Incrementality testing is the only methodology that produces causal estimates. The test design isolates the marketing intervention, so any difference between exposed and control groups beyond what chance variation would explain is the incremental lift the marketing produced.
In a geo holdout, you split your geographic markets into a treatment group (gets the marketing) and a control group (doesn't), then compare outcomes. If treatment markets see materially higher conversions than control markets, the difference is incremental lift.
Best for: paid media campaigns where you can target geographically. Requires enough markets to power the test (usually 20+ on each side).
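As a rough illustration of the treatment-versus-control comparison, the sketch below computes absolute and relative lift from per-market conversion counts, plus a Welch t-statistic as a crude signal-to-noise check. The function name `geo_lift` and all numbers are hypothetical. A production analysis would usually also adjust for pre-period differences between markets.

```python
from math import sqrt
from statistics import mean, variance


def geo_lift(treatment, control):
    """Compare per-market conversions between treatment and control markets.

    Both arguments are lists with one conversion count per market
    (illustrative inputs, not real data).
    """
    t_mean, c_mean = mean(treatment), mean(control)
    absolute_lift = t_mean - c_mean
    relative_lift = absolute_lift / c_mean
    # Welch standard error: does not assume equal variance across groups.
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    return {
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "t_stat": absolute_lift / se,
    }


# Made-up example: four markets per side (far fewer than a real test needs).
result = geo_lift(treatment=[110, 120, 130, 125], control=[100, 105, 95, 100])
print(result)
```

With only a handful of markets the t-statistic is unstable, which is one reason the market-count guidance above matters.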
In a PSA test, the control group sees a public service announcement (PSA) ad in the same placements where the treatment group sees your brand ad. Both groups have the ad slot occupied (matched exposure); only the treatment group sees your message. The difference in conversions is the incremental lift.
Best for: programmatic display, video, and CTV where you can deliver alternative creative to a matched control. Most platform-native experimentation tools support this design.
For smaller-scale tests where full geo split isn't practical: pick pairs of similar markets (matched on size, demographics, prior performance). Treat one in each pair; hold the other as control. Compare. Less statistically powerful than full geo splits but workable at smaller scale.
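A matched-pair comparison like this can be sketched as a paired-difference calculation: take the treated-minus-control difference within each pair, then ask whether the average difference is large relative to its variability. The helper name `paired_lift` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_lift(pairs):
    """Estimate lift from matched market pairs.

    `pairs` is a list of (treated_conversions, control_conversions) tuples,
    one tuple per matched pair. Returns the mean within-pair difference and
    a paired t-statistic. Needs at least two pairs, and differences that
    are not all identical (otherwise the standard error is zero).
    """
    diffs = [treated - control for treated, control in pairs]
    mean_diff = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, mean_diff / se


# Made-up example: four matched pairs of different sizes.
pairs = [(120, 100), (210, 200), (95, 80), (310, 290)]
mean_diff, t_stat = paired_lift(pairs)
print(mean_diff, t_stat)
```

Differencing within pairs removes market-size variation from the comparison, which is what makes the design workable with few markets.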
Platform-native lift tests (Meta Conversion Lift, Google Conversion Lift) randomly assign users to a treatment cell (eligible to see ads) and a control cell (not eligible). The platform withholds ads from the control cell, so the difference in conversions between the two cells is the lift.
Best for: when you're already heavily invested in a platform and want to measure that platform's contribution. Limitation: you have to trust the platform's implementation.
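For user-level designs with a randomized treatment cell and control cell, the readout is a comparison of two conversion rates. The sketch below uses a pooled two-proportion z-test. It is an illustrative calculation with a hypothetical function name and made-up counts, not a description of how any platform computes its reported lift.

```python
from math import sqrt
from statistics import NormalDist


def conversion_lift(conv_t, n_t, conv_c, n_c):
    """Two-proportion z-test on treatment vs control conversion rates.

    conv_t / conv_c: converters in each cell; n_t / n_c: users in each cell.
    """
    p_t, p_c = conv_t / n_t, conv_c / n_c
    # Pooled rate under the null hypothesis of no difference between cells.
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return {
        "absolute_lift": p_t - p_c,
        "relative_lift": (p_t - p_c) / p_c,
        "z": z,
        "p_value": p_value,
    }


# Made-up example: 1.2% vs 1.0% conversion rate, 100,000 users per cell.
print(conversion_lift(conv_t=1200, n_t=100_000, conv_c=1000, n_c=100_000))
```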
In a time-based design, the same market alternates between treatment-on and treatment-off periods. This is less statistically clean than parallel-group designs, but useful for marketplaces or operations where holding back a control group creates business issues.
Power. The test needs to be large enough to detect the effect size you care about. Small lifts in small markets need large samples to be statistically detectable.
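To make the power point concrete, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two conversion rates. It assumes user-level randomization and a two-sided test; the function name and the default `alpha` and `power` values are illustrative choices, not recommendations from this article. Geo-level tests need a different calculation because the unit of analysis is the market.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect a relative lift.

    base_rate: control conversion rate (e.g. 0.01 for 1%).
    relative_lift: smallest lift worth detecting (e.g. 0.10 for +10%).
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) ** 2
         * (p1 * (1 - p1) + p2 * (1 - p2))
         / (p2 - p1) ** 2)
    return ceil(n)


# Halving the detectable lift roughly quadruples the required sample.
print(sample_size_per_group(0.01, 0.20))
print(sample_size_per_group(0.01, 0.10))
```

Because the required sample scales with the inverse square of the effect size, deciding the minimum lift worth detecting before launch is the main lever on test cost.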
Duration. Most marketing has lagged effects. A 2-week test for a brand campaign is too short. 6–12 weeks is more reasonable for upper-funnel work.
Contamination. If treatment-market exposure spills into control markets (via national digital media, word-of-mouth across markets, etc.), the test underestimates true lift.
Pre-period validation. Compare treatment and control performance during a pre-period when both should behave the same. If they don't, the markets aren't actually matched.
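One way to sketch a pre-period check is to compare weekly conversions for the two groups before the test starts, looking at how closely they move together (correlation) and how stable their ratio is. The function name and both thresholds below are hypothetical placeholders; appropriate cutoffs depend on the business and the noise in the data.

```python
from math import sqrt
from statistics import mean


def pre_period_check(treatment, control, min_corr=0.9, max_ratio_dev=0.05):
    """Check whether two pre-period series track each other.

    treatment / control: equal-length lists of weekly conversions.
    min_corr and max_ratio_dev are illustrative thresholds only.
    """
    t_bar, c_bar = mean(treatment), mean(control)
    cov = sum((t - t_bar) * (c - c_bar) for t, c in zip(treatment, control))
    var_t = sum((t - t_bar) ** 2 for t in treatment)
    var_c = sum((c - c_bar) ** 2 for c in control)
    corr = cov / sqrt(var_t * var_c)  # Pearson correlation
    # Groups may differ in size, so test the stability of their ratio.
    ratios = [t / c for t, c in zip(treatment, control)]
    avg_ratio = mean(ratios)
    ratio_dev = max(abs(r - avg_ratio) / avg_ratio for r in ratios)
    return {
        "correlation": corr,
        "max_ratio_deviation": ratio_dev,
        "matched": corr >= min_corr and ratio_dev <= max_ratio_dev,
    }


# Made-up weekly data: control is about twice the size but moves in step.
print(pre_period_check([100, 110, 105, 120, 115, 125],
                       [200, 221, 209, 242, 229, 251]))
```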
A mature measurement program runs incrementality tests on a regular, roughly quarterly cadence: typically 2–4 tests per year on the channels and questions that matter most. Bigger budgets and higher-stakes decisions warrant more tests.