CRO & Experimentation
RGM° · Training
Experimentation Fundamentals
CRO is one of the few places in marketing where compound returns hide in plain sight. This guide covers mindset, the funnel, test types, tools, research, and program discipline.
What you will learn
- Why CRO is one of the few places compound returns hide in plain sight
- The experimentation mindset vs the optimization mindset
- The conversion funnel as the canvas
- Test types: A/B, A/B/n, multivariate, multipage, server-side
- Tools: Optimizely, VWO, Convert, AB Tasty, Statsig, Eppo, GrowthBook, in-house
- What to test: conversion bottlenecks, hypothesis generation
- Qualitative and quantitative research that fuels tests
- Building a program, not running tests
- Advanced playbook
- Common mistakes
- Operating checklist
Why CRO compounds
CRO is one of the few marketing endeavors with compound returns. Improve checkout conversion by 1% and every future order benefits. Improve cart-page completion by 2% and every future cart benefits. These aren't one-time wins; they are step-changes that carry into every subsequent transaction for as long as the lift holds.
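The math is brutal in the other direction too. A pricing-page conversion rate of 5% instead of 4% (a one-point gap) yields 25% more conversions from the same traffic, and 25% more revenue if order value holds constant. Seen from the other side, sitting at 4% means giving up 20% of what 5% would earn, for as long as the gap persists. Most teams ignore this because the work feels less glamorous than acquisition. The teams that don't ignore it gain a structural moat.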
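The arithmetic behind these claims can be sketched in a few lines. This is a minimal illustration that assumes traffic and order value stay constant; the function names and the five-wins scenario are hypothetical, not taken from any tool.

```python
def relative_lift(baseline_rate: float, new_rate: float) -> float:
    """Relative change in conversions (and in revenue, if traffic and order value are constant)."""
    return (new_rate - baseline_rate) / baseline_rate


def compounded_lift(step_lifts: list[float]) -> float:
    """Combined relative lift when each win multiplies the rate left by the previous one."""
    total = 1.0
    for lift in step_lifts:
        total *= 1.0 + lift
    return total - 1.0


# 4% -> 5% is one percentage point, but a 25% relative gain.
print(f"{relative_lift(0.04, 0.05):+.1%}")
# 5% -> 4% is the same gap seen from the other side: a 20% relative loss.
print(f"{relative_lift(0.05, 0.04):+.1%}")
# Hypothetical scenario: five successive wins of 3% each compound to about 15.9%.
print(f"{compounded_lift([0.03] * 5):+.1%}")
```

The point of the second function is that wins multiply rather than add: five 3% wins give slightly more than 15%, and the gap widens as wins accumulate.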
Experimentation vs optimization
Optimization is changing things based on best guess. Experimentation is changing things based on rigorous evidence. The first feels productive; the second is productive.
- Optimization without experimentation accumulates beliefs that may not match reality. "We know our users hate carousels" — have you tested?
- Experimentation surfaces surprises. The most-tested teams report that 70–80% of their hypotheses are wrong or inconclusive. That's information, not failure.
- Experimentation builds shared truth across teams. Marketing, product, design, and engineering all see the same test results, ending opinion-driven arguments.
The funnel as canvas
| Funnel stage | Common tests |
| Landing pages | Headline, value prop, hero image, social proof, CTA copy, page length, form length |
| Product pages | Image gallery, pricing display, reviews placement, buy-button copy, size/variant selectors, recommended products |
| Category pages | Filter exposure, sort options, badges, product card design, infinite scroll vs pagination |
| Cart | Free shipping threshold messaging, upsell modules, urgency, abandoned-cart prevention |
| Checkout | Guest checkout vs forced account, payment options, address autofill, single page vs multi-step |
| Forms (lead gen) | Number of fields, field order, progressive disclosure, conditional logic, social proof on form |
| Pricing pages (SaaS) | Plan structure, feature comparison, FAQ, social proof, billing toggle |
| Onboarding (SaaS) | Welcome modal, sample data, time-to-first-value, activation events |
Test types
- A/B (split test). One variant vs control. The default and clearest.
- A/B/n. Multiple variants vs control. Requires more traffic for statistical power.
- Multivariate (MVT). Multiple changes tested together to isolate interactions. Heavy traffic requirement; rarely worth it vs sequential A/B tests.
- Multi-page / journey. Same variant across multiple pages (landing → product → cart). Catches downstream impact.
- Server-side. Tests on server-rendered or app-level changes. Necessary for product features beyond superficial UI.
- Feature flags. Gradual rollouts, holdouts for long-term effect measurement, kill switches.
- Bandit tests. Multi-armed bandits adapt traffic to better-performing variants. Better for exploitation; worse for inference.
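Whatever the test type, server-side setups need a stable way to decide which variant a user sees. One common approach is deterministic hash-based bucketing, sketched below under stated assumptions: equal traffic weights, a string user ID, and hypothetical names (`assign_variant`, `checkout-test`). It is not the API of any vendor listed in this guide.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant, with equal weights.

    Hashing the experiment name together with the user ID keeps a user's
    bucket stable within one experiment and unrelated across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always gets the same answer for the same experiment.
print(assign_variant("user-42", "checkout-test", ["control", "variant"]))

# Across many synthetic users, the split comes out close to even.
split = Counter(
    assign_variant(f"user-{i}", "checkout-test", ["control", "variant"])
    for i in range(10_000)
)
print(split)
```

Passing three or more variants turns the same function into an A/B/n assignment. Each arm then receives a smaller share of traffic, which is why A/B/n tests need more total traffic to reach the same power.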
Tools
| Tool | Strengths |
| Optimizely Web | Mature visual editor, robust statistics, segmentation; enterprise pricing |
| VWO | Visual editor + heatmaps + session recording; mid-market friendly |
| Convert | Privacy-conscious; GDPR-friendly; mid-market |
| AB Tasty | Visual + server-side; enterprise |
| Statsig, Eppo, GrowthBook | Modern experimentation platforms; engineering-first; warehouse-native |
| Optimizely Feature Experimentation | Server-side feature flags + experimentation |
| LaunchDarkly + custom | Feature flags with custom analytics |
| Google Optimize | Sunset in September 2023; users have since migrated to alternatives |
| In-house | For organizations with engineering capacity; warehouse-native analytics; custom stats |
What to test
The mistake new programs make: test what's easy. The discipline: test what affects conversion most.
- Identify funnel bottlenecks. Where do the most users drop off? That's where a small lift creates a large revenue impact.
- Prioritize high-traffic pages. Tests on low-traffic pages can't reach statistical power in reasonable time.
- Focus on hypotheses with strong reasoning. "Add social proof above the fold because user research showed trust concerns" beats "try a red button."
- Test value-proposition language. Headlines, lede copy, key benefits — often the highest-leverage tests on landing pages.
- Test CTA copy and placement. Among the most-replicated significant tests across industries.
- Test friction reduction. Form fields, required signups, payment options.
- Test trust signals. Reviews, ratings, security badges, money-back guarantees.
- Avoid trivial tests. Button color tests rarely produce meaningful lift; they consume traffic and team energy that could go to better hypotheses.
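The traffic constraint in the list above can be made concrete with a rough sample-size estimate. The sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. The defaults (5% significance, 80% power), the function names, and the traffic figures are assumptions for illustration; a vendor's statistics engine may use a different method and give different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in each arm to detect a relative lift of `relative_mde`."""
    p1 = baseline
    p2 = baseline * (1.0 + relative_mde)
    p_bar = (p1 + p2) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def days_to_run(n_per_arm: int, arms: int, daily_visitors: int) -> int:
    """Days needed if all daily visitors are split evenly across the arms."""
    return math.ceil(n_per_arm * arms / daily_visitors)


# Hypothetical page: 4% baseline conversion, aiming to detect a 10% relative lift.
n = sample_size_per_arm(baseline=0.04, relative_mde=0.10)
print(n, "visitors per arm")
print(days_to_run(n, arms=2, daily_visitors=2_000), "days at 2,000 visitors/day")
print(days_to_run(n, arms=2, daily_visitors=200), "days at 200 visitors/day")
```

Under these assumptions the requirement comes out at roughly 39,000 to 40,000 visitors per arm. That takes weeks on a busy page and more than a year on a page with a few hundred daily visitors, which is the practical meaning of "can't reach statistical power in reasonable time."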
Research that fuels tests
Quantitative
- Analytics funnels showing drop-off patterns.
- Heatmaps (Hotjar, Microsoft Clarity, FullStory) showing click/scroll/hover behavior.
- Session recordings revealing friction patterns.
- Form analytics (Hotjar, Mouseflow) showing where users abandon forms.
- Cohort analysis for retention/engagement changes.
Qualitative
- User interviews on purchase intent and barriers.
- Customer support tickets revealing repeat friction.
- Live chat transcripts surfacing common questions and confusions.
- Exit-intent surveys catching users about to leave.
- User testing platforms (UserTesting, UserZoom) for prototyping.
- Card sorts, tree tests for IA decisions.
Building a program
- Designated ownership. A CRO lead, growth team, or product manager owns the program. Without an owner, it doesn't happen.
- Testing cadence. Mature programs run 3–15 tests/month. Volume matters because most hypotheses fail; iteration count is what drives lift.
- Backlog management. Prioritized list of hypotheses with predicted impact. Updated quarterly.
- Cross-functional involvement. Marketing, product, design, engineering, analytics. Tests succeed because of collaboration, not despite it.
- Results archive. Document every test, win or loss, with hypothesis, design, results, and lessons. Institutional memory prevents re-running tests.
- Regular reviews. Monthly or quarterly program reviews. What worked? What didn't? What's next?
Advanced playbook
- Hypothesis library. Maintain a database of tested hypotheses across industries. Reduces "reinventing the wheel" for common patterns.
- Test stack discipline. Sequential tests on the same area rather than parallel competing tests; cleaner attribution.
- Sample size pre-planning. Calculate required sample size before launch; commit to running for that duration regardless of intermediate results.
- Holdouts for long-term effect. 5–10% holdout audience excluded from winning variant; measure long-term lift over months.
- Cross-segment analysis. Don't just look at overall lift; check by mobile vs desktop, new vs returning, paid vs organic. Different segments respond differently.
- Test sequencing for compound wins. Win, then test additional improvements on the winning variant. Compounds gains.
- Server-side ownership. Critical commerce-affecting tests run server-side, not client-side, for performance and accuracy.
- Personalization beyond testing. Once you have winners, personalization layers (rule-based or ML) extend results to relevant audiences.
- CRO program reporting to executives. Cumulative impact dashboard; revenue attributable to CRO program; team and budget justification.
- Vendor or in-house decision. Vendor tools work fine until scale demands warehouse-native; transition planning matters.
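Cross-segment analysis from the playbook above can be sketched as a per-segment comparison of conversion rates. The example uses a pooled two-proportion z-test. The segment counts are made up for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (relative lift of B over A, two-sided p-value) from a pooled z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return (rate_b - rate_a) / rate_a, p_value


# Made-up counts per segment:
# (control conversions, control visitors, variant conversions, variant visitors)
segments = {
    "mobile": (400, 10_000, 480, 10_000),
    "desktop": (500, 10_000, 495, 10_000),
}

for name, (conv_a, n_a, conv_b, n_b) in segments.items():
    lift, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    print(f"{name}: lift={lift:+.1%}, p={p_value:.3f}")
```

In this invented data the variant helps on mobile and does nothing measurable on desktop, a pattern an overall number would blur. One caution: every extra segment is another comparison, so the more slices inspected, the more likely one looks significant by chance. Treat segment findings as hypotheses for a follow-up test unless the segments were specified before launch.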
Common mistakes
- Testing trivial changes (button colors) and reporting wins as program impact.
- Stopping tests as soon as one variant looks ahead; ignoring sample size requirements.
- Testing without hypotheses; can't learn from results.
- One-and-done tests; no iteration on winning variants.
- No qualitative research informing test ideas; testing in the dark.
- Different teams running uncoordinated tests on the same pages; the results contaminate each other.
- Trusting tool-reported "significance" without understanding underlying statistics.
- No test archive; institutional knowledge walks out the door with team turnover.
- Treating winning variants as universally winning; cross-segment behavior ignored.
- Vendor tools without engineering buy-in; server-side tests blocked.
- No long-term holdout; novelty effects mistaken for permanent lift.
- Testing on too-low-traffic pages; underpowered tests waste cycles.
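The early-stopping mistake in the list above is easy to demonstrate with a simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares checking once at the planned end against declaring a winner at any of ten interim looks. All parameters (400 runs, 1,000 visitors per arm, a 10% rate, the seed) are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist


def is_significant(conv_a: int, conv_b: int, n: int, alpha: float = 0.05) -> bool:
    """Pooled two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # no variance yet, nothing to test
    se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    z = ((conv_b - conv_a) / n) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z))) < alpha


def false_positive_rates(runs: int = 400, visitors: int = 1000, checks: int = 10,
                         rate: float = 0.10, seed: int = 7) -> tuple[float, float]:
    """Return (false-positive rate with peeking, false-positive rate at fixed horizon)."""
    rng = random.Random(seed)
    step = visitors // checks
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        ever_significant = False
        for n in range(1, visitors + 1):
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            # A peeking team would stop and ship at the first significant look.
            if n % step == 0 and is_significant(conv_a, conv_b, n):
                ever_significant = True
        peeking_hits += int(ever_significant)
        fixed_hits += int(is_significant(conv_a, conv_b, visitors))
    return peeking_hits / runs, fixed_hits / runs


peeking, fixed = false_positive_rates()
print(f"peeking: {peeking:.1%}  fixed horizon: {fixed:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate typically comes out several times higher. That gap is the cost of stopping as soon as one variant looks ahead, and it is why the sample size is fixed before launch.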
Operating checklist
- Designated CRO program owner
- Test backlog with prioritized hypotheses
- Funnel research informing high-impact test areas
- Sample size pre-planning before launch
- 3–15 tests per month cadence for mature programs
- Cross-functional collaboration (marketing, product, design, eng, analytics)
- Test archive with hypotheses, designs, results, lessons
- Long-term holdouts for novelty-effect detection
- Cross-segment analysis on test results
- Iteration on winning variants for compound gains
- Quarterly program review with executive reporting
- Vendor tools or in-house platform aligned with engineering
Sources and further reading
- ConversionXL (now CXL) — CRO research and case studies
- Andrew Anderson — experimentation programs at scale
- Ron Kohavi, Diane Tang, and Ya Xu, "Trustworthy Online Controlled Experiments" (book)
- Booking.com's engineering blog on experimentation
- Microsoft's research on online experiments
- Optimizely, VWO, Convert, AB Tasty — tool documentation
- Statsig, Eppo, GrowthBook — modern experimentation platforms
- GoodUI.org — UX patterns with test data
- Baymard Institute — ecommerce UX research
- NN/g (Nielsen Norman Group) — UX research methodology
- Andrew Chen — growth and experimentation strategy
- Reforge experimentation programs
Part of the CRO & Experimentation series.