Prioritization Frameworks
The gap between mature programs and rookies. ICE, PIE, PXL, impact estimation, confidence, effort, and the politics of prioritization.
Why prioritization separates programs
Most CRO programs have more test ideas than they can run. What separates mature programs from rookie ones isn't the number of ideas; it's which ideas they choose to test. A program that runs well-prioritized tests on high-leverage pages can produce 5–10× the lift of one that tests whatever was top-of-mind that week.
Major frameworks
| Framework | Inputs | Origin |
| --- | --- | --- |
| ICE | Impact, Confidence, Ease (each 1–10) | Sean Ellis / growth hacking |
| PIE | Potential, Importance, Ease (each 1–10) | WiderFunnel |
| PXL | Multi-attribute checklist with weighting (15+ factors) | CXL (Peep Laja) |
ICE and PIE are similar: each assigns subjective 1–10 scores on three dimensions and combines them into a single score. ICE scores are commonly multiplied or averaged, while PIE scores are typically averaged. PXL is more structured: instead of subjective scores, you check whether the hypothesis meets specific criteria (e.g., "Is the change above the fold?", "Is this change supported by user testing data?"). Each yes earns points, and the total score drives priority.
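As a minimal sketch of three-factor scoring, the following assumes the three scores are combined by multiplication, with averaging available as an alternative. The function name, the `method` parameter, and the backlog entries are hypothetical and only illustrate the arithmetic.

```python
def three_factor_score(first: int, second: int, ease: int, method: str = "product") -> float:
    """Combine three 1-10 scores (ICE or PIE style) into one priority score."""
    scores = (first, second, ease)
    if any(not 1 <= s <= 10 for s in scores):
        raise ValueError("each score must be between 1 and 10")
    if method == "product":
        return float(first * second * ease)
    if method == "average":
        return sum(scores) / 3
    raise ValueError(f"unknown method: {method}")


# Hypothetical backlog entries: (name, impact, confidence, ease)
backlog = [
    ("Shorten checkout form", 8, 7, 4),
    ("New hero headline", 5, 4, 9),
    ("Add trust badges to cart", 6, 6, 8),
]

# Highest score first.
ranked = sorted(backlog, key=lambda h: three_factor_score(*h[1:]), reverse=True)
for name, *scores in ranked:
    print(f"{name}: {three_factor_score(*scores):.0f}")
```

Whichever combination rule a team picks, the ranking is only as good as the shared definition of what a 3 or an 8 means on each dimension.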
The right framework for your team
- Small team, fast cadence: ICE. Quick scoring; minimal overhead.
- Mid-size team, mature program: PIE or PXL.
- Enterprise with rigor: PXL or custom multi-attribute framework with weighted criteria.
PXL in depth
CXL's PXL framework applies a checklist to each hypothesis:
- Is the change above the fold?
- Is the change noticeable in 5 seconds?
- Does the change add or remove an element?
- Is the change on a high-traffic page?
- Is the test going to run on a page with at least 1,000 conversions/month?
- Does it address an insight from user research / heatmap / session recording?
- Does it address an insight from analytics?
- Does it address an insight from competitor analysis?
- Does it address an insight from customer interviews / qualitative data?
- Does the test mitigate a friction point users have complained about?
- Is the test on a page in the user's primary path to conversion?
- (Effort) Implementable in < 1 week?
- (Effort) Requires no engineering?
- (Effort) Requires no new content?
Higher score = higher priority. The discipline forces you to articulate why you think a test will work — not just "gut feeling."
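A checklist score of this kind can be sketched as a weighted sum of yes answers. The criterion keys and point weights below are assumptions for illustration, not the values from CXL's own spreadsheet, and only a subset of the checklist is shown.

```python
# Hypothetical criterion keys and weights; substitute your own sheet's values.
PXL_WEIGHTS = {
    "above_fold": 1,
    "noticeable_in_5s": 2,
    "adds_or_removes_element": 2,
    "high_traffic_page": 1,
    "user_research_insight": 2,
    "analytics_insight": 1,
    "implementable_under_1_week": 1,
    "no_engineering": 1,
}


def pxl_score(answers: dict[str, bool], weights: dict[str, int] = PXL_WEIGHTS) -> int:
    """Sum the weights of every criterion answered 'yes'."""
    unknown = set(answers) - set(weights)
    if unknown:
        raise KeyError(f"unknown criteria: {sorted(unknown)}")
    return sum(weights[key] for key, yes in answers.items() if yes)


# Hypothetical answers for one hypothesis.
answers = {
    "above_fold": True,
    "noticeable_in_5s": True,
    "adds_or_removes_element": False,
    "high_traffic_page": True,
    "user_research_insight": True,
    "analytics_insight": False,
    "implementable_under_1_week": True,
    "no_engineering": False,
}
print(pxl_score(answers))
```

Keeping the answers as explicit yes/no values makes it easy to see which evidence a high-scoring hypothesis actually rests on.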
Estimating impact
- Volume affected. How many users will see the change per month? Higher = more potential lift.
- Stage of funnel. Lift on cart-page tests propagates almost directly to revenue; top-of-funnel lift can be diluted by drop-off at later stages.
- Baseline metric value. A page with a 1% conversion rate has more headroom than a page at 30%.
- Historical similar tests. What lifted similar pages in your test archive? In industry case studies?
- Sanity check magnitude. A "30% lift on checkout" hypothesis should be skeptically reviewed; lifts that large are rare.
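The factors above can be turned into a rough baseline calculation. This sketch assumes a simple model (affected users × baseline conversion rate × relative lift × average order value) and ignores downstream dilution. The function names, the example figures, and the 30% review threshold are illustrative assumptions.

```python
def monthly_revenue_impact(
    monthly_users: int,
    baseline_cr: float,
    relative_lift: float,
    avg_order_value: float,
) -> float:
    """Estimate incremental monthly revenue from a relative lift in conversion rate."""
    baseline_conversions = monthly_users * baseline_cr
    extra_conversions = baseline_conversions * relative_lift
    return extra_conversions * avg_order_value


def needs_skeptical_review(relative_lift: float, threshold: float = 0.30) -> bool:
    """Flag hypothesized lifts at or above a (hypothetical) review threshold."""
    return relative_lift >= threshold


# Hypothetical inputs: 50,000 users/month, 2% baseline, 5% relative lift, $80 AOV.
estimate = monthly_revenue_impact(50_000, 0.02, 0.05, 80.0)
print(f"Estimated incremental revenue: ${estimate:,.0f}/month")
print("Needs skeptical review:", needs_skeptical_review(0.05))
```

Writing the baseline math down, even this crudely, makes it obvious when an impact score is coming from feel rather than from traffic and conversion numbers.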
Confidence: research-backed vs gut
| Evidence type | Confidence contribution |
| --- | --- |
| Analytics drop-off data showing the bottleneck | High |
| User testing identifying confusion or friction | High |
| Heatmap / session recording showing user behavior | Medium-high |
| Customer support tickets repeating the issue | Medium-high |
| Industry case study showing similar change worked | Medium |
| Competitor analysis showing differentiated pattern | Low-medium |
| Gut feeling / best practice intuition | Low |
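One way to make this table operational is to map each evidence type to a number and score a hypothesis by its strongest evidence. The numeric values, the keys, and the choice of taking the maximum are all assumptions in this sketch; a team could just as reasonably add points for corroborating evidence.

```python
# Hypothetical 1-10 values mirroring the High / Medium / Low tiers in the table.
EVIDENCE_CONFIDENCE = {
    "analytics_dropoff": 9,
    "user_testing": 9,
    "heatmap_or_recording": 7,
    "support_tickets": 7,
    "industry_case_study": 5,
    "competitor_analysis": 4,
    "gut_feeling": 2,
}


def confidence_score(evidence: list[str]) -> int:
    """Return the score of the strongest evidence; no evidence counts as gut feeling."""
    if not evidence:
        return EVIDENCE_CONFIDENCE["gut_feeling"]
    return max(EVIDENCE_CONFIDENCE[item] for item in evidence)


print(confidence_score(["competitor_analysis", "user_testing"]))
print(confidence_score([]))
```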
Effort estimation
- Engineering hours. Server-side changes, new components, complex tracking.
- Design hours. New layouts, visual treatments, mockups.
- Content hours. Copy variants, asset production, translation.
- Ops hours. Sample size requirements (larger samples mean longer cycle time), QA effort, rollback planning.
- Coordination overhead. Multi-team approvals, stakeholder reviews, legal/compliance.
Be honest. Underestimating effort leads to blown timelines that affect program throughput.
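As a rough sketch, the five categories can be summed and bucketed into T-shirt sizes so the estimate avoids false precision. The hour thresholds and the function name are hypothetical and should be tuned to the team's own capacity.

```python
# Hypothetical upper bounds (in hours) for each size; anything larger is XL.
SIZE_THRESHOLDS = [(8, "S"), (24, "M"), (60, "L")]


def effort_size(
    engineering: float = 0,
    design: float = 0,
    content: float = 0,
    ops: float = 0,
    coordination: float = 0,
) -> tuple[float, str]:
    """Total the estimated hours and map them to a T-shirt size."""
    total = engineering + design + content + ops + coordination
    for limit, size in SIZE_THRESHOLDS:
        if total <= limit:
            return total, size
    return total, "XL"


# Hypothetical estimate for one test.
print(effort_size(engineering=10, design=6, content=2, ops=4, coordination=3))
```

Listing coordination as its own line item keeps approval and review time from silently disappearing out of the estimate.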
Stakeholder politics
- Executive favorites. Tests "requested" by the CEO or a VP often jump the queue. Manage this by building enough goodwill to push back, or run them in a side queue so the request is satisfied without disrupting the program.
- Vendor pitches. Tools and agencies pitch "guaranteed lift" tests. Apply the same prioritization rigor.
- HiPPO (Highest Paid Person's Opinion) bypass. Use data to defend prioritization decisions. Subjective opinions don't override quantified scores.
- Cross-team tests. Tests that affect product, brand, and design simultaneously need cross-team prioritization, not unilateral CRO-team decisions.
Backlog cadence
- Weekly: Add new hypotheses; reorder based on emerging research.
- Monthly: Review backlog of next 10–15 tests; align on launch sequence.
- Quarterly: Strategic review. Which areas are we under-investing in? Where should research focus next quarter?
- Annually: Framework review — is our prioritization still working?
Advanced playbook
- Custom weighted PXL. Different organizations weight criteria differently. Customize PXL with stakeholder input.
- Impact estimation as monthly $. Translate scores to estimated monthly revenue impact. Forces realism and stakeholder buy-in.
- Confidence intervals on impact. Show pessimistic / realistic / optimistic scenarios.
- Effort estimation in T-shirt sizes. S/M/L/XL; avoids false precision.
- Test sequencing for area saturation. Don't test ten things on the same page in parallel; sequence so each test learns and informs the next.
- Theme-based quarterly focus. Quarter on checkout, next on landing pages, next on product pages. Builds depth.
- Reserved capacity for serendipity. 10–20% of test capacity for unprioritized but high-confidence opportunistic tests.
- Pre-mortems on top-3 priorities. What could go wrong with this test? Surface risks before launch.
- Post-test calibration. Compare predicted impact to actual results and recalibrate the prioritization model over time.
- Backlog freshness. Quarterly purge of stale hypotheses; reconsider against new research.
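Post-test calibration from the list above can be sketched as a comparison of predicted and observed lifts across the test archive. The archive records and the two summary metrics here are illustrative assumptions; a positive mean error indicates the team tends to over-predict.

```python
from statistics import mean

# Hypothetical archive: (test name, predicted relative lift, observed relative lift)
archive = [
    ("Checkout form", 0.10, 0.04),
    ("Hero headline", 0.05, 0.00),
    ("Trust badges", 0.08, 0.06),
]


def calibration_summary(records: list[tuple[str, float, float]]) -> dict[str, float]:
    """Summarize how far predictions were from observed results."""
    errors = [predicted - observed for _, predicted, observed in records]
    return {
        "mean_error": mean(errors),  # > 0 means predictions ran high on average
        "mean_abs_error": mean(abs(e) for e in errors),
    }


summary = calibration_summary(archive)
print(f"Mean error: {summary['mean_error']:+.3f}")
print(f"Mean absolute error: {summary['mean_abs_error']:.3f}")
```

A persistent positive mean error is a signal to scale down future impact scores before they reach the backlog.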
Common mistakes
- No framework at all; tests run on whoever shouts loudest.
- Subjective ICE scores from one person; no shared definition of 1–10.
- Impact estimates with no baseline math; numbers from feel.
- Confidence inflated by "best practices" without research backing.
- Effort underestimated; timelines blown.
- Executive overrides without recourse; HiPPO wins.
- Backlog never re-prioritized; stale hypotheses surface stale tests.
- Theme drift; tests scattered across the funnel; learning doesn't compound.
- No retroactive calibration; prioritization model never improves.
- Prioritizing easy tests over high-impact tests; quick wins without compound impact.
Operating checklist
- Documented prioritization framework (ICE, PIE, PXL, or custom)
- Hypotheses scored using shared criteria
- Impact estimates with sanity-check magnitudes
- Confidence backed by research evidence
- Effort estimated honestly
- Weekly backlog updates and reordering; monthly launch-sequence review
- Quarterly strategic review of test focus
- HiPPO bypass policy documented
- Reserved capacity for serendipity
- Post-test calibration of prediction accuracy
- Backlog freshness purge quarterly
Sources and further reading
- CXL Institute — PXL framework documentation (Peep Laja)
- WiderFunnel PIE framework
- Sean Ellis — ICE framework origins
- Ron Kohavi — on the limits of HiPPO
- Andrew Anderson — CRO prioritization writings
- Optimizely and VWO knowledge base on prioritization
- Booking.com's public talks on experiment prioritization
- Statsig and Eppo product documentation
- Reforge growth and experimentation courses
- ConversionXL podcast prioritization episodes
- GrowthHackers community discussions
- Industry case study databases (CXL, WiderFunnel)
Part of the CRO & Experimentation series.