Experimentation · Prioritisation · Strategy
Stop Guessing. Start Scoring.
How to build an A/B test prioritisation framework that removes subjectivity — and a free Notion template to do it in.
In this article
- The problem with "this feels important"
- What a non-subjective framework actually looks like
- The RICE foundation
- Breaking down Reach into 6 dimensions
- Scoring Impact, Confidence, and Effort
- How to use the framework in practice
- Common mistakes (and how to avoid them)
- Get the free Notion template
The problem with "this feels important"
Let me paint you a very familiar picture.
It's your weekly roadmap review. On the table: 14 A/B test ideas. The PM wants to test the homepage hero. Engineering says the checkout flow is too complex. The CEO saw something a competitor did and "it would be quick to replicate." The data analyst has a hypothesis backed by actual user research — but they're the quietest person in the room.
The data analyst watching the roadmap get decided by who talks loudest.
Two weeks later you're A/B testing button colours. The high-confidence, high-reach test that could move the needle? Pushed to Q3. Again.
This is what happens when prioritisation is subjective. The loudest opinion wins. The best ideas lose. And your roadmap becomes a political artefact rather than a strategic one.
The fix isn't a longer meeting. It's a framework that does the arguing for you.
What a non-subjective framework actually looks like
Let's be honest — you can't remove subjectivity entirely. Humans make the calls. But you can structure subjectivity so that:
- Every test is scored on the same criteria
- The criteria are agreed on in advance (not made up in the meeting)
- Scores come from structured dropdowns, not blank text fields
- The formula does the maths — no one gets to override it mid-presentation
That shift is everything. When the debate moves from "I feel like this is more impactful" to "should this be a 7 or an 8 on confidence?", you're suddenly having a useful conversation grounded in evidence.
The RICE foundation
The framework is built on RICE scoring — originally from Intercom — with one significant upgrade to the Reach dimension.
Each component gets a numerical score. The formula: multiply Reach (weighted ×1.5 to emphasise it) by Impact and Confidence, then divide by Effort. That's your Priority Score. Sort descending. Done.
The 1.5 multiplier on Reach is intentional. Tests that affect more users should rank higher than niche optimisations, all else being equal. If your A/B test only affects one market on desktop with one traffic source — it should score lower than a test that touches your entire audience.
Everyone when they realise scope matters.
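To make the maths concrete, here's a minimal sketch of the formula in Python. The function name and the exact form of the weighting are my reading of the description above, not pulled from the template itself:

```python
def priority_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Priority = (Reach x 1.5) x Impact x Confidence / Effort.

    Assumes reach is the 0-10 average of the six reach dimensions (below),
    impact and confidence come from the dropdown tables, and effort is
    the 2-10 sum of the two effort scores.
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    return (reach * 1.5) * impact * confidence / effort

# A broad, well-evidenced, cheap test vs. a niche, opinion-backed one:
print(priority_score(reach=9.0, impact=6, confidence=10, effort=4))  # 202.5
print(priority_score(reach=3.0, impact=4, confidence=2, effort=3))   # 12.0
```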
Breaking down Reach into 6 dimensions
Standard RICE treats Reach as a single number ("how many users per quarter?"). That's fine for product features. For A/B tests on a website, it's too blunt.
A test might reach 100% of users in theory — but only on mobile, in one market, on one page type, hidden below the fold. That's not the same reach as a test that's visible to everyone, everywhere, above the fold.
So instead, Reach is scored across 6 dimensions, then averaged:
| Dimension | What it measures | Max points |
|---|---|---|
| Market scope | All markets vs. one local market | 10 |
| Device scope | Both mobile + desktop vs. one device type | 10 |
| Journey / audience scope | Most users vs. a very specific segment | 10 |
| Page traffic level | Very high-traffic area vs. low-traffic area | 10 |
| Template / page coverage | All pages of that type vs. one isolated page | 10 |
| Visibility on page | Above the fold and prominent vs. hidden | 10 |
Each dimension has a structured dropdown with clear options. Not a blank number field. Not a free text box. A dropdown where the label tells you exactly what score you get and why. Like this:
| Option in the dropdown (Market scope) | Score |
|---|---|
| All markets | 10 |
| Top markets only | 7 |
| Few secondary / non-top markets | 4 |
| One local market only | 1 |
You pick your option. The score calculates automatically. No debate needed.
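If you ever need the dropdown-to-score logic outside Notion (say, in a quick analysis script), here's a minimal sketch. Only the market-scope labels come from the table above; the other five dimensions would each get their own labelled mapping:

```python
# Market scope dropdown from the table above. Each of the six reach
# dimensions gets an equivalent label-to-score mapping.
MARKET_SCOPE = {
    "All markets": 10,
    "Top markets only": 7,
    "Few secondary / non-top markets": 4,
    "One local market only": 1,
}

def reach_score(dimension_scores: list[int]) -> float:
    """Average the six reach dimension scores (1-10 each) into one Reach score."""
    assert len(dimension_scores) == 6, "expected one score per reach dimension"
    return sum(dimension_scores) / len(dimension_scores)

# Example: all markets, mobile only, broad audience, high traffic,
# all templates of that type, but below the fold:
print(reach_score([MARKET_SCOPE["All markets"], 4, 10, 10, 10, 4]))  # 8.0
```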
Scoring Impact, Confidence, and Effort
Impact
Based on expected business impact within 30 days. Deliberately revenue-anchored to keep things concrete:
| Option | Score |
|---|---|
| Massive impact — 7%+ RPU (revenue per user) uplift or $50k+ incremental | 10 |
| High impact — 5–7% RPU or $25k–$50k | 8 |
| Moderate impact — 3–5% RPU or $10k–$25k | 6 |
| Small impact — 1.5–3% RPU or $5k–$10k | 4 |
| Minimal impact — 0.1–1.5% RPU or $1k–$5k | 2 |
If you don't know your RPU targets, use the revenue ranges as anchors. The key is that "I think it'll be impactful" becomes "I think this is a 6 because it could move RPU by 3–5%." That's a sentence someone can challenge with data.
Confidence
How much evidence backs this hypothesis? This is where the research actually gets rewarded:
| Option | Score |
|---|---|
| Quant + qual sources + prior successful A/B test | 12 |
| 3+ valid sources including quant and qual | 10 |
| 2 sources including quant and qual | 8 |
| 1 valid source | 6 |
| Weak directional signal below threshold | 4 |
| Opinions only — no research | 2 |
Notice the top score goes above 10. That's deliberate. Tests backed by prior successful A/B test results in the same area deserve to be elevated — they're not hypotheses, they're near-certainties.
"We have strong evidence for this." "Cool, is that based on actual data or the CEO's last trip to a competitor's website?"
Effort
Split into two parts — because building the test and shipping it permanently if it wins are very different things:
Test implementation effort (1–5 pts):
- Very light build, CSS tweak only → 1
- Light front-end test → 2
- Moderate FE + QA → 3
- Complex FE / feed / app logic → 4
- Backend or major dependency → 5
Permanent rollout effort (1–5 pts):
- Copy change only, minimal effort → 1
- Limited coordination needed → 2
- Multiple templates or markets → 3
- Multi-team dependency → 4
- Significant backend / legal / localisation → 5
Total effort = the two scores added together, giving a range of 2 to 10. This gets used as the denominator in the priority formula — so high-effort tests get pulled down proportionally.
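A quick worked example of that pull, reusing the hypothetical priority_score sketch from earlier. Same reach, impact, and confidence; only effort differs:

```python
# CSS tweak plus limited-coordination rollout (effort 1 + 2 = 3)
cheap = priority_score(reach=8.0, impact=6, confidence=8, effort=3)
# Backend build plus multi-team rollout (effort 5 + 4 = 9)
heavy = priority_score(reach=8.0, impact=6, confidence=8, effort=9)
print(cheap, heavy)  # 192.0 64.0
```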
How to use the framework in practice
Write the hypothesis first
Don't score until you have a proper "if/then/because" hypothesis. No hypothesis = no test.
Fill the dropdowns
Use the structured dropdowns. Do not type raw numbers. The labels tell you exactly what to pick.
Let the score calculate
The framework does the maths. Resist the urge to override it. If you disagree, challenge the inputs — not the output.
Sort by Priority Score
Top scores go first. Review as a team quarterly. Add new tests at any time.
The Priority Tier (High / Medium / Lower) is a manual override — for cases where the score is right but context means the test should wait. Use it sparingly, and document why in the Notes field.
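In script form, "sort descending, tier as a sparing override" might look like this. A sketch only: the test names are invented, and treating tier as the primary sort key is my interpretation of how the override should behave:

```python
TIER_ORDER = {"High": 0, "Medium": 1, "Lower": 2}

tests = [
    {"name": "Checkout trust badges", "priority": 192.0, "tier": "High"},
    {"name": "Homepage hero copy", "priority": 140.0, "tier": "Medium"},  # waiting on legal
    {"name": "Footer link colour", "priority": 34.0, "tier": "Lower"},
]

# Tier groups first (the rare manual override), then Priority Score descending.
roadmap = sorted(tests, key=lambda t: (TIER_ORDER[t["tier"]], -t["priority"]))
for t in roadmap:
    print(t["name"], t["priority"])
```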
The rule of one challenge
When someone disagrees with a score, they must challenge a specific input — not the total. "I think the market scope should be 4, not 7, because this only affects users in EN markets" is a valid challenge.
"I just feel like this test is more important" is not a valid challenge.
This rule alone transforms the quality of your roadmap conversations.
Common mistakes (and how to avoid them)
Mistake 1: Treating the framework as optional
If some tests go through the scoring process and others don't, the framework loses its authority. Every test — no matter who suggested it — goes through the same process. Yes, including the one the CEO mentioned.
Me explaining why the CEO's idea scored a 34 and the analyst's scored a 192.
Mistake 2: Being too generous with Confidence
It's tempting to give every idea a confident score. Resist. If your evidence is "we heard some users mention it in a session six months ago," that's a 4, not an 8. Be honest. Low confidence scores are informative — they tell you where to do more research.
Mistake 3: Using effort as a tiebreaker rather than a genuine score
Low-effort tests are easy to execute but often low-impact. Don't let "it's quick to build" become a backdoor to the top of the backlog. If the reach, impact, and confidence scores are low, a small effort score won't save it.
Mistake 4: Never updating scores
A test that was scored 3 months ago might have different evidence behind it now. Review scores quarterly. If your data team found new evidence, update the confidence score. If the test page got a redesign that changed its traffic, update the reach score.
Mistake 5: Treating Priority Tier as a second vote
The Priority Tier field exists for genuine edge cases — a legal constraint, a seasonal dependency, a technical blocker. It is not there so stakeholders can manually elevate their favourite ideas after the scoring disagreed with them. If you find yourself changing tiers in every meeting, you've rebuilt the subjective process you were trying to replace.
The free Notion template
I've built the entire framework in Notion — every scoring dimension with a dropdown, every score with an auto-calculated formula, and the final Priority Score computing automatically from your inputs.
No formulas to set up. No spreadsheet to maintain. Just open it, duplicate it into your workspace, and start scoring.
Here's what's inside:
- Structured dropdowns for all 6 Reach dimensions, Impact, Confidence, and Effort
- Auto-calculated scores next to every dropdown — pick your option, the score appears
- Reach Score, Total Effort, and Priority Score — all computed automatically
- Priority Tier — colour-coded High / Medium / Lower for quick scanning
- Status tracking — Backlog → Planned → Running → Completed
- Start Date + End Date — so it doubles as a test calendar
- Hypothesis, Problem Statement, KPI fields — because good tests start with good thinking
Me sharing the template. You get a framework! You get a framework!
Get the free Notion template
Comment "FRAMEWORK" on the LinkedIn post and I'll DM you the link directly. No email, no form, no catch.
Already commented? Check your DMs — I send them personally.
Closing thought
The hardest part of building a non-subjective framework isn't the scoring logic. It's getting everyone to agree to use it — and to actually trust it when it contradicts their instincts.
Start with one sprint. Score every test in the backlog using this framework. Don't change the order based on gut feel. See what happens. My guess is that the tests you'd have deprioritised under the old system will start outperforming.
Because it turns out, evidence is a pretty good predictor of outcomes. Who knew.
The team when the data analyst's hypothesis scores highest two sprints in a row.
If you try this and it helps — or you find something that doesn't work — I'd genuinely love to hear about it. Drop a comment below or reach out directly.
Good luck. May your backlog be forever sorted by Priority Score descending.
Stop guessing why users drop off
Discoveo connects your GA4 funnel data with user feedback to explain the why — and prioritise what to fix first.
Discover Discoveo →