
Experimentation · Prioritisation · Strategy

Stop Guessing. Start Scoring.

How to build an A/B test prioritisation framework that removes subjectivity — and a free Notion template to do it in.

In this article

  1. The problem with "this feels important"
  2. What a non-subjective framework actually looks like
  3. The RICE foundation
  4. Breaking down Reach into 6 dimensions
  5. Scoring Impact, Confidence, and Effort
  6. How to use the framework in practice
  7. Common mistakes (and how to avoid them)
  8. Get the free Notion template

The problem with "this feels important"

Let me paint you a very familiar picture.

It's your weekly roadmap review. On the table: 14 A/B test ideas. The PM wants to test the homepage hero. Engineering says the checkout flow is too complex. The CEO saw something a competitor did and "it would be quick to replicate." The data analyst has a hypothesis backed by actual user research — but they're the quietest person in the room.

Confused maths lady meme

The data analyst watching the roadmap get decided by who talks loudest.

Two weeks later you're A/B testing button colours. The high-confidence, high-reach test that could move the needle? Pushed to Q3. Again.

This is what happens when prioritisation is subjective. The loudest opinion wins. The best ideas lose. And your roadmap becomes a political artefact rather than a strategic one.

The fix isn't a longer meeting. It's a framework that does the arguing for you.

What a non-subjective framework actually looks like

Let's be honest: you can't remove subjectivity entirely. Humans still make the calls. But you can structure that subjectivity so the argument happens at the level of individual ratings rather than whole tests:

"A good prioritisation framework doesn't remove human judgement. It redirects it — from 'which test do we pick?' to 'how do we rate this dimension?'"

That shift is everything. When the debate moves from "I feel like this is more impactful" to "should this be a 7 or an 8 on confidence?", you're suddenly having a useful conversation grounded in evidence.

The RICE foundation

The framework is built on RICE scoring — originally from Intercom — with one significant upgrade to the Reach dimension.

Priority Score = 1.5 × Reach × Impact × Confidence ÷ Effort

Each component gets a numerical score. Multiply them together (with a 1.5 weighting to emphasise reach), divide by effort, and you get a Priority Score. Sort descending. Done.

The 1.5 multiplier on Reach is intentional. Tests that affect more users should rank higher than niche optimisations, all else being equal. If your A/B test only affects one market on desktop with one traffic source — it should score lower than a test that touches your entire audience.
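As a minimal sketch, the formula is a one-liner; the numbers below are hypothetical, purely to show how reach dominates when everything else is equal:

```python
def priority_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Priority Score = 1.5 x Reach x Impact x Confidence / Effort."""
    return 1.5 * reach * impact * confidence / effort

# A broad test outranks a niche one with identical impact, confidence, and effort.
broad = priority_score(reach=9, impact=6, confidence=8, effort=4)  # 162.0
niche = priority_score(reach=3, impact=6, confidence=8, effort=4)  # 54.0
```

Three times the reach, three times the score: that is exactly the behaviour the weighting is meant to produce.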

Nodding in agreement

Everyone when they realise scope matters.

Breaking down Reach into 6 dimensions

Standard RICE treats Reach as a single number ("how many users per quarter?"). That's fine for product features. For A/B tests on a website, it's too blunt.

A test might reach 100% of users in theory — but only on mobile, in one market, on one page type, hidden below the fold. That's not the same reach as a test that's visible to everyone, everywhere, above the fold.

So instead, Reach is scored across 6 dimensions, then averaged:

| Dimension | What it measures | Max pts |
|---|---|---|
| Market scope | All markets vs. one local market | 10 |
| Device scope | Both mobile + desktop vs. one device type | 10 |
| Journey / audience scope | Most users vs. a very specific segment | 10 |
| Page traffic level | Very high-traffic area vs. low-traffic area | 10 |
| Template / page coverage | All pages of that type vs. one isolated page | 10 |
| Visibility on page | Above the fold and prominent vs. hidden | 10 |
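Continuing the sketch, the combined Reach score is just the mean of the six dimension scores. The sub-scores below are hypothetical, describing a mobile-only test that is otherwise broad:

```python
def reach_score(dimensions: dict[str, int]) -> float:
    """Average the six Reach dimensions, each scored 1-10."""
    assert len(dimensions) == 6, "expected exactly six Reach dimensions"
    return sum(dimensions.values()) / len(dimensions)

mobile_only_test = reach_score({
    "market": 10,      # all markets
    "device": 7,       # mobile only (hypothetical sub-score)
    "journey": 10,     # most users
    "traffic": 7,      # mid-traffic page
    "templates": 10,   # all pages of this type
    "visibility": 4,   # below the fold
})  # 8.0
```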

Each dimension has a structured dropdown with clear options. Not a blank number field. Not a free text box. A dropdown where the label tells you exactly what score you get and why. Like this:

| Option in the dropdown | Score |
|---|---|
| All markets | 10 |
| Top markets only | 7 |
| Few secondary / non-top markets | 4 |
| One local market only | 1 |

You pick your option. The score calculates automatically. No debate needed.
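One way to implement such a dropdown is a fixed label-to-score mapping, so a score can only ever come from one of the approved labels. This is a sketch of the idea, not the Notion template's internals:

```python
# Labels and scores from the market-scope dropdown above.
MARKET_SCOPE = {
    "All markets": 10,
    "Top markets only": 7,
    "Few secondary / non-top markets": 4,
    "One local market only": 1,
}

def market_scope_score(option: str) -> int:
    # Raising KeyError on unknown labels keeps free-text scores out of the system.
    return MARKET_SCOPE[option]
```

The point of the design: there is no way to argue your way to a 9. You either qualify for a label or you don't.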

Scoring Impact, Confidence, and Effort

Impact

Based on expected business impact within 30 days. Deliberately revenue-anchored to keep things concrete:

| Option | Score |
|---|---|
| Massive impact — 7%+ RPU uplift or $50k+ incremental | 10 |
| High impact — 5–7% RPU or $25k–$50k | 8 |
| Moderate impact — 3–5% RPU or $10k–$25k | 6 |
| Small impact — 1.5–3% RPU or $5k–$10k | 4 |
| Minimal impact — 0.1–1.5% RPU or $1k–$5k | 2 |

If you don't know your RPU targets, use the revenue ranges as anchors. The key is that "I think it'll be impactful" becomes "I think this is a 6 because it could move CVR by 3–5%." That's a sentence someone can challenge with data.
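The impact bands above reduce to a simple threshold lookup on expected 30-day RPU uplift. A sketch, using the percentage thresholds from the table:

```python
def impact_score(rpu_uplift_pct: float) -> int:
    """Map expected 30-day RPU uplift (%) to the impact bands above."""
    bands = [(7.0, 10), (5.0, 8), (3.0, 6), (1.5, 4), (0.1, 2)]
    for threshold, score in bands:
        if rpu_uplift_pct >= threshold:
            return score
    return 0  # below the minimal band: no measurable impact expected
```

So "I think this moves RPU by about 4%" is an argument for a 6 that anyone in the room can check against the bands.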

Confidence

How much evidence backs this hypothesis? This is where the research actually gets rewarded:

| Option | Score |
|---|---|
| Quant + qual sources + prior successful A/B test | 12 |
| 3+ valid sources including quant and qual | 10 |
| 2 sources including quant and qual | 8 |
| 1 valid source | 6 |
| Weak directional signal below threshold | 4 |
| Opinions only — no research | 2 |

Notice the top score goes above 10. That's deliberate. Tests backed by prior successful A/B test results in the same area deserve to be elevated — they're not hypotheses, they're near-certainties.

This is fine everything is fine meme

"We have strong evidence for this." "Cool, is that based on actual data or the CEO's last trip to a competitor's website?"

Effort

Split into two parts — because building the test and shipping it permanently if it wins are very different things:

Test implementation effort (1–5 pts): how much work it takes to build and QA the test variant itself.

Permanent rollout effort (1–5 pts): how much work it takes to ship the winning variant into production for good.

Total effort = the two scores added together. Maximum possible effort score: 10. This gets used as the denominator in the priority formula — so high-effort tests get pulled down proportionally.
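Putting the pieces together: total effort is the sum of the two 1–5 sub-scores, and it sits in the denominator of the priority formula. The inputs below are hypothetical:

```python
def total_effort(implementation: int, rollout: int) -> int:
    """Each part is scored 1-5, so total effort ranges from 2 to 10."""
    assert 1 <= implementation <= 5 and 1 <= rollout <= 5
    return implementation + rollout

def priority(reach: float, impact: int, confidence: int,
             implementation: int, rollout: int) -> float:
    return 1.5 * reach * impact * confidence / total_effort(implementation, rollout)

# The same test idea, cheap vs. expensive to build and roll out.
priority(8.0, 6, 8, 2, 2)   # effort 4  -> 144.0
priority(8.0, 6, 8, 5, 5)   # effort 10 -> 57.6
```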

How to use the framework in practice

  1. Write the hypothesis first. Don't score until you have a proper "if/then/because" hypothesis. No hypothesis = no test.

  2. Fill the dropdowns. Use the structured dropdowns; do not type raw numbers. The labels tell you exactly what to pick.

  3. Let the score calculate. The framework does the maths. Resist the urge to override it. If you disagree, challenge the inputs, not the output.

  4. Sort by Priority Score. Top scores go first. Review as a team quarterly. Add new tests at any time.

The Priority Tier (High / Medium / Lower) is a manual override — for cases where the score is right but context means the test should wait. Use it sparingly, and document why in the Notes field.

The rule of one challenge

When someone disagrees with a score, they must challenge a specific input — not the total. "I think the market scope should be 4, not 7, because this only affects users in EN markets" is a valid challenge.

"I just feel like this test is more important" is not a valid challenge.

This rule alone transforms the quality of your roadmap conversations.

Common mistakes (and how to avoid them)

Mistake 1: Treating the framework as optional

If some tests go through the scoring process and others don't, the framework loses its authority. Every test — no matter who suggested it — goes through the same process. Yes, including the one the CEO mentioned.

Shrug meme

Me explaining why the CEO's idea scored a 34 and the analyst's scored a 192.

Mistake 2: Being too generous with Confidence

It's tempting to give every idea a confident score. Resist. If your evidence is "we heard some users mention it in a session six months ago," that's a 4, not an 8. Be honest. Low confidence scores are informative — they tell you where to do more research.

Mistake 3: Using effort as a tiebreaker rather than a genuine score

Low-effort tests are easy to execute but often low-impact. Don't let "it's quick to build" become a backdoor to the top of the backlog. If the reach, impact, and confidence scores are low, a small effort score won't save it.

Mistake 4: Never updating scores

A test that was scored 3 months ago might have different evidence behind it now. Review scores quarterly. If your data team found new evidence, update the confidence score. If the test page got a redesign that changed its traffic, update the reach score.

Mistake 5: Treating Priority Tier as a second vote

The Priority Tier field exists for genuine edge cases — a legal constraint, a seasonal dependency, a technical blocker. It is not there so stakeholders can manually elevate their favourite ideas after the scoring disagreed with them. If you find yourself changing tiers in every meeting, you've rebuilt the subjective process you were trying to replace.

The free Notion template

I've built the entire framework in Notion — every scoring dimension with a dropdown, every score with an auto-calculated formula, and the final Priority Score computing automatically from your inputs.

No formulas to set up. No spreadsheet to maintain. Just open it, duplicate it into your workspace, and start scoring.


Oprah you get a car

Me sharing the template. You get a framework! You get a framework!

Get the free Notion template

Comment "FRAMEWORK" on the LinkedIn post and I'll DM you the link directly. No email, no form, no catch.

Already commented? Check your DMs — I send them personally.


Closing thought

The hardest part of building a non-subjective framework isn't the scoring logic. It's getting everyone to agree to use it — and to actually trust it when it contradicts their instincts.

Start with one sprint. Score every test in the backlog using this framework. Don't change the order based on gut feel. See what happens. My guess is that the tests you'd have deprioritised under the old system will start outperforming.

Because it turns out, evidence is a pretty good predictor of outcomes. Who knew.

Mind blown

The team when the data analyst's hypothesis scores highest two sprints in a row.

If you try this and it helps — or you find something that doesn't work — I'd genuinely love to hear about it. Drop a comment below or reach out directly.

Good luck. May your backlog be forever sorted by Priority Score descending.


Stop guessing why users drop off

Discoveo connects your GA4 funnel data with user feedback to explain the why — and prioritise what to fix first.

Discover Discoveo →