KPI Tree

Connecting experimentation to the metric tree framework

How to run an A/B test with metric trees

Most A/B tests are designed in isolation, optimising a single metric without understanding how it connects to the broader business. A metric tree changes that. It gives you a structured way to choose what to test, select the right success metrics, and measure impact across the full chain of cause and effect.

Why experiments need metric trees

A/B testing is one of the most powerful tools available to product and growth teams. It replaces opinion with evidence and lets you make decisions with measurable confidence. Yet most experimentation programmes struggle not because they lack statistical rigour, but because they lack strategic context. Teams run dozens of tests, declare winners, and ship changes, but the cumulative effect on the business is disappointingly small. The reason is simple: the tests were not connected to the outcomes that matter most.

The root of the problem is metric selection. Every A/B test requires a primary metric, the number you are trying to move, and one or more guardrail metrics, the numbers you are trying not to damage. In most organisations, these choices are made locally by the team running the test. The product team picks feature adoption. The growth team picks sign-up conversion. The marketing team picks click-through rate. Each choice is reasonable in isolation, but without a shared model of how these metrics relate to each other and to the business outcome, there is no way to know whether a win on one metric creates a loss somewhere else in the system.

A metric tree provides that shared model. It shows the full causal chain from the North Star metric at the top down to the operational levers at the bottom. When you design an experiment within this structure, you know exactly where in the tree your primary metric sits, which parent metrics it feeds, which sibling metrics might be affected, and which guardrail metrics you need to protect. The tree does not replace statistical methodology. It provides the strategic scaffolding that tells you which experiments are worth running in the first place.

Key insight

An A/B test without a metric tree tells you whether a change worked. An A/B test with a metric tree tells you whether a change worked, why it worked, and what else it affected across the business.

How a metric tree helps you pick what to test

One of the biggest challenges in experimentation is prioritisation. Most product teams have a backlog of test ideas that far exceeds their capacity to run them. Without a framework for ranking those ideas, prioritisation defaults to gut feeling, seniority, or whichever idea was suggested most recently. The metric tree provides a more disciplined alternative.

Start at the top of the tree and identify which branch of the business has the most room for improvement. If your North Star is revenue and it decomposes into new customer revenue and existing customer revenue, compare the two. If retention is healthy but acquisition is lagging, the acquisition branch is where experiments will have the most leverage. Drill down further: within acquisition, is the problem traffic volume, conversion rate, or activation? The tree narrows the search from "we could test anything" to "this specific node is underperforming and sits on the critical path to our goal."

The tree above illustrates a common SaaS revenue structure. Suppose analysis shows that trial starts are healthy but trial-to-paid conversion is below benchmark. The tree immediately focuses your experimentation roadmap on the nodes beneath that branch: onboarding completion and time to value. You do not need a brainstorming session to generate test ideas. The tree has already told you where the leverage is. Your job is to form hypotheses about why that node is underperforming and design experiments to test those hypotheses.

This approach also prevents a common failure mode: running experiments on metrics that are already performing well. Optimising a sign-up page that already converts at 12% when your trial-to-paid rate is 8% is a poor use of experimentation capacity. The tree makes these trade-offs visible by showing the relative performance and sensitivity of every node in the system.

Designing experiments that target specific tree nodes

Once you have identified the node you want to improve, the metric tree shapes how you design the experiment. The primary metric is the node itself. The guardrail metrics are its siblings and its parent. This structure ensures you are not just moving one number in isolation, but doing so in a way that lifts the parent without damaging related branches.

Consider an experiment aimed at improving onboarding completion rate. In the tree, onboarding completion sits beneath trial-to-paid conversion, which sits beneath new MRR. The primary metric for the test is onboarding completion rate. The guardrail metrics should include trial-to-paid conversion (to confirm the improvement translates upward), time to value (the sibling metric, to ensure you are not just getting people through onboarding faster without them actually reaching the value moment), and support ticket volume during onboarding (to ensure the new flow does not create confusion that surfaces elsewhere).
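This guardrail-selection rule is mechanical enough to automate. The sketch below is a minimal illustration, not a real product schema: the tree shape and node names are assumptions drawn from the example above.

```python
# Hypothetical sketch: deriving guardrail metrics from a metric tree.
# The tree shape and node names are illustrative, not a real schema.
TREE = {
    "new_mrr": ["trial_to_paid_conversion", "trial_starts"],
    "trial_to_paid_conversion": ["onboarding_completion", "time_to_value"],
}

def parent_of(node, tree):
    """Return the parent of `node`, or None if it is the root."""
    for parent, children in tree.items():
        if node in children:
            return parent
    return None

def guardrails_for(target, tree):
    """Guardrails = the target's parent plus its sibling nodes."""
    parent = parent_of(target, tree)
    if parent is None:
        return []
    siblings = [child for child in tree[parent] if child != target]
    return [parent] + siblings

print(guardrails_for("onboarding_completion", TREE))
# ['trial_to_paid_conversion', 'time_to_value']
```

Because the guardrails fall out of the tree structure rather than anyone's intuition, two teams testing against the same node will always monitor the same set of metrics.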

1. Identify the target node

   Use the metric tree to find the specific node your experiment aims to improve. This becomes your primary metric. Be precise: "improve onboarding completion from 62% to 68%" is far more useful than "improve the onboarding experience." The tree forces precision because every node has a measurable definition.

2. Define guardrail metrics from the tree structure

   Look at the target node's parent, its siblings, and any adjacent branches. These are your guardrails. The parent tells you whether the improvement propagates upward. The siblings tell you whether you are robbing Peter to pay Paul. Adjacent branches catch unintended side effects. A metric tree makes guardrail selection systematic rather than ad hoc.

3. Set the minimum detectable effect using tree sensitivity

   The tree lets you model how a change at one node affects its parent. If onboarding completion improves by 5 percentage points, what does that mean for trial-to-paid conversion? What does that mean for new MRR? Work the arithmetic upward through the tree to confirm the expected business impact justifies the cost of running the test.

4. Size the experiment for both primary and guardrail metrics

   Most teams size their experiments only for the primary metric. But if your guardrail is trial-to-paid conversion and you need to detect a 2% degradation, you may need a larger sample than the primary metric alone requires. Use the tree to identify which guardrail is hardest to detect and size accordingly.

5. Document the hypothesis using tree language

   Frame the hypothesis in terms of the tree: "By simplifying step three of onboarding, we expect onboarding completion rate to increase by 5pp, which will improve trial-to-paid conversion by approximately 2pp, contributing an estimated increase to new MRR." This connects the experiment to business outcomes and makes the results interpretable by anyone who understands the tree.

6. Run, measure, and trace the impact through the tree

   After the experiment concludes, do not just check whether the primary metric moved. Trace the impact upward through the tree. Did the parent metric improve as expected? Were any guardrails breached? Did any unexpected nodes move? The tree gives you a structured post-experiment analysis that goes far beyond a simple win/loss verdict.
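The upward arithmetic in steps 3 and 6 can be sketched as a simple propagation along the path to the North Star. The elasticities here — how many percentage points of parent lift you expect per point of child lift — are assumptions you would estimate from historical data; the node names and values are illustrative.

```python
# Hypothetical sketch: working a node-level lift upward through the tree.
# Elasticities (pp of parent lift per pp of child lift) are assumed values
# you would estimate from historical data; node names are illustrative.
ELASTICITY = {
    ("onboarding_completion", "trial_to_paid_conversion"): 0.4,
    ("trial_to_paid_conversion", "new_mrr"): 1.0,
}

PATH_TO_NORTH_STAR = [
    "onboarding_completion",
    "trial_to_paid_conversion",
    "new_mrr",
]

def propagate(lift_pp, path, elasticity):
    """Translate a lift (in percentage points) at the first node of `path`
    into expected lifts at each node further up the tree."""
    expected = {path[0]: lift_pp}
    for child, parent in zip(path, path[1:]):
        lift_pp *= elasticity[(child, parent)]
        expected[parent] = round(lift_pp, 2)
    return expected

print(propagate(5.0, PATH_TO_NORTH_STAR, ELASTICITY))
# {'onboarding_completion': 5.0, 'trial_to_paid_conversion': 2.0, 'new_mrr': 2.0}
```

With an assumed elasticity of 0.4, a 5pp onboarding lift translates into roughly 2pp of trial-to-paid conversion — the same arithmetic as the hypothesis in step 5. After the test, comparing observed parent movement against these expectations tells you whether the elasticity itself needs revising.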

Measuring experiment impact across the tree

The most common mistake in A/B testing is measuring only the target metric. A test that improves sign-up conversion by 8% sounds like a clear win, until you discover that the new sign-up flow attracts lower-quality users who churn at twice the normal rate. The target metric moved in the right direction, but the net effect on the business was negative. This kind of failure is invisible without a metric tree because you have no structure telling you which other metrics to check.

A metric tree turns experiment analysis into a systematic exercise. When the test concludes, you examine every node on the path from the target metric to the North Star. You also examine sibling nodes at each level to check for trade-offs. The tree acts as a checklist that ensures no important metric is overlooked. This is particularly valuable for experiments with delayed effects. A change to onboarding might immediately improve completion rate (measurable in days) but take weeks to show its effect on retention (measurable in months). The tree tells you which downstream metrics to monitor and over what time horizon.

There is also a compounding effect that the tree makes visible. Most experiments produce small improvements: 2% here, 3% there. In isolation, these numbers feel modest. But when you can see them in the tree, you can model how they compound. A 3% improvement in onboarding completion, combined with a 2% improvement in time to value, might produce a 4.5% improvement in trial-to-paid conversion, which feeds into a meaningful lift in new MRR. The tree lets you quantify the cumulative impact of your experimentation programme, which is essential for justifying continued investment in it.
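The compounding arithmetic can be made concrete with a small sketch. It assumes sibling lifts combine multiplicatively, dampened by a pass-through factor below 1.0 to reflect that not all of a child's improvement reaches the parent — both the combination rule and the numbers are illustrative assumptions.

```python
# Hypothetical sketch: compounding small relative improvements.
# Assumes lifts combine multiplicatively; pass_through below 1.0 models
# partial transmission to the parent node. All numbers are illustrative.
def compound(lifts, pass_through=1.0):
    """Combine relative lifts (0.03 = +3%) multiplicatively, optionally
    dampened by a pass-through factor."""
    total = 1.0
    for lift in lifts:
        total *= 1.0 + lift * pass_through
    return total - 1.0

# Full pass-through: a 3% and a 2% lift compound slightly above their sum.
print(round(compound([0.03, 0.02]) * 100, 2))                    # 5.06

# With 90% pass-through, the combined lift lands near 4.5%:
print(round(compound([0.03, 0.02], pass_through=0.9) * 100, 1))  # 4.5
```

Running this model across every node a quarter's experiments touched gives you a defensible cumulative-impact estimate rather than a list of isolated percentages.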

| Analysis approach | Without a metric tree | With a metric tree |
| --- | --- | --- |
| Primary metric | Check if the target metric improved | Check if the target metric improved and confirm the improvement propagates to its parent |
| Guardrail metrics | Check one or two metrics chosen by intuition | Systematically check all siblings and parent nodes defined by the tree structure |
| Side effects | Discovered weeks later when another team notices a problem | Proactively checked by examining adjacent branches during analysis |
| Business impact | Reported as a percentage change on the target metric | Modelled upward through the tree to estimate impact on the North Star |
| Learning | Binary: the test won or lost | Rich: which nodes moved, which did not, and what that reveals about causal assumptions in the tree |
| Cumulative tracking | A spreadsheet of individual test results | A tree-level view showing total experimentation impact on each node over time |

The table above highlights the difference between experiment analysis with and without a metric tree. The key shift is from isolated measurement to systemic measurement. When every experiment is analysed in the context of the tree, you build a far deeper understanding of how your business works. Each test is not just a decision about whether to ship a change. It is a data point that validates or challenges the causal assumptions encoded in your tree. Over time, this creates an organisational knowledge base about which levers actually move which outcomes, and by how much.
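That per-node knowledge base can start as something very simple: a log keyed by tree node rather than by team or quarter. The field names below are illustrative, not a prescribed schema.

```python
# Hypothetical sketch: anchoring experiment results to tree nodes so
# learning accumulates per node. Field names are illustrative.
from collections import defaultdict

history = defaultdict(list)

def record(node, hypothesis, lift_pp, guardrails_breached):
    """Log one experiment result against the tree node it targeted."""
    history[node].append({
        "hypothesis": hypothesis,
        "lift_pp": lift_pp,
        "guardrails_breached": guardrails_breached,
    })

record("onboarding_completion", "simplify step three", 5.1, [])
record("onboarding_completion", "shorten welcome survey", -0.4,
       ["time_to_value"])

# Cumulative lift at a node across every test run against it:
total = sum(test["lift_pp"] for test in history["onboarding_completion"])
print(round(total, 1))  # 4.7
```

A new team member proposing a test against `onboarding_completion` can scan this history first and see that the welcome-survey idea has already been tried and breached a guardrail.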

Common A/B testing pitfalls the tree helps avoid

A/B testing has well-documented pitfalls, from peeking at results too early to running underpowered tests. But some of the most damaging pitfalls are strategic rather than statistical, and these are precisely the ones a metric tree is designed to prevent. Below are five pitfalls that plague experimentation programmes and how the tree structure addresses each one.

Optimising a local maximum

A team improves their metric without checking whether the improvement translates to the business outcome. The sign-up page converts better, but revenue does not move because the new users never activate. The tree prevents this by requiring you to trace impact upward from the target node to the North Star, confirming the improvement propagates through every intermediate node.

Cannibalising sibling metrics

An experiment moves users from one path to another, improving one branch of the tree at the expense of a sibling. A more aggressive upsell prompt increases expansion MRR but accelerates churn. Without the tree, these are separate metrics owned by separate teams. With the tree, they are siblings under the same parent, and the trade-off is visible before the change ships.

Testing the wrong metric entirely

Teams test metrics that are easy to move but disconnected from business outcomes. Click-through rates, page views, and time on page are popular primary metrics precisely because they are sensitive to small changes. But if the tree shows no causal path from those metrics to a meaningful business outcome, the test is measuring noise, not signal.

Ignoring delayed effects

Short-duration tests capture immediate effects but miss downstream consequences. A change that lifts activation this week might reduce retention next month. The tree identifies which downstream nodes need monitoring and over what time horizon, so you know when it is truly safe to declare a winner and ship the change.

Weak or missing guardrails

Many teams set guardrails informally or skip them altogether. The tree provides a structural definition of guardrails: the parent node, sibling nodes, and any node on an adjacent branch that shares users with the experiment. This turns guardrail selection from an afterthought into a repeatable process built into every experiment design.

Each of these pitfalls is a symptom of the same underlying problem: experiments designed without a systemic view of the business. The metric tree does not eliminate statistical pitfalls like peeking or multiple comparisons, but it does eliminate the strategic pitfalls that cause teams to run technically sound tests on the wrong questions. A well-powered test on an irrelevant metric is a waste of traffic. A well-designed test on the right node of the tree is how you turn experimentation into compounding business growth.

Building an experimentation culture with metric trees

Running individual A/B tests well is a skill. Building an organisation that experiments systematically is a culture. The metric tree is one of the most effective tools for bridging the gap between the two, because it creates a shared language that connects experimenters to business strategy and makes the value of experimentation visible to leadership.

The first cultural shift the tree enables is prioritisation by impact. When every experiment proposal is mapped to a node in the tree, leadership can evaluate proposals not by how clever the hypothesis is, but by how much the target node matters to the North Star. This depersonalises prioritisation. The debate moves from "whose idea is better?" to "which node has the most leverage?" Teams that previously struggled to get experimentation resources can now make a structural case: this node is underperforming, it sits on the critical path, and our hypothesis addresses a validated bottleneck.

The second shift is accountability without blame. When experiments are tied to tree nodes, a test that fails to move its target metric is not a failure of the team. It is new information about the tree. Perhaps the causal link between the target node and its parent is weaker than assumed. Perhaps the node is already near its ceiling and further improvement requires intervening at a different point in the tree. Failed experiments refine the tree model, which makes future experiments more effective. This reframing is essential for sustaining an experimentation culture, because teams that fear punishment for negative results stop testing ambitious hypotheses.

The third shift is compounding learning. Without a tree, experiment results live in isolated documents and dashboards. Each test teaches the team that ran it something, but the learning does not transfer. With a tree, every experiment result is anchored to a specific node. Over time, you build a rich history at each node: which hypotheses were tested, which moved the metric, which did not, and what the side effects were. New team members can review the experimentation history of their node before proposing tests, avoiding redundant work and building on what has already been learnt.

Finally, the tree makes the cumulative value of experimentation legible to executives. Instead of reporting "we ran 47 experiments this quarter and 18 were winners," you can report "experimentation contributed a 6% improvement to trial-to-paid conversion and a 3% reduction in churn rate, which together drove an estimated 4.2% increase in MRR." The tree provides the structure needed to roll up individual test results into business-level impact. This is what turns experimentation from a product team activity into a company-wide strategic capability.
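The roll-up itself can be a deliberately crude model. The sketch below splits MRR into a new-business component driven by trial-to-paid conversion and a retained component driven by churn; the split, the function name, and every number are illustrative assumptions, not the figures quoted above.

```python
# Hypothetical sketch: rolling test results up into one business-level
# figure. The two-component MRR model and all inputs are illustrative.
def mrr_impact(new_share, conv_lift, retained_share, churn_rate,
               churn_reduction):
    """Estimate relative MRR lift from a conversion lift on the
    new-business share and a relative churn reduction on the retained
    share (all inputs are fractions, e.g. 0.06 = 6%)."""
    new_component = new_share * conv_lift
    retained_component = retained_share * churn_rate * churn_reduction
    return new_component + retained_component

# Assume 25% of MRR is new business with a 6% conversion lift, and 75%
# is retained with 5% monthly churn reduced by a relative 4%:
lift = mrr_impact(0.25, 0.06, 0.75, 0.05, 0.04)
print(f"{lift:.2%}")
```

However rough, a stated model like this is auditable: an executive can challenge the shares or the elasticities, which is a far better conversation than debating a raw count of winning tests.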

"A metric tree does not just improve individual experiments. It transforms experimentation from a series of disconnected bets into a systematic programme that compounds learning and impact over time."

Run experiments that move the metrics that matter

A metric tree shows you where to test, what to measure, and how each experiment connects to your North Star. Map your metrics, identify the highest-leverage nodes, and turn experimentation into compounding growth.
