A/B test performance
A/B test performance is the statistical comparison of how different variants perform against each other on defined success metrics. It captures whether a proposed change (new design, copy, pricing, feature) produces a meaningfully better outcome than the current experience. Rigorous A/B testing replaces opinion-driven decisions with evidence, enabling product teams to invest in changes that demonstrably improve outcomes.
What is A/B test performance?
A/B test performance measures the difference in outcomes between two or more variants shown to randomly assigned groups of users. The control group sees the existing experience, the treatment group sees the proposed change, and the performance of each is compared on one or more predefined metrics.
The measurement has three core components: the observed difference (lift), the statistical significance of that difference (confidence level), and the practical significance (whether the lift is large enough to matter). A test can show a statistically significant difference that is too small to justify the engineering cost of implementing it, or a large observed difference that is not statistically significant because the sample size was insufficient.
Statistical significance is typically expressed as a p-value or confidence interval. A p-value below 0.05 (95% confidence) is the conventional threshold, meaning that if there were truly no difference between the variants, a result at least this extreme would occur less than 5% of the time. However, the threshold should be adjusted based on the cost of a wrong decision. High-stakes changes (pricing, core workflows) may warrant 99% confidence, while low-risk changes (copy tweaks, colour variations) may be actionable at 90%.
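As a concrete illustration, the lift, p-value, and confidence interval for a two-variant test can be computed with a standard two-proportion z-test. The sketch below uses only the Python standard library; the conversion counts are illustrative, not from any real experiment.

```python
from statistics import NormalDist
import math

def ab_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return relative lift, two-sided p-value, and a CI on the absolute difference."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (p_b - p_a - z_crit * se, p_b - p_a + z_crit * se)
    lift = (p_b - p_a) / p_a
    return lift, p_value, ci

# Illustrative counts: 5.0% vs 5.7% conversion on 10,000 users per arm.
lift, p, ci = ab_test(conv_a=500, n_a=10_000, conv_b=570, n_b=10_000)
print(f"lift={lift:.1%}  p={p:.4f}  CI=[{ci[0]:.4f}, {ci[1]:.4f}]")
```

Note that the same observed lift would not reach significance on a much smaller sample, which is exactly the distinction between observed and statistically significant differences described above.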
A/B test performance is not a single metric but a framework for evaluating changes. The specific metrics tracked depend on what the test is trying to improve: conversion rate, revenue per user, engagement, retention, or any other measurable outcome. The discipline is in the methodology: random assignment, adequate sample size, predefined success criteria, and honest interpretation of results.
Always define the primary metric and required sample size before launching a test. Changing the success metric or stopping a test early when results look good introduces bias that invalidates the results. The discipline of pre-registration is what separates rigorous experimentation from cherry-picking data.
Key metrics in A/B test analysis
| Metric | What it measures | Why it matters |
|---|---|---|
| Relative lift | Percentage improvement of variant over control | Quantifies the size of the effect. A 5% lift in conversion is meaningful; a 0.1% lift may not justify the change. |
| Statistical significance (p-value) | Probability that the observed difference is due to chance | Determines confidence in the result. Below the predefined threshold means the difference is unlikely to be random. |
| Confidence interval | Range within which the true effect likely falls | Provides nuance beyond a single number. A lift of 5% with a confidence interval of 2% to 8% is more informative than the point estimate alone. |
| Statistical power | Probability of detecting a real effect if one exists | Ensures the test is large enough to find meaningful differences. Low power means real improvements may be missed. |
| Sample ratio mismatch (SRM) | Whether traffic was split evenly as intended | A significant deviation from the expected split indicates a technical problem that invalidates the test results. |
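The SRM check in the table above is straightforward to automate: a chi-square goodness-of-fit test compares the observed traffic split to the expected one. A minimal sketch using only the standard library, with illustrative counts:

```python
import math

def srm_p_value(n_control, n_treatment, expected_ratio=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) on the traffic split."""
    total = n_control + n_treatment
    exp_c = total * expected_ratio
    exp_t = total * (1 - expected_ratio)
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# A 50/50 test that delivered 10,120 vs 9,880 users: p ~ 0.09, no SRM alarm.
print(srm_p_value(10_120, 9_880))
# 10,400 vs 9,600 is a red flag: p is far below any reasonable threshold.
print(srm_p_value(10_400, 9_600))
```

A common convention is to treat p below 0.001 as an SRM alarm, since the check runs on every test and a looser threshold would generate frequent false alarms.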
Structuring an experimentation programme with a metric tree
A metric tree connects individual test results to the business metrics they are intended to influence, creating a structured view of the experimentation programme's overall impact.
This tree shifts the focus from individual test results to the performance of the experimentation programme as a whole. A mature programme runs many tests, accepts that most will not produce significant results, and measures its value by the cumulative impact of the winners that are shipped.
Connecting test results to downstream business metrics like revenue growth rate or retention rate quantifies the ROI of the experimentation programme. This makes it possible to justify investment in experimentation infrastructure and team capacity based on measured business outcomes.
Common pitfalls in A/B testing
1. Peeking at results and stopping early
Checking results before the test reaches its required sample size and stopping when the result looks good inflates false positive rates. Sequential testing methods exist to allow valid early stopping, but the standard fixed-sample test must run to completion.
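The inflation from peeking is easy to demonstrate by simulation: give both arms the same true conversion rate, check significance at several interim points, and count how often a "winner" is declared anyway. A rough sketch (the parameters are illustrative; the exact inflated rate depends on the peek schedule):

```python
import math
import random

def peeking_false_positive_rate(sims=500, n_per_arm=2000, peeks=4, p=0.10, seed=7):
    """Both arms share the same true rate p, so every 'win' is a false positive."""
    rng = random.Random(seed)
    step = n_per_arm // peeks
    false_positives = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        for _ in range(peeks):
            for _ in range(step):
                conv_a += rng.random() < p
                conv_b += rng.random() < p
            n += step
            pool = (conv_a + conv_b) / (2 * n)
            se = math.sqrt(pool * (1 - pool) * 2 / n) or 1e-12  # guard: pool == 0
            if abs(conv_b / n - conv_a / n) / se > 1.96:
                # "Significant" at this peek: stop and (wrongly) declare a winner.
                false_positives += 1
                break
    return false_positives / sims

print(peeking_false_positive_rate())  # noticeably above the nominal 5%
```

With four peeks at the conventional 1.96 threshold, the realised false positive rate is typically more than double the nominal 5%, which is why fixed-sample tests must run to their planned size.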
2. Testing too many variants without adjusting for multiple comparisons
Running five variants simultaneously without correcting for multiple comparisons dramatically increases the chance of a false positive. If you test five variants at 95% confidence, the probability of at least one false positive is roughly 23%, not 5%. Use Bonferroni correction or similar methods.
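The roughly 23% figure comes from treating the five comparisons as independent tests, each with a 5% false positive rate. The arithmetic, plus the Bonferroni adjustment:

```python
alpha = 0.05
variants = 5

# Probability of at least one false positive across 5 independent comparisons.
fwer = 1 - (1 - alpha) ** variants
print(f"family-wise error rate: {fwer:.1%}")  # 22.6%

# Bonferroni: divide alpha by the number of comparisons to keep the
# family-wise error rate at or below 5%.
adjusted_alpha = alpha / variants
print(f"per-comparison threshold: {adjusted_alpha:.3f}")  # 0.010
```

Bonferroni is conservative; less strict corrections (Holm, Benjamini-Hochberg) exist, but any correction is better than none when several variants run at once.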
3. Ignoring practical significance
A statistically significant result with a tiny effect size may not be worth implementing. Before launching a test, define the minimum detectable effect that would justify the change. If the observed lift is below that threshold, treat it as inconclusive regardless of the p-value.
4. Measuring the wrong metric
A test that improves a proxy metric (clicks, pageviews) while degrading the true goal metric (purchases, retention) is a net negative. Define primary and guardrail metrics before the test. Guardrail metrics ensure that optimising one metric does not come at the expense of others.
5. Insufficient sample size
Running a test on too small a sample produces noisy results that are unreliable. Calculate the required sample size before launching based on baseline conversion rate, minimum detectable effect, and desired statistical power. If the required sample is larger than available traffic, the test is not feasible at that sensitivity.
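The standard normal-approximation formula for a two-proportion test makes this calculation concrete. A sketch with illustrative inputs (5% baseline conversion, 10% relative minimum detectable effect):

```python
from statistics import NormalDist
import math

def required_sample_per_arm(baseline, mde_relative, alpha=0.05, power=0.80):
    """Users needed in EACH arm to detect a relative lift of mde_relative."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# 5% baseline conversion, detecting a 10% relative lift at 80% power:
print(required_sample_per_arm(0.05, 0.10))  # roughly 31,000 users per arm
```

The result, on the order of 31,000 users per arm, illustrates why small effects on low-conversion pages often cannot be tested at all on limited traffic.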
Tracking A/B test performance with KPI Tree
KPI Tree lets you model your experimentation programme as a metric tree that connects test velocity, win rate, and effect size to cumulative business impact. Each active test can be tracked as a node with its current lift, confidence level, and projected impact, giving leadership visibility into the experimentation pipeline.
Linking test results to the business metrics they target, such as funnel conversion rate, customer satisfaction score, or average order value, creates accountability between the experimentation team and business outcomes. When a winning test is shipped, its measured impact flows into the tree and contributes to the cumulative programme ROI.
The tree also helps with prioritisation. By modelling the potential impact of proposed tests alongside active ones, teams can allocate traffic and engineering resources to the experiments most likely to produce meaningful results.
Related metrics
Conversion rate
CVR
Marketing Metrics · Metric Definition
Conversion Rate = (Number of Conversions / Total Visitors or Leads) × 100
Conversion rate measures the percentage of visitors, users, or leads who take a desired action, such as making a purchase, signing up for a trial, or submitting a form. It is the fundamental metric for evaluating the effectiveness of any acquisition funnel, landing page, or marketing campaign.
Funnel conversion rate
Growth analytics
Product Metrics · Metric Definition
Funnel Conversion Rate = (Users Completing Final Step / Users Entering First Step) × 100
Funnel conversion rate measures the percentage of users who complete a multi-step process from entry to final outcome. It captures the efficiency of any sequential workflow: onboarding flows, purchase funnels, feature adoption paths, or trial-to-paid journeys. The metric reveals not just how many users convert overall, but where in the sequence users drop off and how large each drop-off is.
Feature adoption rate
Product Metrics · Metric Definition
Feature Adoption Rate = (Users Who Used the Feature / Total Active Users) × 100
Feature adoption rate measures the percentage of users who use a specific feature within a given period. It tells product teams whether new features are resonating with users and which existing features are underutilised, guiding investment decisions and roadmap priorities.
Retention rate
Product Metrics · Metric Definition
Retention Rate = (Users Active at End of Period / Users Active at Start of Period) × 100
Retention rate measures the percentage of users or customers who continue to use your product over a given period. It is the most important growth metric because sustainable growth is impossible when users leave faster than they arrive.
Measure experimentation impact with KPI Tree
Build an experimentation metric tree that connects individual test results to cumulative business impact. Track test velocity, win rates, and the revenue contribution of your experimentation programme.