KPI Tree

Metric Lineage vs Causal Lineage

Modern warehouses and BI tools ship rich data lineage. They show how a table was built and which model feeds a chart. None of that tells you which business driver caused the outcome to change. This guide draws the line between the two and explains why it matters when you evaluate a data stack.



Two kinds of lineage

Definition

Data lineage traces where a number comes from: the source systems, transformations, and models that produced a value. Causal lineage traces what makes a number move: the chain of business drivers, sub-drivers, and inputs that cause a headline outcome to rise or fall. Data lineage is a map of your pipelines. Causal lineage is a map of your business.

Most modern data platforms have made real progress on lineage. You can click a dashboard tile and trace it back through a semantic model, a transformation, and a raw table, all the way to the source system that emitted the event. This is genuinely useful. When a number looks wrong, lineage tells you whether the pipeline is at fault.

But there is a second question that lineage of this kind never answers. When revenue falls, the issue is rarely a broken join. The issue is that something in the business changed. Conversion dropped. Churn rose. A pricing experiment underperformed. Knowing exactly which table fed the revenue figure does nothing to tell you which of those drivers is responsible.

This is the distinction that matters. There is lineage of data, and there is lineage of cause. They look similar on a diagram. They answer completely different questions.

A Chief Data Officer evaluating a modern stack will see the word lineage on every vendor slide and reasonably assume the problem is solved. It is not. The graph that connects one metric to another, by cause rather than by data flow, is a separate layer. Almost nothing in the standard stack ships it.

What data lineage actually traces

Data lineage is a record of provenance. It answers the question: how was this value produced? It follows a value upstream through every transformation until it reaches a source. There are three common flavours, and it helps to name them precisely, because vendors often blur them together.

  1. Dataset lineage

    Table-to-table provenance. This table was built from those tables. Useful for impact analysis when a source schema changes, and for tracing a data quality issue to its origin.

  2. Column lineage

    Field-level provenance. This column is derived from that column through a specific expression. Useful for change management and for auditing how a sensitive field propagates.

  3. Semantic lineage

    Metric-definition provenance. This metric in the semantic layer is computed from these measures and dimensions. Useful for keeping one agreed definition of a number across every tool that reads it.

All three are valuable and all three are about plumbing. They describe how a number is assembled. They are silent on what the number means for the business and on what would make it change. Semantic lineage gets closest, because it lives at the level of metrics, but it still only connects a metric to the data that computes it. It does not connect one metric to another metric that drives it.
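
As a rough illustration of what the three flavours have in common, the sketch below stores provenance as a plain upstream graph and walks it back to the sources. The node names (`metric.mrr`, `model.subscriptions_enriched`, and so on) are hypothetical and not taken from any real warehouse or lineage tool.

```python
# Hypothetical provenance graph: each node maps to the nodes it was built from.
# All names are illustrative assumptions.
UPSTREAM = {
    "metric.mrr": ["model.subscriptions_enriched"],
    "model.subscriptions_enriched": ["raw.subscriptions", "raw.plans"],
    "raw.subscriptions": [],
    "raw.plans": [],
}


def trace_sources(node, graph):
    """Return the set of source nodes (nodes with no upstream) that feed `node`."""
    parents = graph.get(node, [])
    if not parents:
        return {node}
    sources = set()
    for parent in parents:
        sources |= trace_sources(parent, graph)
    return sources


print(sorted(trace_sources("metric.mrr", UPSTREAM)))
```

Every edge in this graph means "was computed from". Nothing in the structure records which business driver would make the metric move.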

The trap

Semantic lineage and causal lineage both produce a graph of metrics, so they are easy to mistake for the same thing. Semantic lineage links a metric to its inputs in the warehouse. Causal lineage links a metric to the other metrics that cause it to move. One is a definition. The other is a theory of the business.

What causal lineage traces

Causal lineage is a record of cause. It answers a different question: why did this outcome move? It places a headline metric at the top and decomposes it into the drivers that cause it, then decomposes those drivers in turn, until you reach inputs a team can act on. This is a metric tree, and every link in it is a claim about cause, not a claim about data flow. For a fuller treatment, see "What is a metric tree".

Consider monthly recurring revenue. Data lineage tells you the figure was built from a subscriptions table joined to a plans table. Causal lineage tells you that revenue is driven by new customers, expansion, and churn, that new customers are driven by qualified leads and conversion rate, and that conversion rate is driven by demo completion and pricing fit. When revenue falls, only the second view lets you walk down the tree and find the input that moved.

The shape looks like a lineage diagram, but read the edges carefully. None of them describes a transformation. Monthly recurring revenue is not computed from qualified leads. Qualified leads drive it. That single difference is what separates a map of pipelines from a map of the business, and it is the reason this guide exists. The discipline of building this view is covered in metric decomposition.
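
The walk down the tree described above can be sketched in a few lines. This is a minimal illustration, assuming each metric carries a previous and a current value. The figures are invented, and "follow the driver with the largest relative change" is a simple stand-in heuristic, not a full contribution analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    previous: float
    current: float
    drivers: list = field(default_factory=list)  # child Metric objects

    def relative_change(self) -> float:
        return (self.current - self.previous) / self.previous


def walk_to_input(metric):
    """Follow the driver with the largest absolute relative change at each level."""
    path = [metric.name]
    while metric.drivers:
        metric = max(metric.drivers, key=lambda m: abs(m.relative_change()))
        path.append(metric.name)
    return path


# Tree shape follows the example in the text; every number is made up.
demo_completion = Metric("demo_completion", 0.60, 0.45)
pricing_fit = Metric("pricing_fit", 0.50, 0.49)
conversion_rate = Metric("conversion_rate", 0.10, 0.075,
                         [demo_completion, pricing_fit])
qualified_leads = Metric("qualified_leads", 1000, 1020)
new_customers = Metric("new_customers", 100, 76.5,
                       [qualified_leads, conversion_rate])
expansion = Metric("expansion", 20000, 20400)
churn = Metric("churn", 15000, 15300)
mrr = Metric("monthly_recurring_revenue", 500000, 480000,
             [new_customers, expansion, churn])

print(" -> ".join(walk_to_input(mrr)))
```

With these invented figures the walk ends at demo completion. The edges carry no SQL at all: each one encodes a claim that the child moves the parent.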

The two layers, side by side

Set the two next to each other and the gap becomes obvious. They are built from different inputs, they answer different questions, and the people who need them sit in different seats.

| Dimension | Data lineage | Causal lineage |
| --- | --- | --- |
| Core question | How was this number produced? | What makes this number move? |
| Unit of the graph | Tables, columns, metric definitions | Metrics that drive other metrics |
| Nature of an edge | A transformation or derivation | A causal driver relationship |
| Built from | Pipeline metadata and SQL | Business judgement and tested correlation |
| Primary user | Data engineers and analysts | Operators, owners, executives |
| Answers "Why did revenue fall?" | No | Yes |
| Ships in the standard stack | Yes, widely | Rarely, if at all |

The last two rows are the ones a buyer should sit with. Data lineage is now table stakes. You should expect it from any serious warehouse or transformation tool. Causal lineage is the layer almost no one ships, and it is the layer that actually answers the question every leadership meeting opens with.

Why the warehouse cannot infer cause

It is fair to ask why the modern stack does not simply extend lineage into causal lineage. The honest answer is that a warehouse cannot infer cause from data flow, because the information is not there to infer.

Pipelines encode derivation, not causation

The warehouse knows table A built table B. It has no record of whether the thing measured in A causes the thing measured in B in the real world. That knowledge lives in the heads of the people who run the business.

Correlation is not enough on its own

You can compute which metrics move together, and that is a strong hint. But two metrics can correlate because a third drives both. Turning correlation into a defensible driver link needs human judgement layered on top of the statistics.
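
A small numeric sketch makes the third-driver problem concrete. The metric names and figures below are invented for illustration: a hypothetical marketing budget drives both webinar signups and trial starts, which have no direct link to each other. Controlling for the budget with a partial correlation is one common statistical check. It narrows the question but does not by itself establish cause.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(a, b, control):
    """Correlation of a and b after removing the linear effect of control."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, control)
    r_bc = pearson(b, control)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Invented data: the budget drives both metrics, plus unrelated noise.
budget = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [1, -1, 1, -1, 1, -1, 1, -1]
noise_b = [1, 1, -1, -1, 1, 1, -1, -1]
webinar_signups = [2 * z + e for z, e in zip(budget, noise_a)]
trial_starts = [3 * z + e for z, e in zip(budget, noise_b)]

raw = pearson(webinar_signups, trial_starts)
controlled = partial_corr(webinar_signups, trial_starts, budget)
print(f"raw correlation: {raw:.2f}")
print(f"controlling for budget: {controlled:.2f}")
```

The raw correlation is above 0.9, while the figure controlling for budget sits close to zero. Deciding that the budget is the right thing to control for is exactly the judgement the statistics cannot supply.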

The graph is a model of the business

A causal map reflects a theory of how the company works. It is authored, debated, and revised. No amount of SQL parsing produces it, because it is not in the SQL.

Cause needs an owner to be useful

A driver link only earns its place when someone is accountable for the driver. Pipeline metadata has no concept of accountability, so it can never close the loop from cause to action.

“People change what they do when they can see the system that produces the outcome, not when they are handed another chart of the outcome itself.”

This is the behavioural point underneath the technical one. A dashboard shows the outcome. A causal map shows the mechanism. When a team can see the mechanism, and can see which lever they hold, the number stops being a verdict and starts being something they can act on. That shift is the whole game, and it is the reason dashboards vs metric trees is a question worth taking seriously.

What a CDO should test for

If you are assembling or auditing a modern data stack, treat lineage as two requirements, not one. The first is well served by the market. The second is where most stacks quietly leave a hole. These questions separate a tool that traces data from a tool that traces cause.

  1. Can it show me one metric driving another?

    Not which table feeds a metric, but which metric causes another metric to move. If the only graph on offer is dataset, column, or semantic lineage, the causal layer is absent.

  2. Can I trace a top-level change to a specific input?

    When the headline number moves, can you walk down the drivers to the one input that caused it, in a few steps, without writing a bespoke analysis each time?

  3. Does every driver have an accountable owner?

    A causal link with no owner is trivia. Look for RACI on each metric, so Responsible, Accountable, Consulted, and Informed are explicit and the change reaches a named person.

  4. Does it act when a metric moves, or just display it?

    A dashboard waits to be opened. Ask whether the system pushes the change to the accountable owner when a metric breaches expectation, rather than relying on someone noticing.

  5. Does it verify the action worked?

    After an owner acts on a driver, does the system check whether the intended metric actually responded? A verified impact loop closes the gap between deciding and knowing.
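
The last three questions (ownership, push, and verification) can be sketched together. This is an illustrative model only: the class names, the tolerance rule, and the "at least half the gap closed" check are assumptions made for the example, not a description of how any particular product implements them.

```python
from dataclasses import dataclass


@dataclass
class Raci:
    responsible: str
    accountable: str
    consulted: tuple = ()
    informed: tuple = ()


@dataclass
class Driver:
    name: str
    expected: float
    tolerance: float  # allowed relative deviation from expected
    raci: Raci


def breach_alert(driver, observed):
    """Return (recipient, message) if observed breaches expectation, else None."""
    deviation = (observed - driver.expected) / driver.expected
    if abs(deviation) <= driver.tolerance:
        return None
    message = f"{driver.name} is {deviation:+.0%} vs expectation"
    return (driver.raci.accountable, message)


def impact_verified(driver, before, after, min_gap_closed=0.5):
    """True if the action closed at least min_gap_closed of the gap to expected."""
    gap_before = abs(before - driver.expected)
    gap_after = abs(after - driver.expected)
    if gap_before == 0:
        return True  # nothing to close
    return (gap_before - gap_after) / gap_before >= min_gap_closed


# Hypothetical driver and owners.
demo_completion = Driver(
    name="demo_completion",
    expected=0.60,
    tolerance=0.10,
    raci=Raci(responsible="sales_ops_lead", accountable="head_of_sales"),
)

print(breach_alert(demo_completion, 0.45))           # breach: goes to the owner
print(breach_alert(demo_completion, 0.58))           # within tolerance
print(impact_verified(demo_completion, 0.45, 0.57))  # metric responded
```

A before-and-after comparison like this only shows that the metric moved toward expectation after the action. Attributing the movement to the action still takes the kind of judgement discussed earlier in this guide.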

Buyer note

A stack can score perfectly on data lineage and zero on causal lineage at the same time. Score them separately. The first keeps your pipelines honest. The second is what turns a reporting estate into a system for making decisions. For the wider category framing see decision intelligence.

Where this becomes operational

Drawing the distinction is the easy part. Making causal lineage a living layer that a company runs on is harder, and it is the part KPI Tree is built for.

KPI Tree sits above the warehouse rather than replacing it. Keep your existing data lineage exactly as it is. On top of it, KPI Tree holds the causal layer the stack does not: a metric tree of drivers where each link is a tested causal relationship, RACI ownership on every metric so each driver has an accountable person, a push to that owner the moment a metric moves against expectation, and a verified impact loop that checks the action changed the number it was meant to change.

A tree of causes, not pipelines

Decompose any headline metric into the drivers that move it, with each edge representing a causal link you can trace and test.

An owner on every driver

RACI on each metric means a driver is never an orphan. When it moves, there is always a named person who is accountable for it.

A push when it matters

The accountable owner is told when their metric breaches expectation. The system comes to them rather than waiting to be checked.

A loop that confirms impact

After an action is taken, the verified impact loop checks whether the target metric actually responded, so a decision is closed with evidence.

The result is a clean division of labour. Your warehouse and transformation tools own data lineage and do it well. KPI Tree owns causal lineage and turns it into something a business runs on. Ownership is what makes the second layer hold together over time, a point developed in why metric trees need ownership.

Where this is heading

The two layers are about to matter more, not less, because of how analytics is changing. As natural-language and agentic tools move into the stack, the quality of the answer depends entirely on the quality of the model behind it.

An assistant asked why revenue fell can read data lineage and tell you which tables built the figure. That is not an answer a leader can use. The same assistant, given a causal map with owners, can walk the drivers, find the input that moved, and tell you who is accountable for it. The causal layer is what turns a clever query tool into something that reasons about the business. This is the foundation underneath agentic analytics.

The takeaway

Data lineage answers how a number was made. Causal lineage answers why it moved and who owns the lever. The first is solved across the modern stack. The second is the layer worth building, because it is the one that connects a number to a decision.

Build the layer your warehouse does not ship

Keep your data lineage where it is. Put the causal layer on top with KPI Tree: a metric tree of drivers, an accountable owner on each one, and a verified loop from a number to a decision.
