Metric Lineage vs Causal Lineage
Modern warehouses and BI tools ship rich data lineage. They show how a table was built and which model feeds a chart. None of that tells you which business driver caused the outcome to change. This guide draws the line between the two and explains why it matters when you evaluate a data stack.
Two kinds of lineage
Definition
Data lineage traces where a number comes from: the source systems, transformations, and models that produced a value. Causal lineage traces what makes a number move: the chain of business drivers, sub-drivers, and inputs that cause a headline outcome to rise or fall. Data lineage is a map of your pipelines. Causal lineage is a map of your business.
Most modern data platforms have made real progress on lineage. You can click a dashboard tile and trace it back through a semantic model, a transformation, and a raw table, all the way to the source system that emitted the event. This is genuinely useful. When a number looks wrong, lineage tells you whether the pipeline is at fault.
But there is a second question that lineage of this kind never answers. When revenue falls, the issue is rarely a broken join. The issue is that something in the business changed. Conversion dropped. Churn rose. A pricing experiment underperformed. Knowing exactly which table fed the revenue figure does nothing to tell you which of those drivers is responsible.
This is the distinction that matters. There is lineage of data, and there is lineage of cause. They look similar on a diagram. They answer completely different questions.
A Chief Data Officer evaluating a modern stack will see the word lineage on every vendor slide and reasonably assume the problem is solved. It is not. The graph that connects one metric to another, by cause rather than by data flow, is a separate layer. Almost nothing in the standard stack ships it.
What data lineage actually traces
Data lineage is a record of provenance. It answers the question: how was this value produced? It follows a value upstream through every transformation until it reaches a source. There are three common flavours, and it helps to name them precisely, because vendors often blur them together.
1. Dataset lineage. Table-to-table provenance. This table was built from those tables. Useful for impact analysis when a source schema changes, and for tracing a data quality issue to its origin.
2. Column lineage. Field-level provenance. This column is derived from that column through a specific expression. Useful for change management and for auditing how a sensitive field propagates.
3. Semantic lineage. Metric-definition provenance. This metric in the semantic layer is computed from these measures and dimensions. Useful for keeping one agreed definition of a number across every tool that reads it.
All three are valuable and all three are about plumbing. They describe how a number is assembled. They are silent on what the number means for the business and on what would make it change. Semantic lineage gets closest, because it lives at the level of metrics, but it still only connects a metric to the data that computes it. It does not connect one metric to another metric that drives it.
The trap
Semantic lineage and causal lineage both produce a graph of metrics, so they are easy to mistake for the same thing. Semantic lineage links a metric to its inputs in the warehouse. Causal lineage links a metric to the other metrics that cause it to move. One is a definition. The other is a theory of the business.
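One way to see the difference is to look at what the edges point at. The sketch below is illustrative only: the metric names, table names, and column names are all hypothetical, and real lineage metadata is far richer than a dictionary. It simply checks whether a graph links metrics to other metrics or to warehouse objects.

```python
# All names below are hypothetical, chosen only to illustrate the two edge types.
METRICS = {"mrr", "new_customers", "expansion", "churn"}

# Semantic lineage: each metric points at the warehouse objects that compute it.
semantic_lineage = {
    "mrr": ["subscriptions.amount", "plans.interval"],
    "churn": ["subscriptions.cancelled_at"],
}

# Causal lineage: each metric points at the other metrics that drive it.
causal_lineage = {
    "mrr": ["new_customers", "expansion", "churn"],
}


def links_metrics_to_metrics(graph: dict[str, list[str]]) -> bool:
    """True when every edge ends at another metric rather than a table or column."""
    return all(target in METRICS for targets in graph.values() for target in targets)


print(links_metrics_to_metrics(semantic_lineage))  # False
print(links_metrics_to_metrics(causal_lineage))    # True
```

Both structures are graphs keyed by metric, which is why they are easy to confuse. Only the second has metrics at both ends of every edge.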
What causal lineage traces
Causal lineage is a record of cause. It answers a different question: why did this outcome move? It places a headline metric at the top and decomposes it into the drivers that cause it, then decomposes those drivers in turn, until you reach inputs a team can act on. This is a metric tree, and every link in it is a claim about cause, not a claim about data flow. For a fuller treatment, see What Is a Metric Tree?
Consider monthly recurring revenue. Data lineage tells you the figure was built from a subscriptions table joined to a plans table. Causal lineage tells you that revenue is driven by new customers, expansion, and churn, that new customers are driven by qualified leads and conversion rate, and that conversion rate is driven by demo completion and pricing fit. When revenue falls, only the second view lets you walk down the tree and find the input that moved.
The shape looks like a lineage diagram, but read the edges carefully. None of them describe a transformation. Monthly recurring revenue is not computed from qualified leads in any pipeline. Qualified leads are one of the things that cause it to move. That single difference is what separates a map of pipelines from a map of the business, and it is the reason this guide exists. The discipline of building this view is covered in metric decomposition.
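The revenue tree described above can be sketched as a small data structure. This is a minimal illustration, not a description of any product's implementation: the `Metric` class, the `trace_decline` helper, and every figure are invented for the example. The walk-down rule (follow the driver with the largest relative move against its parent) is a simple heuristic and not a formal attribution method.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    previous: float
    current: float
    # +1 if a rise in this metric pushes its parent up, -1 if it pushes it down (e.g. churn).
    direction: int = 1
    drivers: list["Metric"] = field(default_factory=list)

    def signed_change(self) -> float:
        """Relative change, signed so that negative always means 'moved against the parent'."""
        return self.direction * (self.current - self.previous) / self.previous


def trace_decline(metric: Metric) -> list[str]:
    """Walk down the tree, at each level following the driver that moved most against its parent."""
    path = [metric.name]
    while metric.drivers:
        worst = min(metric.drivers, key=Metric.signed_change)
        if worst.signed_change() >= 0:
            break  # no driver moved against the parent, so stop here
        path.append(worst.name)
        metric = worst
    return path


# Hypothetical figures, invented purely for illustration.
mrr = Metric("monthly recurring revenue", 100_000, 94_000, drivers=[
    Metric("new customers", 50, 41, drivers=[
        Metric("qualified leads", 400, 410),
        Metric("conversion rate", 0.125, 0.10, drivers=[
            Metric("demo completion", 0.60, 0.47),
            Metric("pricing fit", 0.70, 0.69),
        ]),
    ]),
    Metric("expansion", 8_000, 8_100),
    Metric("churn", 0.020, 0.021, direction=-1),
])

path = trace_decline(mrr)
print(" -> ".join(path))
```

With these made-up numbers the walk ends at demo completion. Nothing in the subscriptions or plans tables would have pointed there, because the links it follows are claims about cause, not joins.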
The two layers, side by side
Set the two next to each other and the gap becomes obvious. They are built from different inputs, they answer different questions, and the people who need them sit in different seats.
| Dimension | Data lineage | Causal lineage |
|---|---|---|
| Core question | How was this number produced? | What makes this number move? |
| Unit of the graph | Tables, columns, metric definitions | Metrics that drive other metrics |
| Nature of an edge | A transformation or derivation | A causal driver relationship |
| Built from | Pipeline metadata and SQL | Business judgement and tested correlation |
| Primary user | Data engineers and analysts | Operators, owners, executives |
| Answers "Why did revenue fall?" | No | Yes |
| Ships in the standard stack | Yes, widely | Rarely, if at all |
The last two rows are the ones a buyer should sit with. Data lineage is now table stakes. You should expect it from any serious warehouse or transformation tool. Causal lineage is the layer almost no one ships, and it is the layer that actually answers the question every leadership meeting opens with.
Why the warehouse cannot infer cause
It is fair to ask why the modern stack does not simply extend lineage into causal lineage. The honest answer is that a warehouse cannot infer cause from data flow, because the information is not there to infer.
Pipelines encode derivation, not causation
The warehouse knows table A built table B. It has no record of whether the thing measured in A causes the thing measured in B in the real world. That knowledge lives in the heads of the people who run the business.
Correlation is not enough on its own
You can compute which metrics move together, and that is a strong hint. But two metrics can correlate because a third drives both. Turning correlation into a defensible driver link needs human judgement layered on top of the statistics.
The graph is a model of the business
A causal map reflects a theory of how the company works. It is authored, debated, and revised. No amount of SQL parsing produces it, because it is not in the SQL.
Cause needs an owner to be useful
A driver link only earns its place when someone is accountable for the driver. Pipeline metadata has no concept of accountability, so it can never close the loop from cause to action.
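The point about correlation above can be made concrete with a small simulation. The sketch below assumes a hypothetical business in which seasonal demand drives both web traffic and support tickets, with no direct link between the two. All series are synthetic, and the adjustment shown (correlating what is left after a straight-line fit on the shared driver) is one simple check among many, not a method for establishing cause.

```python
import random
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def residuals(ys: list[float], xs: list[float]) -> list[float]:
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(7)  # fixed seed so the run is repeatable

# Synthetic data: the season drives both metrics; neither metric drives the other.
season = [rng.gauss(0, 1) for _ in range(2000)]
traffic = [s + rng.gauss(0, 0.5) for s in season]
tickets = [s + rng.gauss(0, 0.5) for s in season]

raw = pearson(traffic, tickets)
adjusted = pearson(residuals(traffic, season), residuals(tickets, season))

print(f"raw correlation:           {raw:.2f}")
print(f"after removing the season: {adjusted:.2f}")
```

The raw correlation is strong, and it largely disappears once the shared driver is accounted for. The statistics can flag a candidate link, but someone who knows the business still has to say whether the season, or something else, is sitting behind it.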
“People change what they do when they can see the system that produces the outcome, not when they are handed another chart of the outcome itself.”
This is the behavioural point underneath the technical one. A dashboard shows the outcome. A causal map shows the mechanism. When a team can see the mechanism, and can see which lever they hold, the number stops being a verdict and starts being something they can act on. That shift is the whole game, and it is the reason dashboards vs metric trees is a question worth taking seriously.
What a CDO should test for
If you are assembling or auditing a modern data stack, treat lineage as two requirements, not one. The first is well served by the market. The second is where most stacks quietly leave a hole. These questions separate a tool that traces data from a tool that traces cause.
1. Can it show me one metric driving another? Not which table feeds a metric, but which metric causes another metric to move. If the only graph on offer is dataset, column, or semantic lineage, the causal layer is absent.
2. Can I trace a top-level change to a specific input? When the headline number moves, can you walk down the drivers to the one input that caused it, in a few steps, without writing a bespoke analysis each time?
3. Does every driver have an accountable owner? A causal link with no owner is trivia. Look for RACI on each metric, so Responsible, Accountable, Consulted, and Informed are explicit and the change reaches a named person.
4. Does it act when a metric moves, or just display it? A dashboard waits to be opened. Ask whether the system pushes the change to the accountable owner when a metric breaches expectation, rather than relying on someone noticing.
5. Does it verify the action worked? After an owner acts on a driver, does the system check whether the intended metric actually responded? A verified impact loop closes the gap between deciding and knowing.
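The last three questions describe behaviour that is easy to state in code, which makes them easy to test for in a demo. The sketch below is a hedged illustration under invented assumptions: the `Driver` class, the tolerance rule, the owner titles, and the before-and-after comparison are all hypothetical, and a naive before-and-after check does not by itself prove that the action caused the change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Driver:
    name: str
    expected: float
    actual: float
    tolerance: float                   # allowed relative deviation from expected
    accountable: Optional[str] = None  # the "A" in RACI; None means nobody owns it

    def breached(self) -> bool:
        """True when the actual value is outside the tolerated band around expected."""
        return abs(self.actual - self.expected) > self.tolerance * abs(self.expected)


def alerts(drivers: list[Driver]) -> list[tuple[str, str]]:
    """Pair each breached driver with its accountable owner, flagging orphans explicitly."""
    out = []
    for d in drivers:
        if d.breached():
            owner = d.accountable or "UNOWNED"
            out.append((owner, f"{d.name}: expected {d.expected}, got {d.actual}"))
    return out


def impact_verified(before: float, after: float, minimum_lift: float) -> bool:
    """Did the target metric rise by at least the relative lift the action aimed for?"""
    return (after - before) / abs(before) >= minimum_lift


# Hypothetical drivers and owners, invented for illustration.
drivers = [
    Driver("demo completion", expected=0.60, actual=0.47, tolerance=0.10, accountable="head of sales"),
    Driver("pricing fit", expected=0.70, actual=0.69, tolerance=0.10, accountable="head of product"),
    Driver("qualified leads", expected=400, actual=330, tolerance=0.10),  # nobody accountable
]

pending = alerts(drivers)
for owner, message in pending:
    print(f"[{owner}] {message}")

# Later, after the owner has acted on demo completion:
print(impact_verified(before=0.47, after=0.58, minimum_lift=0.10))
```

The unowned breach is the case to watch for. A system that can detect the move but has no one to send it to fails question three, however good its charts are.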
Buyer note
A stack can score perfectly on data lineage and zero on causal lineage at the same time. Score them separately. The first keeps your pipelines honest. The second is what turns a reporting estate into a system for making decisions. For the wider category framing see decision intelligence.
Where this becomes operational
Drawing the distinction is the easy part. Making causal lineage a living layer that a company runs on is harder, and it is the part KPI Tree is built for.
KPI Tree sits above the warehouse rather than replacing it. Keep your existing data lineage exactly as it is. On top of it, KPI Tree holds the causal layer the stack does not: a metric tree of drivers where each link is a tested causal relationship, RACI ownership on every metric so each driver has an accountable person, a push to that owner the moment a metric moves against expectation, and a verified impact loop that checks the action changed the number it was meant to change.
A tree of causes, not pipelines
Decompose any headline metric into the drivers that move it, with each edge representing a causal link you can trace and test.
An owner on every driver
RACI on each metric means a driver is never an orphan. When it moves, there is always a named person who is accountable for it.
A push when it matters
The accountable owner is told when their metric breaches expectation. The system comes to them rather than waiting to be checked.
A loop that confirms impact
After an action is taken, the verified impact loop checks whether the target metric actually responded, so a decision is closed with evidence.
The result is a clean division of labour. Your warehouse and transformation tools own data lineage and do it well. KPI Tree owns causal lineage and turns it into something a business runs on. Ownership is what makes the second layer hold together over time, a point developed in why metric trees need ownership.
Where this is heading
The two layers are about to matter more, not less, because of how analytics is changing. As natural-language and agentic tools move into the stack, the quality of the answer depends entirely on the quality of the model behind it.
An assistant asked why revenue fell can read data lineage and tell you which tables built the figure. That is not an answer a leader can use. The same assistant, given a causal map with owners, can walk the drivers, find the input that moved, and tell you who is accountable for it. The causal layer is what turns a clever query tool into something that reasons about the business. This is the foundation underneath agentic analytics.
The takeaway
Data lineage answers how a number was made. Causal lineage answers why it moved and who owns the lever. The first is solved across the modern stack. The second is the layer worth building, because it is the one that connects a number to a decision.
Build the layer your warehouse does not ship
Keep your data lineage where it is. Put the causal layer on top with KPI Tree: a metric tree of drivers, an accountable owner on each one, and a verified loop from a number to a decision.