KPI Tree

Metric Definition

Capacity efficiency

Database Utilisation (%) = (Resource Used / Resource Provisioned) x 100
Resource UsedAverage resource consumed over the period (CPU, memory, storage, or connections)
Resource ProvisionedTotal capacity allocated for that resource

Track from

Metric GlossaryOperations Metrics

Database utilisation analysis

Database utilisation analysis measures how much of the compute, memory, storage, and connection capacity you have provisioned is actually being used over a given period. It tells you whether you are paying for headroom you never touch or running so close to the limit that the next traffic spike causes an outage. Tracked well, it is the bridge between infrastructure spend and reliability.

8 min read

Generate AI summary

What is database utilisation analysis?

Database utilisation analysis is the practice of measuring how much of your provisioned database capacity is actually consumed across compute, memory, storage, and connections. If your instance is provisioned with 16 vCPUs and averages 4 vCPUs of load, your CPU utilisation is 25 percent. The same logic applies to memory, disk, and the connection pool. The output is a set of percentages that describe how hard the database is working relative to what you are paying for.

The analysis matters because both extremes are expensive in different ways. A database that sits at 10 percent utilisation is burning budget on idle capacity. A database that sits at 95 percent is one query or one traffic spike away from queueing, timeouts, and a customer-facing incident. The useful range is narrow, and it moves as your workload grows, which is why this is an ongoing analysis rather than a one-off measurement.

Utilisation is also a leading indicator. Storage that climbs two percentage points a week tells you when you will run out of disk long before the alert fires. A connection pool creeping toward its ceiling explains the intermittent errors your application team cannot reproduce. Read this way, utilisation analysis is less about cost reporting and more about seeing the failure before it happens.

Average utilisation hides the danger. A database can average 40 percent CPU and still hit 100 percent every weekday at 09:00. Always analyse peak utilisation alongside the average, because capacity has to cover the peak, not the mean.

How to calculate database utilisation analysis

There is no single utilisation number. A database is a bundle of separate resources that can each saturate independently, so the analysis calculates each one and then reads them together. A server can sit at 20 percent CPU while its connection pool is fully exhausted, and the connection pool is what takes the application down.

  1. 1

    CPU utilisation

    Average and peak processor load divided by provisioned vCPUs. High CPU usually points to inefficient queries, missing indexes, or a workload that has outgrown the instance class.

  2. 2

    Memory utilisation

    Working memory and buffer cache in use divided by allocated memory. When the cache cannot hold the working set, the database reads from disk far more often and latency climbs sharply.

  3. 3

    Storage utilisation

    Disk consumed divided by disk provisioned. Track the growth rate, not just the level, so you can forecast the date you run out rather than reacting when you hit it.

  4. 4

    Connection utilisation

    Active and idle connections divided by the connection limit. Pools fill quietly during peak load and then reject new requests, which surfaces in the application as errors with no obvious database cause.

  5. 5

    IOPS and throughput utilisation

    Read and write operations against the provisioned IOPS ceiling. On cloud databases this is often the real bottleneck because IOPS is metered separately from CPU and storage.

Tie these together by reading them as a set over the same window. The headline question is which resource saturates first, because that resource determines both your reliability ceiling and your next provisioning decision. A database at 30 percent CPU, 85 percent memory, and 95 percent IOPS is an IOPS problem, and adding CPU would waste money while fixing nothing.

Database utilisation analysis in a metric tree

A metric tree decomposes database utilisation into the resources that make it up and then traces each resource back to the workload and the team that drives it. This turns a vague reliability worry into a specific, owned question.

The first level splits utilisation into CPU, memory, storage, connections, and IOPS. Each branch then decomposes into the things that actually move it. CPU breaks down into query efficiency, index coverage, and concurrent request volume. Storage breaks down into data growth, retention policy, and bloat from unvacuumed tables. Connections break down into pool sizing, leaked connections, and per-request hold time.

This structure lets you diagnose precisely rather than over-provisioning by reflex. If utilisation is rising, the tree tells you whether it is more traffic, a slow query that shipped last week, or a retention policy that quietly stopped running. Each diagnosis points at a different fix owned by a different person.

Metric tree insight

The cheapest utilisation win is almost always a query fix, not a bigger instance. A single missing index can drive CPU and IOPS toward saturation at once, so the same root cause shows up on two branches of the tree. Fix the query and both branches fall together.

Database utilisation analysis benchmarks

Healthy utilisation ranges depend on the resource and on how spiky the workload is. The targets below are sustained peak figures, because capacity has to absorb the busy period, not the quiet overnight average. A bursty consumer app needs more headroom than a steady internal reporting database.

ResourceHealthy peak rangeWhat sits outside it
CPU50 to 75 percentBelow 30 percent is over-provisioned and wasteful. Above 85 percent leaves no room for spikes and query latency starts to climb.
Memory60 to 80 percentBelow 40 percent means cache headroom you are paying for and not using. Above 90 percent risks cache eviction and a sudden jump in disk reads.
Storage50 to 70 percentAbove 80 percent gives little time to react to growth. Track the weekly growth rate so you can forecast the exhaustion date.
Connections40 to 70 percentAbove 85 percent at peak means the pool is close to rejecting requests, which surfaces as intermittent application errors.

Treat these as starting points, not absolutes. A database fronted by a connection pooler can run hotter on connections safely. A workload with predictable, gentle traffic can run CPU higher than a spiky one. The benchmark that matters most is your own trend: utilisation that climbs steadily week over week is the signal to act, whatever the absolute level.

How to improve database utilisation analysis

Improving utilisation means pushing each resource toward its healthy band, not simply driving every number down. The goal is enough headroom to survive the peak without paying for capacity that never gets used. The biggest gains usually come from the workload, not the hardware.

Fix the heavy queries

Profile the slowest and most frequent queries, add the missing indexes, and remove full table scans. Query fixes often cut CPU and IOPS at the same time, relieving two branches of the tree with one change.

Right-size the instance

Once the workload is efficient, match the instance class to real peak demand. Drop a database stuck at 15 percent CPU to a smaller tier, and add capacity to one that lives above 85 percent before it becomes an incident.

Manage storage growth

Enforce retention and archival policies so old data does not sit on hot disk forever. Reclaim space from bloated tables and indexes so the storage curve reflects live data, not accumulated waste.

Pool and cap connections

Put a connection pooler in front of the database, set sensible pool sizes, and find the code paths that leak connections. This keeps connection utilisation off the ceiling during peak load.

The metric tree approach starts by finding which resource saturates first, because that is the constraint that caps both reliability and growth. Improving anything else moves a number that was never the problem.

KPI Tree lets you model this by connecting each utilisation branch to the team and the action that influences it. Engineering owns query efficiency and index coverage. Platform owns instance sizing and IOPS provisioning. Data owns retention and growth. With RACI ownership on every node, the accountable owner is named, and when a branch crosses its threshold the alert goes to that person rather than the whole channel. The verified impact loop then checks whether the index that shipped actually pulled CPU back into its healthy band, so you know the fix worked rather than assuming it did.

Common mistakes when tracking database utilisation analysis

  1. 1

    Watching averages and missing peaks

    A database that averages 40 percent CPU can saturate every weekday morning. Capacity has to cover the peak, so analyse peak and average together or you will be blind to the moment that actually breaks.

  2. 2

    Tracking one resource and ignoring the rest

    Teams often watch CPU and forget connections and IOPS. A database can sit comfortably on compute while the connection pool is the thing rejecting requests and taking the application down.

  3. 3

    Reacting to storage level instead of growth rate

    Knowing disk is at 70 percent is far less useful than knowing it climbs two points a week. The growth rate tells you the date you run out, which is the number you can plan against.

  4. 4

    Over-provisioning to feel safe

    Doubling the instance to silence an alert hides the real cause and doubles the bill. Most utilisation spikes are a query or an index problem, and the bigger instance just postpones the same conversation at higher cost.

  5. 5

    Measuring utilisation without owners

    A utilisation dashboard nobody owns gets ignored until it turns red. Without an accountable owner on each resource, the analysis describes the problem but never closes it.

Related metrics

Deployment frequency

DORA metric

Operations Metrics
GitHub

Metric Definition

Deployment Frequency = Number of Production Deployments / Time Period

Deployment frequency measures how often an organisation successfully releases code to production. It is one of the four DORA (DevOps Research and Assessment) metrics that predict software delivery performance and organisational outcomes. Teams that deploy more frequently deliver value to users faster, reduce the risk of each individual release, and create tighter feedback loops between development and production.

View metric

Cycle time

Process speed

Operations Metrics
Jira

Metric Definition

Cycle Time = Process End Time − Process Start Time

Cycle time measures the total elapsed time from the start to the end of a process. It is a fundamental operations metric used in manufacturing, software development, service delivery, and any context where the speed of a process directly affects throughput, cost, and customer satisfaction.

View metric

Free cash flow

FCF

Financial Metrics

Metric Definition

FCF = Operating Cash Flow - Capital Expenditures

Free cash flow (FCF) measures the cash a business generates from operations after accounting for capital expenditures. It represents the actual cash available to pay dividends, repay debt, fund acquisitions, or invest in growth.

View metric

Escalation rate

Customer Support Metrics
Pylon

Metric Definition

Escalation Rate = (Escalated Tickets / Total Tickets Handled) x 100

Escalation rate measures the percentage of support tickets that are transferred from one tier or team to a higher tier or specialist group for resolution. It reflects the gap between the issues customers raise and the ability of frontline agents to resolve them, making it a key indicator of agent readiness, process maturity, and product complexity.

View metric

Why did my metric change?

Metric Definition

When database utilisation drifts away from healthy capacity, this diagnostic framework helps you trace which underlying driver moved it.

View metric

Metric trees for operations teams

Metric Definition

Database utilisation is an operations efficiency measure, so this guide shows how to place it within an operations metric tree alongside related capacity and reliability indicators.

View metric

Turn database utilisation into an owned, diagnosable tree

Build a utilisation metric tree that splits CPU, memory, storage, connections, and IOPS into their drivers, with an accountable owner on every branch and an alert that reaches the right engineer when a resource nears its limit.

Experience That Matters

Built by a team that's been in your shoes

Our team brings deep experience from leading Data, Growth and People teams at some of the fastest growing scaleups in Europe through to IPO and beyond. We've faced the same challenges you're facing now.

Checkout.com
Planet
UK Government
Travelex
BT
Sainsbury's
Goldman Sachs
Dojo
Redpin
Farfetch
Just Eat for Business