Metric Definition
Capacity efficiency
Track from
Database utilisation analysis
Database utilisation analysis measures how much of the compute, memory, storage, and connection capacity you have provisioned is actually being used over a given period. It tells you whether you are paying for headroom you never touch or running so close to the limit that the next traffic spike causes an outage. Tracked well, it is the bridge between infrastructure spend and reliability.
8 min read
What is database utilisation analysis?
Database utilisation analysis is the practice of measuring how much of your provisioned database capacity is actually consumed across compute, memory, storage, and connections. If your instance is provisioned with 16 vCPUs and averages 4 vCPUs of load, your CPU utilisation is 25 percent. The same logic applies to memory, disk, and the connection pool. The output is a set of percentages that describe how hard the database is working relative to what you are paying for.
The analysis matters because both extremes are expensive in different ways. A database that sits at 10 percent utilisation is burning budget on idle capacity. A database that sits at 95 percent is one query or one traffic spike away from queueing, timeouts, and a customer-facing incident. The useful range is narrow, and it moves as your workload grows, which is why this is an ongoing analysis rather than a one-off measurement.
Utilisation is also a leading indicator. Storage that climbs two percentage points a week tells you when you will run out of disk long before the alert fires. A connection pool creeping toward its ceiling explains the intermittent errors your application team cannot reproduce. Read this way, utilisation analysis is less about cost reporting and more about seeing the failure before it happens.
Average utilisation hides the danger. A database can average 40 percent CPU and still hit 100 percent every weekday at 09:00. Always analyse peak utilisation alongside the average, because capacity has to cover the peak, not the mean.
How to calculate database utilisation analysis
There is no single utilisation number. A database is a bundle of separate resources that can each saturate independently, so the analysis calculates each one and then reads them together. A server can sit at 20 percent CPU while its connection pool is fully exhausted, and the connection pool is what takes the application down.
- 1
CPU utilisation
Average and peak processor load divided by provisioned vCPUs. High CPU usually points to inefficient queries, missing indexes, or a workload that has outgrown the instance class.
- 2
Memory utilisation
Working memory and buffer cache in use divided by allocated memory. When the cache cannot hold the working set, the database reads from disk far more often and latency climbs sharply.
- 3
Storage utilisation
Disk consumed divided by disk provisioned. Track the growth rate, not just the level, so you can forecast the date you run out rather than reacting when you hit it.
- 4
Connection utilisation
Active and idle connections divided by the connection limit. Pools fill quietly during peak load and then reject new requests, which surfaces in the application as errors with no obvious database cause.
- 5
IOPS and throughput utilisation
Read and write operations against the provisioned IOPS ceiling. On cloud databases this is often the real bottleneck because IOPS is metered separately from CPU and storage.
Tie these together by reading them as a set over the same window. The headline question is which resource saturates first, because that resource determines both your reliability ceiling and your next provisioning decision. A database at 30 percent CPU, 85 percent memory, and 95 percent IOPS is an IOPS problem, and adding CPU would waste money while fixing nothing.
Database utilisation analysis in a metric tree
A metric tree decomposes database utilisation into the resources that make it up and then traces each resource back to the workload and the team that drives it. This turns a vague reliability worry into a specific, owned question.
The first level splits utilisation into CPU, memory, storage, connections, and IOPS. Each branch then decomposes into the things that actually move it. CPU breaks down into query efficiency, index coverage, and concurrent request volume. Storage breaks down into data growth, retention policy, and bloat from unvacuumed tables. Connections break down into pool sizing, leaked connections, and per-request hold time.
This structure lets you diagnose precisely rather than over-provisioning by reflex. If utilisation is rising, the tree tells you whether it is more traffic, a slow query that shipped last week, or a retention policy that quietly stopped running. Each diagnosis points at a different fix owned by a different person.
Metric tree insight
The cheapest utilisation win is almost always a query fix, not a bigger instance. A single missing index can drive CPU and IOPS toward saturation at once, so the same root cause shows up on two branches of the tree. Fix the query and both branches fall together.
Database utilisation analysis benchmarks
Healthy utilisation ranges depend on the resource and on how spiky the workload is. The targets below are sustained peak figures, because capacity has to absorb the busy period, not the quiet overnight average. A bursty consumer app needs more headroom than a steady internal reporting database.
| Resource | Healthy peak range | What sits outside it |
|---|---|---|
| CPU | 50 to 75 percent | Below 30 percent is over-provisioned and wasteful. Above 85 percent leaves no room for spikes and query latency starts to climb. |
| Memory | 60 to 80 percent | Below 40 percent means cache headroom you are paying for and not using. Above 90 percent risks cache eviction and a sudden jump in disk reads. |
| Storage | 50 to 70 percent | Above 80 percent gives little time to react to growth. Track the weekly growth rate so you can forecast the exhaustion date. |
| Connections | 40 to 70 percent | Above 85 percent at peak means the pool is close to rejecting requests, which surfaces as intermittent application errors. |
Treat these as starting points, not absolutes. A database fronted by a connection pooler can run hotter on connections safely. A workload with predictable, gentle traffic can run CPU higher than a spiky one. The benchmark that matters most is your own trend: utilisation that climbs steadily week over week is the signal to act, whatever the absolute level.
How to improve database utilisation analysis
Improving utilisation means pushing each resource toward its healthy band, not simply driving every number down. The goal is enough headroom to survive the peak without paying for capacity that never gets used. The biggest gains usually come from the workload, not the hardware.
Fix the heavy queries
Profile the slowest and most frequent queries, add the missing indexes, and remove full table scans. Query fixes often cut CPU and IOPS at the same time, relieving two branches of the tree with one change.
Right-size the instance
Once the workload is efficient, match the instance class to real peak demand. Drop a database stuck at 15 percent CPU to a smaller tier, and add capacity to one that lives above 85 percent before it becomes an incident.
Manage storage growth
Enforce retention and archival policies so old data does not sit on hot disk forever. Reclaim space from bloated tables and indexes so the storage curve reflects live data, not accumulated waste.
Pool and cap connections
Put a connection pooler in front of the database, set sensible pool sizes, and find the code paths that leak connections. This keeps connection utilisation off the ceiling during peak load.
The metric tree approach starts by finding which resource saturates first, because that is the constraint that caps both reliability and growth. Improving anything else moves a number that was never the problem.
KPI Tree lets you model this by connecting each utilisation branch to the team and the action that influences it. Engineering owns query efficiency and index coverage. Platform owns instance sizing and IOPS provisioning. Data owns retention and growth. With RACI ownership on every node, the accountable owner is named, and when a branch crosses its threshold the alert goes to that person rather than the whole channel. The verified impact loop then checks whether the index that shipped actually pulled CPU back into its healthy band, so you know the fix worked rather than assuming it did.
Common mistakes when tracking database utilisation analysis
- 1
Watching averages and missing peaks
A database that averages 40 percent CPU can saturate every weekday morning. Capacity has to cover the peak, so analyse peak and average together or you will be blind to the moment that actually breaks.
- 2
Tracking one resource and ignoring the rest
Teams often watch CPU and forget connections and IOPS. A database can sit comfortably on compute while the connection pool is the thing rejecting requests and taking the application down.
- 3
Reacting to storage level instead of growth rate
Knowing disk is at 70 percent is far less useful than knowing it climbs two points a week. The growth rate tells you the date you run out, which is the number you can plan against.
- 4
Over-provisioning to feel safe
Doubling the instance to silence an alert hides the real cause and doubles the bill. Most utilisation spikes are a query or an index problem, and the bigger instance just postpones the same conversation at higher cost.
- 5
Measuring utilisation without owners
A utilisation dashboard nobody owns gets ignored until it turns red. Without an accountable owner on each resource, the analysis describes the problem but never closes it.
Related metrics
Deployment frequency
DORA metric
Operations MetricsMetric Definition
Deployment Frequency = Number of Production Deployments / Time Period
Deployment frequency measures how often an organisation successfully releases code to production. It is one of the four DORA (DevOps Research and Assessment) metrics that predict software delivery performance and organisational outcomes. Teams that deploy more frequently deliver value to users faster, reduce the risk of each individual release, and create tighter feedback loops between development and production.
Cycle time
Process speed
Operations MetricsMetric Definition
Cycle Time = Process End Time − Process Start Time
Cycle time measures the total elapsed time from the start to the end of a process. It is a fundamental operations metric used in manufacturing, software development, service delivery, and any context where the speed of a process directly affects throughput, cost, and customer satisfaction.
Free cash flow
FCF
Financial MetricsMetric Definition
FCF = Operating Cash Flow - Capital Expenditures
Free cash flow (FCF) measures the cash a business generates from operations after accounting for capital expenditures. It represents the actual cash available to pay dividends, repay debt, fund acquisitions, or invest in growth.
Escalation rate
Customer Support MetricsMetric Definition
Escalation Rate = (Escalated Tickets / Total Tickets Handled) x 100
Escalation rate measures the percentage of support tickets that are transferred from one tier or team to a higher tier or specialist group for resolution. It reflects the gap between the issues customers raise and the ability of frontline agents to resolve them, making it a key indicator of agent readiness, process maturity, and product complexity.
Why did my metric change?
Metric Definition
When database utilisation drifts away from healthy capacity, this diagnostic framework helps you trace which underlying driver moved it.
Metric trees for operations teams
Metric Definition
Database utilisation is an operations efficiency measure, so this guide shows how to place it within an operations metric tree alongside related capacity and reliability indicators.
Turn database utilisation into an owned, diagnosable tree
Build a utilisation metric tree that splits CPU, memory, storage, connections, and IOPS into their drivers, with an accountable owner on every branch and an alert that reaches the right engineer when a resource nears its limit.