How PostgreSQL Calculates Node Costs: A Diagnostic Guide to EXPLAIN Cost Metrics #
PostgreSQL’s query planner relies on a deterministic, unitless cost model to evaluate and rank execution paths. When you run EXPLAIN, the startup_cost and total_cost values are not arbitrary. They are calculated using fixed arithmetic formulas that weigh sequential and random I/O against CPU processing overhead. Understanding this calculation is essential for diagnosing why the optimizer selects a suboptimal join strategy or scan type. For engineers and DBAs, mastering the underlying math transforms EXPLAIN output from a black box into a precise diagnostic instrument. This guide breaks down the exact cost equations, constant interactions, and step-by-step resolution paths for cost-driven performance bottlenecks.
The Core Cost Equation: I/O vs. CPU Components #
Every node’s cost derives from a linear combination of disk access and row processing. The foundational formula is cost = (pages_accessed × page_cost) + (rows_processed × cpu_cost). The planner aggregates these values bottom-up, starting from leaf scans and propagating through joins. The result is a dimensionless metric representing estimated resource consumption. When diagnosing plan regressions, isolate whether the miscalculation stems from inaccurate page estimates or misaligned cost constants. A foundational grasp of Execution Plan Fundamentals is required before adjusting planner behavior.
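To see the arithmetic concretely, you can reproduce the planner's sequential-scan estimate from catalog statistics. The sketch below assumes a table named orders and uses the simplified form relpages × seq_page_cost + reltuples × cpu_tuple_cost, which matches a plain sequential scan with no filter.

```sql
-- Minimal sketch: recompute a sequential scan's total cost from pg_class.
-- The table name 'orders' is an assumption for illustration.
SELECT relpages,
       reltuples,
       relpages * current_setting('seq_page_cost')::float8
         + reltuples * current_setting('cpu_tuple_cost')::float8 AS est_seq_scan_cost
FROM pg_class
WHERE relname = 'orders';
```

Compare est_seq_scan_cost with the total_cost shown on the Seq Scan node in EXPLAIN; a WHERE clause adds cpu_operator_cost per row evaluated on top of this baseline.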
Page-Level Costs: seq_page_cost and random_page_cost #
Sequential and random page costs dictate the I/O penalty. By default, seq_page_cost is 1.0 and random_page_cost is 4.0. This 4:1 ratio assumes traditional spinning disks where random seeks are significantly slower than sequential reads. On NVMe or SSD storage, this ratio artificially inflates index scan costs, so the planner may favor sequential scans even when an index would be faster. Measure actual I/O latency and adjust random_page_cost downward, typically to 1.1–1.5. The Cost Estimation Models documentation details how these constants interact with buffer pool behavior and cache hit probabilities.
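If measurements confirm that random and sequential reads cost roughly the same on your storage, the adjustment can be scoped to the hardware that justifies it. A minimal sketch, assuming a tablespace named fast_ssd and an illustrative value:

```sql
-- Sketch: lower the random I/O penalty only for storage that warrants it.
-- 'fast_ssd' is a hypothetical tablespace name; 1.1 is an illustrative value.
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1, seq_page_cost = 1.0);
```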
Tuple-Level Costs: cpu_tuple_cost and cpu_index_tuple_cost #
CPU costs account for row filtering, projection, and index traversal. cpu_tuple_cost (default 0.01) applies to heap tuple processing. cpu_index_tuple_cost (default 0.005) covers index entry evaluation. These scale linearly with estimated row counts. If EXPLAIN shows unexpectedly high total costs on filtered queries, verify that WHERE clause selectivity aligns with pg_statistic. High CPU costs often indicate missing partial indexes or inefficient filter pushdown. They rarely indicate hardware bottlenecks.
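One quick way to check that alignment is to read the collected statistics directly. A sketch, assuming an orders table with a status column:

```sql
-- Sketch: inspect the statistics behind the planner's selectivity estimate.
-- 'orders' and 'status' are assumed names for illustration.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'status';
```

If the frequencies here do not reflect the predicate you filter on, the row estimate, and therefore the CPU cost component, will be off.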
Startup vs. Total Cost: What the Planner Actually Optimizes #
startup_cost represents the work required before the first row is emitted. Examples include hash table builds or sort initialization. total_cost is the cumulative work to return all estimated rows. The planner minimizes total_cost by default. It switches to startup_cost optimization only when LIMIT or cursor-based fetching is detected. Misinterpreting startup_cost as execution latency is a common diagnostic error. High startup costs on hash joins often indicate insufficient work_mem. This forces disk spills that the planner cannot fully anticipate.
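The distinction is easiest to see with a LIMIT query. In the sketch below (table and column names assumed), a low-startup path such as an index scan in created_at order can beat a sort whose total cost would otherwise be competitive:

```sql
-- Sketch: with LIMIT, the planner weights startup cost heavily, because
-- only a fraction of the plan's total cost will actually be paid.
EXPLAIN
SELECT * FROM orders           -- table name assumed
ORDER BY created_at DESC
LIMIT 10;
```

Compare the cost=startup..total pair on the top node with the same query run without the LIMIT clause.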
Tree Aggregation and Join Cost Propagation #
Costs propagate upward through the execution tree. For a nested loop, the join cost is approximately the outer path's total cost plus the inner path's cost multiplied by the number of outer rows, plus per-row CPU overhead. Hash joins sum the build phase (scanning and hashing the inner relation) and the probe phase (scanning the outer relation, with roughly cpu_operator_cost per outer row probed). Merge joins add sort costs for both inputs, when they are not already sorted on the join keys, before merging. When diagnosing join misestimations, trace the cost accumulation step by step. Verify row estimates at each leaf. Manually apply the join formula to identify which node’s cost inflation triggered the suboptimal path selection.
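To trace the aggregation yourself, run EXPLAIN on a two-table join and compare the child nodes' totals with the join node's cost pair. The sketch below assumes orders and customers tables joined on customer_id.

```sql
-- Sketch: read each child node's cost, then verify the join node's total.
-- For a nested loop this is roughly:
--   outer_total_cost + (outer_rows * inner_total_cost) + per-row CPU
EXPLAIN (COSTS ON)
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'pending';
```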
Step-by-Step Resolution for Cost Misestimation #
Follow this precise diagnostic workflow to isolate and resolve cost-driven misestimations.
- Baseline Current Constants: Query active planner settings to establish your cost model.
```sql
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE '%cost%' OR name = 'effective_cache_size'
ORDER BY name;
```
Retrieves all active planner cost constants and cache size parameters. Use this to baseline the current cost model before tuning.
- Extract Node-Level Metrics: Run structured EXPLAIN to isolate startup and total costs per node.
```sql
EXPLAIN (COSTS ON, FORMAT JSON)
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > NOW() - INTERVAL '30 days';
```
Returns structured cost metrics per node. Parse startup_cost and total_cost to isolate which node contributes most to the total plan cost.
- Validate Cost Alignment: Temporarily adjust I/O penalties to force index evaluation.
```sql
BEGIN;
SET LOCAL random_page_cost = 1.2;  -- SET LOCAL only takes effect inside a transaction block
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
ROLLBACK;
```
Temporarily lowers random I/O penalty to force index scan evaluation. If the plan switches and performance improves, the default cost constant is misaligned with your storage tier.
- Calculate Expected Cost: Run EXPLAIN (ANALYZE, BUFFERS) to compare estimated vs. actual rows and buffer hits. Manually compute (actual_pages × page_cost) + (actual_rows × cpu_cost); see the sketch after this list.
- Apply Corrections: If calculated costs diverge from EXPLAIN, update table statistics with ANALYZE. Adjust cost constants, and re-check correlation statistics, if one scan type is consistently over-penalized. Persist validated constant changes via postgresql.conf or role-level settings.
- Verify Execution: Re-run the query and verify the planner selects the intended node type. Confirm actual runtime aligns with the revised cost projections.
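The sketches below illustrate steps 4 and 5. The table name, role name, and constant value are assumptions; substitute your own and validate against representative workloads before persisting anything.

```sql
-- Step 4 sketch: compare estimated rows and costs against actual rows
-- and buffer hits for the query profiled earlier.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > NOW() - INTERVAL '30 days';

-- Step 5 sketch: refresh statistics, then persist a validated constant at
-- role scope rather than editing postgresql.conf globally.
ANALYZE orders;
ALTER ROLE reporting_user SET random_page_cost = 1.2;  -- hypothetical role and value
```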
Common Diagnostic Pitfalls #
- Treating cost values as milliseconds or CPU cycles. They are strictly unitless relative metrics.
- Ignoring pg_statistic correlation values, which drastically alter index scan cost projections for sorted data (see the query after this list).
- Permanently altering cost constants without workload profiling, causing regressions on mixed OLTP/OLAP systems.
- Assuming high total_cost equals slow execution. Actual runtime depends on cache state, parallelism, and I/O concurrency.
- Overlooking cpu_operator_cost when queries contain complex expressions, window functions, or custom operators.
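To catch the correlation pitfall, the pg_stats view exposes the value the planner uses. A sketch, assuming an orders table:

```sql
-- Sketch: correlation near +1 or -1 means the column's values follow the
-- physical row order, which makes range index scans substantially cheaper.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders';   -- table name assumed
```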
Frequently Asked Questions #
Are PostgreSQL EXPLAIN costs measured in milliseconds?
No. Costs are unitless, relative estimates representing a weighted combination of I/O operations and CPU cycles. They are designed for internal planner comparison, not absolute runtime prediction.
How do I safely adjust cost constants without breaking other queries?
Apply changes at the session level using SET LOCAL for testing. Validate across representative query patterns. Only persist changes to postgresql.conf if they improve overall workload performance. Consider using per-role or per-database settings to isolate tuning.
Why does EXPLAIN show a high cost but the query executes quickly?
High estimated costs often result from outdated statistics, misconfigured cost constants, or unaccounted cache hits. The planner assumes worst-case I/O, while actual execution benefits from shared_buffers, OS page cache, or parallel execution not fully modeled in the cost formula.
How does effective_cache_size influence node cost calculation?
It does not appear directly in the cost formula. It influences the planner’s probability model for cache hits. A higher effective_cache_size reduces the expected penalty for random_page_cost, making index scans comparatively cheaper in the optimizer’s decision matrix.
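Because effective_cache_size is a user-settable parameter, you can test its influence per session. A sketch with an illustrative value and an assumed table name:

```sql
-- Sketch: raise the assumed cache size and see whether index scans win.
SET effective_cache_size = '16GB';   -- illustrative value
EXPLAIN
SELECT * FROM orders
WHERE created_at > NOW() - INTERVAL '30 days';
```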