Reading Cost vs Actual Time in EXPLAIN ANALYZE: A Diagnostic Guide #
When executing EXPLAIN ANALYZE, engineers frequently encounter a stark contrast between the planner’s estimated cost and the measured actual time. Mastering the interpretation of these two metrics is foundational to Reading & Interpreting Query Plans. Cost represents a unitless, optimizer-calculated projection of CPU and I/O effort. Actual time reflects real-world wall-clock execution in milliseconds. Discrepancies between them signal statistical inaccuracies, hardware bottlenecks, or execution model mismatches that require targeted intervention.
Understanding Planner Cost vs. Actual Runtime Metrics #
The optimizer assigns a cost value to each node based on historical table statistics, index selectivity, and configured parameters like seq_page_cost. This cost is arbitrary and relative. It serves strictly as a comparative heuristic rather than a direct time measurement. Actual time is measured in milliseconds and split into startup time and total time. Startup time measures latency until the first row returns. Total time measures the complete execution window. When reading a plan, always multiply the actual time by the loops count to calculate true wall-clock impact. A low-cost node with high loops and elevated actual time typically indicates a nested loop executing an inefficient inner scan.
Decoding Node-Level Discrepancies #
Cost accumulates bottom-up in the execution tree. Actual time is measured independently at each node during runtime. A common diagnostic pattern involves a Hash Join showing a low estimated cost but disproportionately high actual time. This usually stems from underestimated row counts. The miscalculation forces the hash table to spill to disk or exceed work_mem. Parallel execution further complicates the relationship between these metrics. The planner divides estimated cost across workers. Actual time reflects synchronization overhead, worker startup latency, and uneven data distribution. Always compare estimated rows versus actual rows before evaluating time metrics.
Root-Cause Analysis: Why Estimates Diverge #
Divergence between cost and actual time usually traces back to three primary root causes. Stale statistics, parameter sniffing, and predicate complexity drive most misalignments. When the planner relies on outdated pg_statistic data, it miscalculates selectivity. This generates suboptimal join orders and incorrect cost projections. Complex expressions, implicit type casting, or functions in WHERE clauses prevent accurate cardinality estimation. For systematic troubleshooting, cross-reference node-level row estimates with actual execution metrics when Identifying Plan Bottlenecks. High variance, specifically when actual rows exceed estimated rows by a factor of ten, is the primary trigger for cost-to-time misalignment.
Step-by-Step Resolution Workflow #
Follow this diagnostic sequence to isolate and resolve execution bottlenecks:
- Run
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)to capture precise, machine-readable node metrics. - Locate execution nodes where actual row counts deviate significantly from planner estimates.
- Verify table and index statistics freshness using
pg_stat_user_tablesand executeANALYZEif necessary. - Check
work_memallocation for Sort and Hash nodes spilling to disk, indicated by high actual time and low throughput. - Evaluate parallel query settings; if workers are idle or unevenly loaded, adjust
max_parallel_workers_per_gather. - Rewrite predicates to enable index scans or filter pushdown, directly reducing startup latency.
- Re-run
EXPLAIN ANALYZEto validate cost-to-time convergence and confirm performance gains.
Diagnostic SQL Execution Patterns #
Use the following baseline query to capture precise node-level diagnostics. The JSON format enables programmatic parsing of cost and timing deltas.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT o.order_id, c.customer_name, SUM(i.quantity * i.unit_price)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items i ON o.order_id = i.order_id
WHERE o.created_at >= '2023-01-01'
GROUP BY o.order_id, c.customer_name;
Targeted statistics refresh corrects planner cost projections. Run ANALYZE VERBOSE on heavily modified tables, then verify distribution alignment.
ANALYZE VERBOSE orders;
ANALYZE VERBOSE order_items;
-- Verify statistics alignment
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'created_at';
Common Diagnostic Pitfalls #
- Treating planner cost as milliseconds or direct CPU seconds.
- Ignoring the loops multiplier when evaluating actual time impact.
- Assuming low startup time guarantees optimal total execution time.
- Overlooking parallel worker synchronization overhead in distributed scans.
- Tuning
work_memwithout verifying disk spill indicators inEXPLAINoutput.
Frequently Asked Questions #
Why does EXPLAIN ANALYZE show a low cost but high actual time? Low cost with high actual time typically indicates inaccurate cardinality estimates. The planner chooses an inefficient join strategy or index scan. Stale statistics, missing indexes, or predicates that prevent filter pushdown force the executor to process more rows than anticipated. This inflates wall-clock time despite a low theoretical cost.
Should I optimize based on cost or actual time? Always optimize based on actual time and actual rows. Cost is a planning heuristic used to select an execution path before runtime. Actual time reflects real hardware performance, I/O latency, and memory constraints. Use cost only to understand why the planner made a specific choice, then validate improvements using actual execution metrics.
How do parallel workers affect cost vs actual time readings? The planner divides estimated cost across parallel workers. Actual time includes worker initialization, inter-process communication, and data redistribution overhead. If actual time remains high despite parallel execution, the query may be I/O bound. It may also suffer from lock contention or uneven data distribution that prevents effective parallelism.