Sequential vs Index Scans: Execution Plan Diagnostics & Query Tuning #

When optimizing database performance, understanding the trade-offs between sequential and index access paths is foundational. The query optimizer evaluates table size, data distribution, and I/O costs before selecting an execution strategy. Mastering Execution Plan Fundamentals reveals why a full table scan might actually be optimal in certain scenarios, while an index scan can become a bottleneck when misapplied. This guide breaks down diagnostic workflows, plan node interpretation, and tactical tuning strategies for modern relational databases.

Optimizer Decision Logic & Cost Models #

The database planner calculates estimated I/O and CPU costs to determine scan selection. Sequential scans read pages contiguously, leveraging OS read-ahead and minimizing random I/O overhead. Index scans perform random lookups, which excel at high-selectivity predicates but suffer from cache misses and page fragmentation.

The optimizer weighs sequential versus random page cost parameters alongside row estimates. When queries involve large result sets or lack restrictive WHERE clauses, sequential access often wins. For deeper context on how join algorithms interact with these access paths, review Hash Join Mechanics to understand how scan outputs feed into memory-driven operations.

Cost models rely heavily on random_page_cost and seq_page_cost configuration values. If your storage uses NVMe or SSDs, lowering random_page_cost closer to seq_page_cost encourages the planner to favor index scans more aggressively. Always validate these settings against your actual hardware I/O profile.

Interpreting Plan Nodes & Buffer Metrics #

Execution plans expose scan behavior through distinct node types: Seq Scan, Index Scan, and Index Only Scan. Key diagnostics include actual versus estimated rows, loop counts, and buffer hit/read ratios. A high shared read count during an index scan indicates cold cache or index bloat.

Conversely, a sequential scan with low shared reads suggests efficient prefetching. Always compare startup and total cost estimates against actual execution time. When index scans return excessive rows, the planner may downgrade to sequential access. This behavior directly impacts downstream operations like Merge Join vs Nested Loop, where input row ordering dictates algorithm selection.

Monitor the Buffers: shared hit=... read=... metrics closely. A high read ratio during an index scan signals that the index structure or heap pages are not resident in memory. This often correlates with poor cache hit ratios and elevated disk latency.

Tactical Diagnostics & Index Validation #

Use EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) to capture runtime metrics. Validate index health by checking system catalogs for low scan ratios or high fetch-to-read discrepancies. Run bloat analysis tools to measure leaf page fragmentation.

Implement the following diagnostic workflow when investigating suboptimal scan selection:

If an index exists but isn’t chosen, verify data type alignment, collation mismatches, or implicit casts in predicates. Parameterize queries to prevent plan cache poisoning. For cases where an index exists but the planner still avoids it, consult Why Index Scans Sometimes Underperform to diagnose selectivity thresholds and visibility map limitations.

ORM, Caching & Architecture Tuning #

Application-layer patterns heavily influence scan selection. ORMs frequently generate N+1 queries or missing WHERE clauses that force sequential scans. Mitigate this by enforcing eager loading, adding composite indexes matching sort and filter patterns, and using partial indexes for active record subsets.

Implement query-level caching for high-read, low-write endpoints to bypass storage I/O entirely. Align ORM query builders with database-specific hints only when necessary. Prioritize accurate statistics via regular ANALYZE and VACUUM operations to keep the cost model aligned with reality.

SQL Examples & Execution Plan Breakdown #

Baseline Scan Comparison with Buffer Tracking #

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT id, status FROM orders WHERE created_at > '2024-01-01';

Plan Output Analysis:

Seq Scan on orders (cost=0.00..12450.50 rows=85000 width=12) (actual time=0.012..42.150 rows=84120 loops=1)
 Filter: (created_at > '2024-01-01'::timestamp)
 Rows Removed by Filter: 15880
 Buffers: shared hit=210 read=1890
Planning Time: 0.120 ms
Execution Time: 48.310 ms

The planner selects a Seq Scan because the predicate returns ~84% of the table. The Buffers line shows heavy sequential disk reads (read=1890), confirming contiguous I/O. Forcing an index here would increase random I/O and degrade performance.

Forcing an Index Scan for Diagnostic Isolation #

SET enable_seqscan = OFF;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT id, status FROM orders WHERE created_at > '2024-01-01';
SET enable_seqscan = ON;

Before/After Plan Comparison:

The forced index scan doubles execution time. The Buffers metric reveals massive random page fetches (read=2150), confirming that the optimizer’s cost model correctly identified sequential I/O as superior for this high-selectivity range.

Verifying Index Only Scan Eligibility #

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT id, status FROM orders WHERE status = 'shipped';

Plan Output Analysis:

Index Only Scan using idx_orders_status on orders (cost=0.42..125.30 rows=1200 width=8) (actual time=0.045..1.120 rows=1180 loops=1)
 Index Cond: (status = 'shipped'::text)
 Heap Fetches: 0
 Buffers: shared hit=12
Planning Time: 0.080 ms
Execution Time: 1.250 ms

The query bypasses heap lookups entirely (Heap Fetches: 0). This occurs because the index covers all selected columns and the visibility map is current. If Heap Fetches were high, it would indicate outdated MVCC metadata or missing covering columns.

Common Pitfalls #

Frequently Asked Questions #

When should I intentionally allow a sequential scan instead of forcing an index scan? Sequential scans are optimal when retrieving more than 5-15% of a table’s rows, when the table is small enough to fit in memory, or when the query requires full-table aggregation. Forcing an index scan in these scenarios increases random I/O, CPU overhead for pointer chasing, and overall latency.

How do I diagnose why an existing index is being ignored by the optimizer? Check for implicit data type conversions in predicates, verify that table statistics are current via ANALYZE, and inspect index bloat. Use EXPLAIN with BUFFERS to confirm if the planner estimates the index path as more expensive due to outdated cost parameters or skewed data distribution.

What is the difference between an Index Scan and an Index Only Scan? An Index Scan reads the index to locate row pointers (TIDs) and then fetches the actual data from the heap table. An Index Only Scan retrieves all requested columns directly from the index structure, skipping heap access entirely. The latter requires a covering index and an up-to-date visibility map.

How does table size impact the optimizer’s choice between scan types? Small tables almost always trigger sequential scans because the overhead of index traversal outweighs the benefit. As tables grow, the optimizer shifts toward index scans for selective queries. However, if an index scan must visit many heap pages, the planner may revert to sequential scanning to leverage sequential I/O throughput.