Missing Indexes on Frequently Queried Columns
Description
Failing to create indexes on columns used in WHERE clauses, JOIN conditions, or ORDER BY statements, causing full table scans that degrade exponentially as data grows. This weakness typically emerges when developers write queries without verifying index existence, often thinking "we'll add it later if needed" or "the table is small enough now." The database must scan entire tables for each query, consuming excessive CPU and I/O resources.
Illustrative Cantor Point
The critical moment occurs when a developer writes a query against a column without verifying an index exists. This seemingly minor oversight - choosing immediate feature delivery over proper index planning - creates a divergent path where performance degrades exponentially with data growth rather than logarithmically.
Real-World Examples / Observed In
- Auth0 (2018): Critical authentication endpoints slowed to a crawl when indexes were accidentally dropped during maintenance, causing cascading failures across customer applications. The missing index on the session lookup table turned O(log n) operations into O(n) table scans. [See: Cases-By-Year/2018 Data Integrity Failures.md#8]
- Generic Payment Service: Missing index on email field in the users table led to deadlocks under high load during login attempts, taking the entire payment system offline as lock contention spiraled out of control.
Common Consequences & Impacts
Technical Impacts
- - O(n) full table scans instead of O(log n) index seeks
- - Increased lock contention and potential deadlocks
- - Memory pressure from loading unnecessary data
- - CPU saturation from scanning operations
Human/Ethical Impacts
- - Users locked out of critical services
- - Payment failures affecting livelihoods
- - Stress on engineering teams during incidents
- - Loss of trust in the platform
Business Impacts
- - Service outages during peak traffic
- - Lost revenue from slow or failed transactions
- - Customer churn due to poor performance
- - Emergency maintenance windows
Recovery Difficulty & Escalation
ADI Principles & Axioms Violated
- Principle of Invisible Decay: Performance erodes silently as data grows until reaching a critical threshold
- Principle of Cascading Catastrophes: A missing index can cascade into system-wide locks and failures
- Principle of Fragile Stability: Systems appear stable with small data but fail catastrophically at scale
Detection / 60-Second Audit
-- PostgreSQL: Find missing indexes on foreign keys
SELECT
tc.table_name,
kcu.column_name,
'CREATE INDEX idx_' || tc.table_name || '_' || kcu.column_name ||
' ON ' || tc.table_name || '(' || kcu.column_name || ');' as suggested_index
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND NOT EXISTS (
SELECT 1 FROM pg_indexes
WHERE tablename = tc.table_name
AND indexdef LIKE '%' || kcu.column_name || '%'
);
-- Find slow queries without indexes
SELECT
query,
calls,
mean_exec_time,
rows
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_%'
AND mean_exec_time > 100
ORDER BY mean_exec_time DESC
LIMIT 20;
-- MySQL: Find missing indexes on foreign keys
SELECT
t.TABLE_NAME,
k.COLUMN_NAME,
CONCAT('CREATE INDEX idx_', t.TABLE_NAME, '_', k.COLUMN_NAME,
' ON ', t.TABLE_NAME, '(', k.COLUMN_NAME, ');') as suggested_index
FROM information_schema.KEY_COLUMN_USAGE k
JOIN information_schema.TABLES t
ON k.TABLE_NAME = t.TABLE_NAME
AND k.TABLE_SCHEMA = t.TABLE_SCHEMA
WHERE k.REFERENCED_TABLE_NAME IS NOT NULL
AND k.TABLE_SCHEMA = DATABASE()
AND NOT EXISTS (
SELECT 1 FROM information_schema.STATISTICS s
WHERE s.TABLE_NAME = k.TABLE_NAME
AND s.COLUMN_NAME = k.COLUMN_NAME
AND s.TABLE_SCHEMA = k.TABLE_SCHEMA
);
-- Check slow query log
SELECT
DIGEST_TEXT,
COUNT_STAR as executions,
AVG_TIMER_WAIT/1000000000 as avg_time_ms,
SUM_ROWS_EXAMINED/COUNT_STAR as avg_rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%SELECT%'
AND SUM_ROWS_EXAMINED/COUNT_STAR > 1000
ORDER BY avg_time_ms DESC
LIMIT 20;
-- SQL Server: Find missing indexes from DMV recommendations
SELECT
d.statement as table_name,
d.equality_columns,
d.inequality_columns,
d.included_columns,
s.avg_user_impact,
s.user_seeks + s.user_scans as potential_use_count
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_stats s ON d.index_handle = s.index_handle
WHERE d.database_id = DB_ID()
ORDER BY s.avg_user_impact * (s.user_seeks + s.user_scans) DESC;
-- Find expensive queries
SELECT TOP 20
qs.execution_count,
qs.total_elapsed_time / qs.execution_count / 1000 as avg_duration_ms,
SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) as query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qs.total_elapsed_time / qs.execution_count / 1000 > 100
ORDER BY qs.total_elapsed_time / qs.execution_count DESC;
Prevention & Mitigation Best Practices
- Index First, Optimize Later: Create indexes on any column that appears in WHERE, JOIN, or ORDER BY clauses
- Query Plan Review: Always run EXPLAIN on new queries before deployment
- Automated Detection:
- Use tools like pt-query-digest or pg_stat_statements
- Set up monitoring for sequential scans on large tables
- Composite Index Strategy: For multi-column queries, order matters - most selective column first
- Regular Performance Audits:
- Track query performance in production
- Add indexes based on actual usage patterns
- Remove unused indexes to optimize write performance
Real World Examples
-- Auth0 login degradation: Missing index on email column
SELECT user_id, password_hash, last_login
FROM users
WHERE email = 'user@example.com';
-- Without index: 3-5 second queries on 10M row table
-- With millions of concurrent logins, database CPU hits 100%
-- E-commerce order history: Missing composite index
SELECT order_id, total, status, created_at
FROM orders
WHERE customer_id = 12345
AND created_at >= '2023-01-01'
ORDER BY created_at DESC;
-- Scans entire orders table despite filtering by customer
-- Proper indexing strategy
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);
-- Verify indexes are used
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id FROM users WHERE email = 'user@example.com';
-- Should show: Index Scan using idx_users_email
-- Monitor for missing indexes
SELECT
schemaname,
tablename,
attname,
n_distinct,
most_common_vals
FROM pg_stats
WHERE schemaname = 'public'
AND n_distinct > 100
AND tablename || '.' || attname NOT IN (
SELECT tablename || '.' || column_name
FROM pg_indexes
JOIN information_schema.columns USING (tablename)
WHERE schemaname = 'public'
);
AI Coding Guidance/Prompt
Prompt: "When writing SQL queries in this codebase:"
Rules:
- Always verify indexes exist for WHERE clause columns
- Include EXPLAIN ANALYZE output in PR comments for new queries
- Flag any sequential scans on tables >10k rows
Example:
# Bad: Assuming index exists
SELECT * FROM users WHERE email = 'user@example.com';
# Good: Verify first
-- Check: SELECT * FROM pg_indexes WHERE tablename = 'users' AND indexdef LIKE '%email%';
-- If missing: CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
Relevant Keywords
missing indexes frequently queried columns index performance query optimization Symptoms: slow queries, data inconsistency, constraint violations Preventive: schema validation, constraint enforcement, proper typing Tech stack: PostgreSQL, MySQL, SQL Server, Oracle Industry: all industries, enterprise, SaaS