SWE-0001: Missing Indexes on Frequently Queried Columns

Description

Failing to create indexes on columns used in WHERE clauses, JOIN conditions, or ORDER BY statements, causing full table scans that degrade exponentially as data grows. This weakness typically emerges when developers write queries without verifying index existence, often thinking "we'll add it later if needed" or "the table is small enough now." The database must scan entire tables for each query, consuming excessive CPU and I/O resources.

Illustrative Cantor Point

The critical moment occurs when a developer writes a query against a column without verifying an index exists. This seemingly minor oversight - choosing immediate feature delivery over proper index planning - creates a divergent path where performance degrades exponentially with data growth rather than logarithmically.

Categories: CP-Schema, CP-API, CP-Process

Real-World Examples / Observed In

Auth0 (2018): Critical authentication endpoints slowed to a crawl when indexes were accidentally dropped during maintenance, causing cascading failures across customer applications. The missing index on the session lookup table turned O(log n) operations into O(n) table scans. [See: Cases-By-Year/2018 Data Integrity Failures.md#8]
Generic Payment Service: Missing index on email field in the users table led to deadlocks under high load during login attempts, taking the entire payment system offline as lock contention spiraled out of control.

Common Consequences & Impacts

Technical Impacts

- O(n) full table scans instead of O(log n) index seeks
- Increased lock contention and potential deadlocks
- Memory pressure from loading unnecessary data
- CPU saturation from scanning operations

Human/Ethical Impacts

- Users locked out of critical services
- Payment failures affecting livelihoods
- Stress on engineering teams during incidents
- Loss of trust in the platform

Business Impacts

- Service outages during peak traffic
- Lost revenue from slow or failed transactions
- Customer churn due to poor performance
- Emergency maintenance windows

Recovery Difficulty & Escalation

Recovery Difficulty:

3.5

Escalation Risk:

7.5

ADI Principles & Axioms Violated

Principle of Invisible Decay: Performance erodes silently as data grows until reaching a critical threshold
Principle of Cascading Catastrophes: A missing index can cascade into system-wide locks and failures
Principle of Fragile Stability: Systems appear stable with small data but fail catastrophically at scale

Detection / 60-Second Audit

-- PostgreSQL: Find missing indexes on foreign keys
SELECT 
    tc.table_name, 
    kcu.column_name,
    'CREATE INDEX idx_' || tc.table_name || '_' || kcu.column_name || 
    ' ON ' || tc.table_name || '(' || kcu.column_name || ');' as suggested_index
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu 
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND NOT EXISTS (
    SELECT 1 FROM pg_indexes 
    WHERE tablename = tc.table_name 
    AND indexdef LIKE '%' || kcu.column_name || '%'
);

-- Find slow queries without indexes
SELECT 
    query,
    calls,
    mean_exec_time,
    rows
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_%'
AND mean_exec_time > 100
ORDER BY mean_exec_time DESC
LIMIT 20;

-- MySQL: Find missing indexes on foreign keys
SELECT 
    t.TABLE_NAME,
    k.COLUMN_NAME,
    CONCAT('CREATE INDEX idx_', t.TABLE_NAME, '_', k.COLUMN_NAME, 
           ' ON ', t.TABLE_NAME, '(', k.COLUMN_NAME, ');') as suggested_index
FROM information_schema.KEY_COLUMN_USAGE k
JOIN information_schema.TABLES t 
    ON k.TABLE_NAME = t.TABLE_NAME 
    AND k.TABLE_SCHEMA = t.TABLE_SCHEMA
WHERE k.REFERENCED_TABLE_NAME IS NOT NULL
AND k.TABLE_SCHEMA = DATABASE()
AND NOT EXISTS (
    SELECT 1 FROM information_schema.STATISTICS s
    WHERE s.TABLE_NAME = k.TABLE_NAME
    AND s.COLUMN_NAME = k.COLUMN_NAME
    AND s.TABLE_SCHEMA = k.TABLE_SCHEMA
);

-- Check slow query log
SELECT 
    DIGEST_TEXT,
    COUNT_STAR as executions,
    AVG_TIMER_WAIT/1000000000 as avg_time_ms,
    SUM_ROWS_EXAMINED/COUNT_STAR as avg_rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%SELECT%'
AND SUM_ROWS_EXAMINED/COUNT_STAR > 1000
ORDER BY avg_time_ms DESC
LIMIT 20;

-- SQL Server: Find missing indexes from DMV recommendations
SELECT 
    d.statement as table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.avg_user_impact,
    s.user_seeks + s.user_scans as potential_use_count
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_stats s ON d.index_handle = s.index_handle
WHERE d.database_id = DB_ID()
ORDER BY s.avg_user_impact * (s.user_seeks + s.user_scans) DESC;

-- Find expensive queries
SELECT TOP 20
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count / 1000 as avg_duration_ms,
    SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(qt.text)
            ELSE qs.statement_end_offset
        END - qs.statement_start_offset)/2) + 1) as query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qs.total_elapsed_time / qs.execution_count / 1000 > 100
ORDER BY qs.total_elapsed_time / qs.execution_count DESC;

Prevention & Mitigation Best Practices

Index First, Optimize Later: Create indexes on any column that appears in WHERE, JOIN, or ORDER BY clauses
Query Plan Review: Always run EXPLAIN on new queries before deployment
Automated Detection:
- Use tools like pt-query-digest or pg_stat_statements
- Set up monitoring for sequential scans on large tables
Composite Index Strategy: For multi-column queries, order matters - most selective column first
Regular Performance Audits:
- Track query performance in production
- Add indexes based on actual usage patterns
- Remove unused indexes to optimize write performance

Real World Examples

-- Auth0 login degradation: Missing index on email column
SELECT user_id, password_hash, last_login 
FROM users 
WHERE email = 'user@example.com';
-- Without index: 3-5 second queries on 10M row table
-- With millions of concurrent logins, database CPU hits 100%

-- E-commerce order history: Missing composite index
SELECT order_id, total, status, created_at
FROM orders
WHERE customer_id = 12345
AND created_at >= '2023-01-01'
ORDER BY created_at DESC;
-- Scans entire orders table despite filtering by customer

-- Proper indexing strategy
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);

-- Verify indexes are used
EXPLAIN (ANALYZE, BUFFERS) 
SELECT user_id FROM users WHERE email = 'user@example.com';
-- Should show: Index Scan using idx_users_email

-- Monitor for missing indexes
SELECT 
    schemaname,
    tablename,
    attname,
    n_distinct,
    most_common_vals
FROM pg_stats 
WHERE schemaname = 'public' 
AND n_distinct > 100
AND tablename || '.' || attname NOT IN (
    SELECT tablename || '.' || column_name 
    FROM pg_indexes 
    JOIN information_schema.columns USING (tablename)
    WHERE schemaname = 'public'
);

AI Coding Guidance/Prompt

Prompt: "When writing SQL queries in this codebase:"
Rules:
  - Always verify indexes exist for WHERE clause columns
  - Include EXPLAIN ANALYZE output in PR comments for new queries
  - Flag any sequential scans on tables >10k rows
  
Example:
  # Bad: Assuming index exists
  SELECT * FROM users WHERE email = 'user@example.com';
  
  # Good: Verify first
  -- Check: SELECT * FROM pg_indexes WHERE tablename = 'users' AND indexdef LIKE '%email%';
  -- If missing: CREATE INDEX idx_users_email ON users(email);
  SELECT * FROM users WHERE email = 'user@example.com';

Relevant Keywords

missing indexes frequently queried columns index performance query optimization Symptoms: slow queries, data inconsistency, constraint violations Preventive: schema validation, constraint enforcement, proper typing Tech stack: PostgreSQL, MySQL, SQL Server, Oracle Industry: all industries, enterprise, SaaS