SWE-1003

Cache-Unfriendly Schema Design

Cantorian Technical Debt Magnitude: ℵ₁ (Systemic)

Description

Schema designs that prevent effective caching strategies, causing database overload under normal operating conditions. This includes mixing frequently accessed "hot" data with rarely accessed "cold" data in the same tables, poor key design that prevents cache hits, or data structures that require complex joins that can't be cached efficiently.

Illustrative Cantor Point

The Cantor Point occurs when deciding how data is organized: storing all related data together for "simplicity" versus separating it by access pattern. Changing a query pattern without considering cache implications, or adding frequently-updated fields to otherwise static tables, creates divergent paths in which cache effectiveness drops dramatically.
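
A minimal sketch of how innocuous this fork can look, using a hypothetical products table: the one-line migration below seems harmless in review, yet it welds a per-view counter onto an otherwise static row, so every page view now writes to (and evicts from cache) data that used to be read-only in practice.

-- Hypothetical migration: the "trivial" decision that mixes hot data into a cold table
ALTER TABLE products ADD COLUMN view_count INT NOT NULL DEFAULT 0;

-- From now on, every product view issues a write against the cached row
UPDATE products SET view_count = view_count + 1 WHERE id = ?;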

Categories: CP-Schema, CP-API, CP-Process

Real-World Examples / Observed In

  • Epic Games Fortnite (2018): API change in v3.5 altered database access patterns, causing cache evictions and I/O thrashing that brought down account services [See: Cases-By-Year/2018 Data Integrity Failures.md#5]
  • Social Media Platform: User profile views mixed with real-time status updates, invalidating cache on every status change
  • E-commerce: Product catalog mixed with inventory counts, causing cache misses on popular items

Common Consequences & Impacts

Technical Impacts

  • Cache hit rates plummet
  • Database CPU and I/O saturation
  • Memory pressure from an oversized working set
  • Cascading performance degradation

Human/Ethical Impacts

  • Users unable to access services
  • Game progress lost
  • Frustration during peak events
  • On-call engineer burnout

Business Impacts

  • Service outages during peak usage
  • Increased infrastructure costs
  • Poor user experience
  • Lost revenue during downtime

Recovery Difficulty & Escalation

Recovery Difficulty: 5.5
Escalation: 8

ADI Principles & Axioms Violated

  • Principle of Invisible Decay: Performance degradation accumulates invisibly
  • Principle of Emergent Complexity: Small query changes have non-linear impacts

Detection / 60-Second Audit

-- PostgreSQL
-- Identify tables mixing static and dynamic data
SELECT 
    t.table_name,
    COUNT(CASE WHEN c.column_name LIKE '%count%' 
               OR c.column_name LIKE '%last_%' 
               OR c.column_name LIKE '%updated%' THEN 1 END) as dynamic_columns,
    COUNT(CASE WHEN c.column_name LIKE '%name%' 
               OR c.column_name LIKE '%description%' 
               OR c.column_name LIKE '%created%' THEN 1 END) as static_columns,
    COUNT(*) as total_columns
FROM information_schema.tables t
JOIN information_schema.columns c 
    ON t.table_name = c.table_name AND t.table_schema = c.table_schema
WHERE t.table_schema = 'public'
GROUP BY t.table_name
HAVING COUNT(CASE WHEN c.column_name LIKE '%count%' OR c.column_name LIKE '%last_%' THEN 1 END) > 0
AND COUNT(CASE WHEN c.column_name LIKE '%name%' OR c.column_name LIKE '%description%' THEN 1 END) > 0;

-- Analyze access patterns
SELECT 
    schemaname,
    tablename,
    n_tup_fetched,
    n_tup_upd + n_tup_ins + n_tup_del as writes,
    CASE 
        WHEN n_tup_fetched > 10 * (n_tup_upd + n_tup_ins + n_tup_del) 
        THEN 'Read-heavy - good cache candidate'
        WHEN (n_tup_upd + n_tup_ins + n_tup_del) > n_tup_fetched 
        THEN 'Write-heavy - poor cache candidate'
        ELSE 'Mixed access pattern'
    END as cache_suitability
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_tup_fetched DESC;

-- MySQL
-- Identify tables mixing static and dynamic data
SELECT 
    t.TABLE_NAME,
    COUNT(CASE WHEN c.COLUMN_NAME LIKE '%count%' 
               OR c.COLUMN_NAME LIKE '%last_%' 
               OR c.COLUMN_NAME LIKE '%updated%' THEN 1 END) as dynamic_columns,
    COUNT(CASE WHEN c.COLUMN_NAME LIKE '%name%' 
               OR c.COLUMN_NAME LIKE '%description%' 
               OR c.COLUMN_NAME LIKE '%created%' THEN 1 END) as static_columns,
    COUNT(*) as total_columns
FROM information_schema.TABLES t
JOIN information_schema.COLUMNS c 
    ON t.TABLE_NAME = c.TABLE_NAME AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
WHERE t.TABLE_SCHEMA = DATABASE()
GROUP BY t.TABLE_NAME
HAVING COUNT(CASE WHEN c.COLUMN_NAME LIKE '%count%' OR c.COLUMN_NAME LIKE '%last_%' THEN 1 END) > 0
AND COUNT(CASE WHEN c.COLUMN_NAME LIKE '%name%' OR c.COLUMN_NAME LIKE '%description%' THEN 1 END) > 0;

-- Check table update frequency (requires performance_schema)
SELECT 
    object_schema,
    object_name,
    count_read,
    count_write,
    CASE 
        WHEN count_read > 10 * count_write THEN 'Read-heavy - cache friendly'
        WHEN count_write > count_read THEN 'Write-heavy - cache hostile'
        ELSE 'Mixed pattern'
    END as cache_assessment
FROM performance_schema.table_io_waits_summary_by_table
WHERE object_schema = DATABASE()
AND count_read + count_write > 1000
ORDER BY count_read DESC;

-- SQL Server
-- Identify tables mixing static and dynamic data
SELECT 
    t.name AS table_name,
    COUNT(CASE WHEN c.name LIKE '%count%' 
               OR c.name LIKE '%last_%' 
               OR c.name LIKE '%updated%' THEN 1 END) as dynamic_columns,
    COUNT(CASE WHEN c.name LIKE '%name%' 
               OR c.name LIKE '%description%' 
               OR c.name LIKE '%created%' THEN 1 END) as static_columns,
    COUNT(*) as total_columns
FROM sys.tables t
JOIN sys.columns c ON t.object_id = c.object_id
WHERE t.type = 'U'
GROUP BY t.name
HAVING COUNT(CASE WHEN c.name LIKE '%count%' OR c.name LIKE '%last_%' THEN 1 END) > 0
AND COUNT(CASE WHEN c.name LIKE '%name%' OR c.name LIKE '%description%' THEN 1 END) > 0;

-- Check index usage for cache patterns
SELECT 
    OBJECT_NAME(s.object_id) AS table_name,
    i.name AS index_name,
    s.user_seeks + s.user_scans AS reads,
    s.user_updates AS writes,
    CASE 
        WHEN s.user_seeks + s.user_scans > 10 * s.user_updates THEN 'Cache-friendly'
        WHEN s.user_updates > s.user_seeks + s.user_scans THEN 'Cache-hostile'
        ELSE 'Mixed'
    END as pattern
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE database_id = DB_ID()
ORDER BY reads DESC;
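
As a complementary 60-second check on PostgreSQL (an assumption; adapt for other engines), the per-table buffer cache hit ratio shows whether the working set still fits in memory. This is a sketch; the threshold in the final comment is illustrative, not canonical.

-- PostgreSQL: per-table buffer cache hit ratio
-- heap_blks_hit = reads served from shared buffers, heap_blks_read = reads that went to disk
SELECT
    relname,
    heap_blks_hit,
    heap_blks_read,
    ROUND(heap_blks_hit::numeric / NULLIF(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 0
ORDER BY hit_ratio ASC
LIMIT 20;
-- Hit ratios well below ~0.99 on hot tables suggest the working set no longer fits in cache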

Prevention & Mitigation Best Practices

  1. Separate Hot and Cold Data:

    -- Bad: Everything in one table
    CREATE TABLE users (
        id BIGINT PRIMARY KEY,
        username VARCHAR(50),
        email VARCHAR(255),
        profile_data JSONB,
        last_login TIMESTAMP,
        login_count INTEGER,
        current_status VARCHAR(20)  -- Updates here invalidate the entire cached user
    );
    
    -- Good: Separate by access pattern
    CREATE TABLE users (  -- Rarely changes, highly cacheable
        id BIGINT PRIMARY KEY,
        username VARCHAR(50),
        email VARCHAR(255),
        profile_data JSONB
    );
    
    CREATE TABLE user_activity (  -- Frequently updated
        user_id BIGINT REFERENCES users(id),
        last_login TIMESTAMP,
        login_count INTEGER,
        current_status VARCHAR(20),
        PRIMARY KEY (user_id)
    );
    
  2. Design Cache-Friendly Keys:

    -- Enable efficient cache key generation
    CREATE TABLE cached_data (
        cache_key VARCHAR(255) PRIMARY KEY,
        data JSONB,
        expires_at TIMESTAMP
        -- Direct key lookup, no joins needed
    );
    
  3. Implement Read Replicas: Separate read traffic from writes

  4. Use Materialized Views: Pre-compute complex joins and aggregations (see the sketch after this list)

  5. Cache Warming Strategies: Preload critical data before peak traffic (also sketched below)

  6. Monitor Cache Hit Rates: Alert on degradation

  7. Test Under Load: Verify cache effectiveness at scale
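
A minimal PostgreSQL sketch of items 4 and 5. The product_catalog_summary view and categories table are illustrative names, not part of the schemas above, and pg_prewarm ships as a contrib extension that must be installed.

-- Item 4: pre-compute an expensive join once, refresh on a schedule
CREATE MATERIALIZED VIEW product_catalog_summary AS
SELECT p.id, p.name, p.brand, c.name AS category_name
FROM products p
JOIN categories c ON c.id = p.category_id;

CREATE UNIQUE INDEX ON product_catalog_summary (id);

-- Refresh periodically (e.g., from cron); CONCURRENTLY requires the unique index above
REFRESH MATERIALIZED VIEW CONCURRENTLY product_catalog_summary;

-- Item 5: warm the buffer cache for hot tables before peak traffic
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('products');
SELECT pg_prewarm('product_catalog_summary');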

Real World Examples

-- Example 1: Cache-hostile design mixing static product data with dynamic metrics
CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    -- Static data (rarely changes)
    name VARCHAR(255),
    description TEXT,
    category_id INT,
    brand VARCHAR(100),
    specifications JSONB,
    
    -- Dynamic data (changes constantly)
    view_count INT DEFAULT 0,        -- Increments on every view
    last_viewed TIMESTAMP,           -- Updates on every view
    stock_quantity INT,              -- Changes with each order
    price DECIMAL(10,2),            -- May have frequent promotions
    popularity_score FLOAT,          -- Recalculated hourly
    
    created_at TIMESTAMP,
    updated_at TIMESTAMP            -- Changes on every write, defeating long cache TTLs
);

-- Every product view invalidates the entire cached product!
UPDATE products SET 
    view_count = view_count + 1,
    last_viewed = NOW()
WHERE id = ?;

-- Example 2: Poor key design prevents efficient caching
CREATE TABLE user_preferences (
    user_id BIGINT,
    preference_key VARCHAR(100),
    preference_value TEXT,
    updated_at TIMESTAMP,
    PRIMARY KEY (user_id, preference_key)
);

-- Must fetch all preferences to get one
SELECT * FROM user_preferences WHERE user_id = ?;
-- Can't cache individual preferences effectively

-- Refactored (cache-friendly): separate static from dynamic data
CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    category_id INT NOT NULL,
    brand VARCHAR(100),
    specifications JSONB,
    base_price DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
-- Cache for days/weeks

CREATE TABLE product_metrics (
    product_id BIGINT PRIMARY KEY REFERENCES products(id),
    view_count INT DEFAULT 0,
    last_viewed TIMESTAMP,
    popularity_score FLOAT,
    updated_at TIMESTAMP DEFAULT NOW()
);
-- Don't cache or use short TTL

CREATE TABLE product_inventory (
    product_id BIGINT PRIMARY KEY REFERENCES products(id),
    stock_quantity INT NOT NULL DEFAULT 0,
    reserved_quantity INT NOT NULL DEFAULT 0,
    last_updated TIMESTAMP DEFAULT NOW()
);
-- Real-time data, never cache

-- Efficient preference storage
CREATE TABLE user_settings (
    user_id BIGINT PRIMARY KEY,
    theme VARCHAR(20) DEFAULT 'light',
    language VARCHAR(10) DEFAULT 'en',
    timezone VARCHAR(50) DEFAULT 'UTC',
    notifications JSONB DEFAULT '{}'
);
-- Cache entire settings object

-- For dynamic preferences, use a key-value table with the composite key encoded in a single string
CREATE TABLE user_cache (
    cache_key VARCHAR(255) PRIMARY KEY,  -- e.g., 'user:123:cart'
    data JSONB NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
-- Direct key lookup, efficient caching
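
A sketch of how the user_cache table above might be used (PostgreSQL upsert syntax assumed; the 'user:123:cart' key and 15-minute TTL are illustrative):

-- Write or refresh a cached value with a TTL
INSERT INTO user_cache (cache_key, data, expires_at)
VALUES ('user:123:cart', '{"items": [42, 99]}', NOW() + INTERVAL '15 minutes')
ON CONFLICT (cache_key)
DO UPDATE SET data = EXCLUDED.data, expires_at = EXCLUDED.expires_at;

-- Read: single primary-key lookup, no joins to invalidate
SELECT data FROM user_cache
WHERE cache_key = 'user:123:cart' AND expires_at > NOW();

-- Periodic cleanup of expired entries (e.g., from a scheduled job)
DELETE FROM user_cache WHERE expires_at < NOW();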

AI Coding Guidance/Prompt

Prompt: "When designing schemas for high-traffic applications:"
Rules:
  - Flag tables mixing frequently-updated and static columns
  - Warn about wide tables (>20 columns) in cache paths
  - Suggest separating read-heavy from write-heavy data
  - Require cache key strategy for high-volume queries
  - Monitor working set size vs available memory
  
Example:
  -- Bad: Cache-hostile design
  CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    price DECIMAL(10,2),
    stock_quantity INTEGER,      -- Changes constantly
    view_count INTEGER,          -- Updates on every view
    last_viewed TIMESTAMP,       -- Invalidates cache frequently
    category_id INTEGER,
    specifications JSONB
  );
  
  -- Good: Cache-optimized design
  -- Static product data (cache forever)
  CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    price DECIMAL(10,2),
    category_id INTEGER,
    specifications JSONB
  );
  
  -- Dynamic inventory (don't cache)
  CREATE TABLE product_inventory (
    product_id BIGINT PRIMARY KEY REFERENCES products(id),
    stock_quantity INTEGER,
    reserved_quantity INTEGER,
    last_updated TIMESTAMP DEFAULT NOW()
  );
  
  -- Analytics data (cache with TTL)
  CREATE TABLE product_analytics (
    product_id BIGINT REFERENCES products(id),
    period_start DATE,
    view_count INTEGER DEFAULT 0,
    PRIMARY KEY (product_id, period_start)
  );

Relevant Keywords

Cache-unfriendly schema design. Symptoms: low cache hit rates, database CPU and I/O saturation, cascading performance degradation under load. Preventive: hot/cold data separation, cache-friendly key design, read replicas, materialized views, cache hit rate monitoring. Tech stack: PostgreSQL, MySQL, SQL Server, Oracle. Industry: all industries, enterprise, SaaS

Related Patterns

The Cantorian Technical Debt Magnitude scale gives developers an intuitive sense of magnitude beyond simple hour counts - some debts aren't just larger in scale, but qualitatively different in their complexity.

Cantor Points are critical decision junctures—or even moments of non-decision—where seemingly small choices can create drastically divergent futures for a system's integrity, security, and evolvability. These are the "forks in the road" where one path might lead to manageable complexity, while another veers towards systemic entanglement or even chaos. They often appear trivial at the time but can set in motion irreversible or costly-to-reverse consequences.

Applied Data Integrity (ADI) is a framework for understanding the far-reaching consequences of schema and data decisions that impact security and reliability and accumulate into ethical debt affecting real human lives. Built on research from real-world incidents, ADI uncovered 7 Principles that identify when these decisions are being made and how to make them better, avoiding the potentially catastrophic "butterfly effects" of small choices that ripple into chaotic technical and ethical debt.