SWE-4001

Poor Data Partitioning and Hotspot Keys

Cantorian Technical Debt Magnitude: ℵ₁ (Systemic)

Description

Designing schemas without considering data distribution, either keeping everything monolithic or choosing partition keys that create severe hotspots. Includes using low-cardinality keys, time-based keys that funnel current activity to one shard, or natural keys with inherent skew (like popular users or products).
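
As a minimal sketch (the table and key names are hypothetical), a low-cardinality partition key looks innocuous but concentrates nearly all traffic on one or two partitions no matter how many nodes are added:

-- Hypothetical example: partitioning orders by country_code.
-- If most customers are in one country, that partition absorbs
-- almost all reads and writes while the others sit idle.
CREATE TABLE orders (
    order_id     BIGINT,
    country_code CHAR(2),
    created_at   TIMESTAMP,
    total        NUMERIC(10,2)
) PARTITION BY LIST (country_code);

CREATE TABLE orders_us    PARTITION OF orders FOR VALUES IN ('US');  -- becomes the hotspot
CREATE TABLE orders_de    PARTITION OF orders FOR VALUES IN ('DE');
CREATE TABLE orders_other PARTITION OF orders DEFAULT;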

Illustrative Cantor Point

The Cantor Point occurs when choosing the sharding strategy - or deciding not to shard at all. Selecting a partition key based on convenience rather than load distribution patterns creates a system where horizontal scaling provides no benefit.

Categories: CP-Schema, CP-API, CP-Process

Real-World Examples / Observed In

  • Foursquare (2010): MongoDB shard key led to uneven data growth; one shard held far more data than the others and crashed
  • Instagram: Early sharding by user_id created hotspots for celebrity accounts
  • Time-series Systems: Sharding by current date creates write hotspots on "today's" shard

Common Consequences & Impacts

Technical Impacts

  • One shard/partition becomes the bottleneck
  • Inability to scale horizontally
  • Uneven resource utilization
  • Complex rebalancing requirements

Human/Ethical Impacts

  • Service unavailable during peak times
  • Unfair performance for some users
  • Engineering stress from firefighting
  • Innovation blocked by scaling issues

Business Impacts

  • Performance degradation under load
  • Wasted infrastructure investment
  • Service outages from hotspot failures
  • Scaling limitations

Recovery Difficulty & Escalation

Recovery Difficulty: 7.5
Escalation: 5.5

ADI Principles & Axioms Violated

  • Principle of Emergent Transformation: Distribution patterns change with scale
  • Principle of Invisible Decay: Hotspots develop gradually then suddenly fail
  • Principle of Cascading Catastrophes: One hot shard can bring down the cluster

Detection / 60-Second Audit

-- PostgreSQL: detect uneven distribution (example for user-based sharding)
SELECT 
    SUBSTRING(user_id::text, 1, 2) as shard_prefix,
    COUNT(*) as record_count,
    COUNT(*)::float / (SELECT COUNT(*) FROM users) * 100 as percentage
FROM users
GROUP BY shard_prefix
ORDER BY record_count DESC
LIMIT 20;

-- Find potential hotspot keys (high-activity entities)
SELECT 
    user_id,
    COUNT(*) as activity_count
FROM user_activities
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY user_id
HAVING COUNT(*) > (
    SELECT AVG(count) * 10 
    FROM (
        SELECT COUNT(*) as count 
        FROM user_activities 
        WHERE created_at > NOW() - INTERVAL '1 hour'
        GROUP BY user_id
    ) t
)
ORDER BY activity_count DESC;

-- Check partition sizes (PostgreSQL partitioned table)
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE tablename LIKE 'events_partition_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- MySQL: detect uneven distribution in a sharded setup
SELECT 
    SUBSTRING(user_id, 1, 2) as shard_prefix,
    COUNT(*) as record_count,
    COUNT(*) / (SELECT COUNT(*) FROM users) * 100 as percentage
FROM users
GROUP BY shard_prefix
ORDER BY record_count DESC
LIMIT 20;

-- Find hotspot keys by activity
SELECT 
    user_id,
    COUNT(*) as activity_count,
    COUNT(*) / (
        SELECT AVG(cnt) FROM (
            SELECT COUNT(*) as cnt 
            FROM user_activities 
            WHERE created_at > NOW() - INTERVAL 1 HOUR
            GROUP BY user_id
        ) t
    ) as hotspot_factor
FROM user_activities
WHERE created_at > NOW() - INTERVAL 1 HOUR
GROUP BY user_id
HAVING hotspot_factor > 10
ORDER BY activity_count DESC;

-- Check partition sizes (MySQL 8.0+)
SELECT 
    TABLE_NAME,
    PARTITION_NAME,
    TABLE_ROWS,
    DATA_LENGTH/1024/1024 as data_size_mb
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
AND PARTITION_NAME IS NOT NULL
ORDER BY DATA_LENGTH DESC;

-- SQL Server: detect uneven distribution
SELECT 
    LEFT(CAST(user_id AS VARCHAR(20)), 2) as shard_prefix,
    COUNT(*) as record_count,
    CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*) FROM users) * 100 as percentage
FROM users
GROUP BY LEFT(CAST(user_id AS VARCHAR(20)), 2)
ORDER BY record_count DESC;

-- Find hotspot keys
SELECT TOP 20
    a.user_id,
    COUNT(*) as activity_count,
    CAST(COUNT(*) AS FLOAT) / MAX(avg_table.avg_count) as hotspot_factor
FROM user_activities a
CROSS JOIN (
    SELECT AVG(CAST(cnt AS FLOAT)) as avg_count
    FROM (
        SELECT COUNT(*) as cnt
        FROM user_activities
        WHERE created_at > DATEADD(HOUR, -1, GETDATE())
        GROUP BY user_id
    ) t
) avg_table
WHERE a.created_at > DATEADD(HOUR, -1, GETDATE())
GROUP BY a.user_id
HAVING COUNT(*) > MAX(avg_table.avg_count) * 10
ORDER BY activity_count DESC;

-- Check partition function distribution
SELECT 
    p.partition_number,
    prv.value as boundary_value,
    p.rows
FROM sys.partitions p
JOIN sys.indexes i 
    ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes ps 
    ON i.data_space_id = ps.data_space_id
LEFT JOIN sys.partition_range_values prv 
    ON ps.function_id = prv.function_id 
    AND prv.boundary_id = p.partition_number
WHERE p.object_id = OBJECT_ID('dbo.events')
AND p.index_id = 1
ORDER BY p.partition_number;

Prevention & Mitigation Best Practices

  1. High-Cardinality Keys: Use hashed or composite keys for even distribution
  2. Avoid Time-Based Keys: Don't use the current timestamp as the leading partition key
  3. Monitor Distribution: Track partition sizes and access patterns (see the skew-check sketch after this list)
  4. Plan for Celebrities: Design for power-law distributions
  5. Composite Strategies: Combine multiple attributes for partition keys
  6. Pre-Splitting: Create partitions ahead of data for even distribution
  7. Regular Rebalancing: Plan for redistribution as patterns change
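
A minimal skew check for item 3, sketched for a PostgreSQL partitioned table (the parent table name events and the 2x threshold are illustrative assumptions):

-- Flag partitions larger than 2x the average partition size
WITH partition_sizes AS (
    SELECT
        c.relname AS partition_name,
        pg_total_relation_size(c.oid) AS bytes
    FROM pg_inherits i
    JOIN pg_class c ON c.oid = i.inhrelid
    WHERE i.inhparent = 'events'::regclass
)
SELECT
    partition_name,
    pg_size_pretty(bytes) AS size,
    round(bytes / NULLIF(avg(bytes) OVER (), 0), 2) AS skew_ratio
FROM partition_sizes
WHERE bytes > 2 * (SELECT avg(bytes) FROM partition_sizes)
ORDER BY bytes DESC;

Run on a schedule, a skew ratio creeping upward is an early signal to re-examine the partition key before the hot partition fails.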

Code Examples

-- Problem: All current writes go to one partition
CREATE TABLE events (
    event_time TIMESTAMP,
    user_id BIGINT,
    data JSONB
) PARTITION BY RANGE (event_time);

-- Today's partition gets 100% of writes!
CREATE TABLE events_2024_01 PARTITION OF events 
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Problem: Popular users create severe skew
CREATE TABLE posts (
    post_id BIGINT,
    user_id BIGINT,
    content TEXT
) PARTITION BY HASH (user_id);

-- Celebrity with 10M followers overwhelms one shard
-- while other shards sit idle

-- Solution: Combine time bucketing with hash distribution
CREATE TABLE events (
    event_id BIGINT,
    event_time TIMESTAMP,
    user_id BIGINT,
    data JSONB,
    -- Composite shard key for even distribution
    shard_key INT GENERATED ALWAYS AS (
        -- Time bucket (hourly) + user hash; abs() keeps the value in
        -- 0..999 because hashtext() can return negative integers
        abs(EXTRACT(EPOCH FROM event_time)::BIGINT / 3600 + 
            hashtext(user_id::text)) % 1000
    ) STORED
) PARTITION BY LIST (shard_key);

-- Pre-create 1000 partitions for even distribution
DO $$
BEGIN
    FOR i IN 0..999 LOOP
        EXECUTE format(
            'CREATE TABLE events_p%s PARTITION OF events FOR VALUES IN (%s)',
            i, i
        );
    END LOOP;
END $$;

-- Now writes distribute evenly across all partitions
-- regardless of time or user popularity
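
One operational caveat (a sketch, assuming the shard_key formula above): queries that filter only on event_time or user_id will scan every partition, so point lookups should recompute the shard key to let the planner prune.

-- Hypothetical point lookup that recomputes the shard key so only
-- one list partition is scanned (values are illustrative)
SELECT *
FROM events
WHERE user_id = 12345
  AND event_time >= TIMESTAMP '2024-01-15 10:00'
  AND event_time <  TIMESTAMP '2024-01-15 11:00'
  AND shard_key = abs(
        EXTRACT(EPOCH FROM TIMESTAMP '2024-01-15 10:00')::BIGINT / 3600 +
        hashtext('12345')
      ) % 1000;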

AI Coding Guidance/Prompt

Prompt: "When designing data partitioning strategies:"
Rules:
  - Require distribution analysis for partition keys
  - Flag time-based or low-cardinality partition keys
  - Suggest composite keys for better distribution
  - Warn about celebrity/hotspot scenarios
  
Example:
  -- Bad: Time-based partitioning creating hotspots
  CREATE TABLE events (
    event_time TIMESTAMP,
    user_id BIGINT,
    data JSONB
  ) PARTITION BY RANGE (event_time);
  -- All writes go to current partition!
  
  -- Bad: Natural key with skew
  CREATE TABLE user_activities (
    user_id BIGINT,
    activity_time TIMESTAMP
  ) PARTITION BY HASH (user_id);
  -- Celebrity users create hotspots!
  
  -- Good: Composite key for distribution
  CREATE TABLE user_activities (
    user_id BIGINT,
    activity_time TIMESTAMP,
    shard_key INT GENERATED ALWAYS AS (
        EXTRACT(EPOCH FROM activity_time)::INT % 100
    ) STORED
  ) PARTITION BY HASH (user_id, shard_key);
  
  -- Good: Pre-split with monitoring
  CREATE TABLE events (
    event_id BIGINT,
    event_time TIMESTAMP,
    bucket_id INT GENERATED ALWAYS AS (
        abs(EXTRACT(EPOCH FROM event_time)::BIGINT / 3600 + 
            hashtext(event_id::text)) % 1000
    ) STORED
  ) PARTITION BY LIST (bucket_id);

Relevant Keywords

poor data partitioning, hotspot keys Symptoms: uneven shard load, write bottlenecks on a single partition, slow queries under peak traffic Preventive: high-cardinality or composite partition keys, distribution monitoring, pre-splitting, rebalancing Tech stack: PostgreSQL, MySQL, SQL Server, Oracle Industry: all industries, enterprise, SaaS

Related Patterns

The Cantorian Technical Debt Magnitude scale gives developers an intuitive sense of magnitude beyond simple hour counts - some debts aren't just larger in scale, but qualitatively different in their complexity.

Cantor Points are critical decision junctures—or even moments of non-decision—where seemingly small choices can create drastically divergent futures for a system's integrity, security, and evolvability. These are the "forks in the road" where one path might lead to manageable complexity, while another veers towards systemic entanglement or even chaos. They often appear trivial at the time but can set in motion irreversible or costly-to-reverse consequences.

Applied Data Integrity (ADI) is a framework for understanding the far-reaching consequences of schema and data decisions that impact security and reliability and accumulate into ethical debt affecting real human lives. Built on research from real-world incidents, ADI uncovered 7 Principles that identify when these decisions are being made and how to make them better, avoiding future technical debt and the potentially catastrophic "butterfly effects" of small decisions that ripple into chaotic technical and ethical debt.