SWE-3001: Unbounded Social Graph Queries

Description

Implementing social relationships using pure relational joins without considering the exponential growth of multi-hop queries. Simple adjacency list tables work for direct connections but fail catastrophically when calculating second or third-degree relationships, as query complexity grows geometrically with each additional hop.

Illustrative Cantor Point

The Cantor Point occurs when deciding to model social relationships as a simple two-column table (user_id, friend_id), thinking "we'll just join the table to itself for friend-of-friend queries." This seemingly logical decision fails to account for the exponential growth in computational complexity.

Categories: CP-Schema, CP-API, CP-Process

Real-World Examples / Observed In

Friendster: Calculating 3rd-degree connections brought the site to its knees (40-second page loads)
Early MySpace: "Top 8 Friends" feature created massive database load
LinkedIn: Had to implement specialized infrastructure for "degrees of separation"

Common Consequences & Impacts

Technical Impacts

- O(n^k) query complexity for k-degree relationships
- Database CPU saturation
- Memory exhaustion from large result sets
- Query timeouts on moderate friend counts

Human/Ethical Impacts

- Lost social connections
- Communities destroyed by platform failure
- Years of shared content inaccessible
- Network effects reversed

Business Impacts

- Platform failure under load
- User exodus to competitors
- Feature limitations
- Infrastructure costs explosion

Recovery Difficulty & Escalation

Recovery Difficulty:

9

Escalation Risk:

9

ADI Principles & Axioms Violated

Principle of Emergent Transformation: Linear thinking fails at exponential scale
Principle of Cascading Catastrophes: One bad query can topple the entire platform
Principle of Fragile Stability: Works fine until it suddenly doesn't

Detection / 60-Second Audit

-- Test second-degree connection query performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT f2.friend_id) as second_degree_count
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 12345  -- test user
AND f2.friend_id != f1.user_id
AND f2.friend_id NOT IN (
    SELECT friend_id FROM friendships WHERE user_id = 12345
);

-- Check for users with high friend counts (potential hotspots)
SELECT 
    user_id,
    COUNT(*) as friend_count,
    pg_size_pretty(pg_relation_size('friendships')) as table_size
FROM friendships
GROUP BY user_id
HAVING COUNT(*) > 1000
ORDER BY friend_count DESC
LIMIT 10;

-- Test second-degree connection query performance
EXPLAIN
SELECT COUNT(DISTINCT f2.friend_id) as second_degree_count
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 12345  -- test user
AND f2.friend_id != f1.user_id
AND f2.friend_id NOT IN (
    SELECT friend_id FROM friendships WHERE user_id = 12345
);

-- Check for users with high friend counts
SELECT 
    user_id,
    COUNT(*) as friend_count,
    (SELECT COUNT(*) FROM friendships) as total_friendships
FROM friendships
GROUP BY user_id
HAVING COUNT(*) > 1000
ORDER BY friend_count DESC
LIMIT 10;

-- Test second-degree connection query performance with execution plan
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(DISTINCT f2.friend_id) as second_degree_count
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 12345  -- test user
AND f2.friend_id != f1.user_id
AND f2.friend_id NOT IN (
    SELECT friend_id FROM friendships WHERE user_id = 12345
);

-- Check for users with high friend counts
SELECT TOP 10
    user_id,
    COUNT(*) as friend_count,
    CAST(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM friendships) as DECIMAL(5,2)) as pct_of_total
FROM friendships
GROUP BY user_id
HAVING COUNT(*) > 1000
ORDER BY friend_count DESC;

Prevention & Mitigation Best Practices

Depth Limits: Never calculate beyond 2nd degree in real-time
Precomputation: Materialize common relationship paths
Graph-Appropriate Storage: Consider Neo4j for true graph operations
Caching Strategy: Aggressively cache relationship calculations
Approximation Algorithms: Use sampling for large-degree estimates
Async Processing: Calculate expensive relationships in background
Denormalization: Store friend-of-friend counts rather than computing

Real World Examples

-- Naive approach that killed Friendster
SELECT DISTINCT f3.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
WHERE f1.user_id = 12345
AND f3.friend_id NOT IN (
    SELECT friend_id FROM friendships WHERE user_id = 12345
);
-- Result: 40+ second queries with just 5000 users

-- Celebrity with 1M followers crashes the system
SELECT COUNT(*)
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 1;  -- Celebrity user
-- Query attempts to process 1M x average_friends_per_user rows

-- Direct friends (real-time, indexed)
CREATE INDEX idx_friendships_user ON friendships(user_id);
CREATE INDEX idx_friendships_friend ON friendships(friend_id);

-- Second-degree connections (materialized view, refreshed hourly)
CREATE MATERIALIZED VIEW user_second_degree AS
SELECT 
    f1.user_id,
    f2.friend_id as second_degree_friend,
    COUNT(*) as mutual_friends
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f2.friend_id != f1.user_id
AND NOT EXISTS (
    SELECT 1 FROM friendships 
    WHERE user_id = f1.user_id 
    AND friend_id = f2.friend_id
)
GROUP BY f1.user_id, f2.friend_id;

CREATE INDEX idx_second_degree_user ON user_second_degree(user_id);

-- Query now uses pre-computed data
SELECT second_degree_friend, mutual_friends
FROM user_second_degree
WHERE user_id = 12345
ORDER BY mutual_friends DESC
LIMIT 100;

AI Coding Guidance/Prompt

Prompt: "When implementing social graph features:"
Rules:
  - Reject real-time queries beyond 2nd degree connections
  - Require caching strategy for any graph traversal
  - Consider graph database for >3 relationship types
  - Flag O(n^k) complexity patterns
  
Example:
  # Bad: Unbounded recursive query
  WITH RECURSIVE friend_network AS (
    SELECT user_id, friend_id, 1 as depth
    FROM friendships WHERE user_id = ?
    UNION ALL
    SELECT fn.user_id, f.friend_id, fn.depth + 1
    FROM friend_network fn
    JOIN friendships f ON fn.friend_id = f.user_id
    WHERE fn.depth < 6  -- This will destroy your database
  )
  SELECT * FROM friend_network;
  
  # Good: Bounded and cached approach
  -- Direct friends (real-time)
  CREATE VIEW user_friends AS
  SELECT user_id, friend_id FROM friendships;
  
  -- Second-degree (materialized hourly)
  CREATE MATERIALIZED VIEW user_friends_of_friends AS
  SELECT DISTINCT f1.user_id, f2.friend_id
  FROM friendships f1
  JOIN friendships f2 ON f1.friend_id = f2.user_id
  WHERE f2.friend_id != f1.user_id;
  
  -- Beyond second-degree: Use graph database or approximation

Relevant Keywords

unbounded social graph queries Symptoms: slow queries, data inconsistency, constraint violations Preventive: schema validation, constraint enforcement, proper typing Tech stack: PostgreSQL, MySQL, SQL Server, Oracle Industry: all industries, enterprise, SaaS