SWE-5001

API Rate Limiting and Enumeration Weakness

Cantorian Technical Debt Magnitude: 2^ℵ₀ (Chaotic)

Description

This weakness occurs when APIs lack proper rate limiting, use predictable identifiers (such as sequential IDs), and do not verify authorization on each request. An attacker can then systematically enumerate and extract entire datasets simply by iterating through ID ranges. It is often combined with Insecure Direct Object Reference (IDOR) vulnerabilities, where authorization checks are missing entirely.
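
As a minimal illustration (the endpoint, ID range, and storage are hypothetical), the entire attack can fit in a few lines of Python:

```python
# Hypothetical enumeration sketch: sequential IDs + no auth + no rate limit
# means a single loop can walk the whole dataset.
import requests

BASE_URL = "https://api.example.com/api/documents/"  # hypothetical endpoint

harvested = {}
for doc_id in range(1, 1_000_000):
    resp = requests.get(BASE_URL + str(doc_id))
    if resp.status_code == 200:            # every 200 is another exposed record
        harvested[doc_id] = resp.json()
```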

Illustrative Cantor Point

The Cantor Point occurs when designing API access patterns: choosing between simple, direct ID-based access and implementing proper authentication, authorization, and rate limiting. The decision to use sequential IDs without access controls creates a divergent path where a convenience feature becomes a massive data-exposure vector.

Categories: CP-Schema, CP-API, CP-Process

Real-World Examples / Observed In

  • First American Financial (2019): 885 million records exposed via sequential document IDs in URLs without authentication [See: Cases-By-Year/2019 Data Integrity Failures.md#2]
  • Facebook Contact Importer (2019): 533 million profiles scraped through bulk phone number queries without rate limiting [See: Cases-By-Year/2019 Data Integrity Failures.md#6]
  • Common Pattern: REST APIs exposing /api/resource/{id} without checking if requester owns that resource

Common Consequences & Impacts

Technical Impacts

  • Complete database extraction via enumeration
  • API abuse and resource exhaustion
  • Inability to track or stop ongoing attacks
  • Difficulty retrofitting security controls

Human/Ethical Impacts

  • Privacy violations at scale
  • Identity theft enablement
  • Financial records exposure
  • Personal safety risks

Business Impacts

  • Massive data breaches
  • Regulatory violations
  • API shutdown requirements
  • Competitive disadvantage from data theft

Recovery Difficulty & Escalation

Recovery Difficulty: 3
Escalation: 9

ADI Principles & Axioms Violated

  • Principle of Observable Integrity: Cannot detect enumeration without monitoring
  • Principle of Trust Insufficiency: Trusting that IDs won't be guessed

Detection / 60-Second Audit

```sql
-- PostgreSQL: check for sequential ID usage in APIs
SELECT 
    table_name,
    column_name,
    data_type,
    column_default
FROM information_schema.columns
WHERE table_schema = 'public'
AND column_name IN ('id', 'document_id', 'resource_id')
AND (data_type IN ('integer', 'bigint', 'serial', 'bigserial')
     OR column_default LIKE 'nextval%');

-- Look for missing API rate limiting tables
SELECT 
    NOT EXISTS (
        SELECT 1 FROM information_schema.tables
        WHERE table_name IN ('api_rate_limits', 'rate_limit_buckets', 'api_throttle')
    ) as missing_rate_limiting;

-- Check for API access logging
SELECT 
    EXISTS (
        SELECT 1 FROM information_schema.tables
        WHERE table_name LIKE '%api%log%' 
           OR table_name LIKE '%access%log%'
    ) as has_api_logging;

-- Identify potential IDOR vulnerabilities
SELECT 
    tc.table_name,
    kcu.column_name as id_column,
    CASE 
        WHEN NOT EXISTS (
            SELECT 1 FROM information_schema.columns
            WHERE table_name = tc.table_name
            AND column_name IN ('user_id', 'owner_id', 'created_by')
        ) THEN 'No ownership tracking - IDOR risk'
        ELSE 'Has ownership columns'
    END as idor_risk
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu 
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
AND tc.table_schema = 'public';
```
```sql
-- MySQL / MariaDB: check for sequential ID usage in APIs
SELECT 
    table_name,
    column_name,
    data_type,
    column_default,
    extra
FROM information_schema.columns
WHERE table_schema = DATABASE()
AND column_name IN ('id', 'document_id', 'resource_id')
AND (data_type IN ('int', 'bigint')
     OR extra = 'auto_increment');

-- Look for missing API rate limiting tables
SELECT 
    NOT EXISTS (
        SELECT 1 FROM information_schema.tables
        WHERE table_schema = DATABASE()
        AND table_name IN ('api_rate_limits', 'rate_limit_buckets', 'api_throttle')
    ) as missing_rate_limiting;

-- Check for API access logging
SELECT 
    EXISTS (
        SELECT 1 FROM information_schema.tables
        WHERE table_schema = DATABASE()
        AND (table_name LIKE '%api%log%' 
           OR table_name LIKE '%access%log%')
    ) as has_api_logging;

-- Identify potential IDOR vulnerabilities
SELECT 
    t.table_name,
    kcu.column_name as id_column,
    CASE 
        WHEN NOT EXISTS (
            SELECT 1 FROM information_schema.columns
            WHERE table_schema = DATABASE()
            AND table_name = t.table_name
            AND column_name IN ('user_id', 'owner_id', 'created_by')
        ) THEN 'No ownership tracking - IDOR risk'
        ELSE 'Has ownership columns'
    END as idor_risk
FROM information_schema.tables t
JOIN information_schema.key_column_usage kcu 
    ON t.table_name = kcu.table_name
WHERE kcu.constraint_name = 'PRIMARY'
AND t.table_schema = DATABASE();
```
```sql
-- SQL Server: check for sequential ID usage in APIs
SELECT 
    t.name as table_name,
    c.name as column_name,
    ty.name as data_type,
    c.is_identity
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
WHERE c.name IN ('id', 'document_id', 'resource_id')
AND (ty.name IN ('int', 'bigint') OR c.is_identity = 1);

-- Look for missing API rate limiting tables
SELECT 
    CASE WHEN NOT EXISTS (
        SELECT 1 FROM sys.tables
        WHERE name IN ('api_rate_limits', 'rate_limit_buckets', 'api_throttle')
    ) THEN 1 ELSE 0 END as missing_rate_limiting;

-- Check for API access logging
SELECT 
    CASE WHEN EXISTS (
        SELECT 1 FROM sys.tables
        WHERE name LIKE '%api%log%' 
           OR name LIKE '%access%log%'
    ) THEN 1 ELSE 0 END as has_api_logging;

-- Identify potential IDOR vulnerabilities
SELECT 
    t.name as table_name,
    kc.name as id_column,
    CASE 
        WHEN NOT EXISTS (
            SELECT 1 FROM sys.columns
            WHERE object_id = t.object_id
            AND name IN ('user_id', 'owner_id', 'created_by')
        ) THEN 'No ownership tracking - IDOR risk'
        ELSE 'Has ownership columns'
    END as idor_risk
FROM sys.tables t
JOIN sys.indexes i ON t.object_id = i.object_id
JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN sys.columns kc ON ic.object_id = kc.object_id AND ic.column_id = kc.column_id
WHERE i.is_primary_key = 1;
```
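
To make this a repeatable 60-second audit, the checks can be scripted. Below is a minimal sketch assuming PostgreSQL and the psycopg2 driver; the connection string is a placeholder and only a subset of the queries above is included:

```python
# Minimal audit runner: executes the PostgreSQL checks above and prints results.
# Assumes psycopg2 is installed; DSN is a placeholder for the target database.
import psycopg2

DSN = "dbname=app user=auditor"  # placeholder connection string

CHECKS = {
    "sequential_id_columns": """
        SELECT table_name, column_name
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND column_name IN ('id', 'document_id', 'resource_id')
          AND (data_type IN ('integer', 'bigint')
               OR column_default LIKE 'nextval%')
    """,
    "missing_rate_limiting": """
        SELECT NOT EXISTS (
            SELECT 1 FROM information_schema.tables
            WHERE table_name IN ('api_rate_limits', 'rate_limit_buckets', 'api_throttle')
        )
    """,
    "has_api_logging": """
        SELECT EXISTS (
            SELECT 1 FROM information_schema.tables
            WHERE table_name LIKE '%api%log%' OR table_name LIKE '%access%log%'
        )
    """,
}

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            print(name, cur.fetchall())
```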

Prevention & Mitigation Best Practices

  1. Use Non-Sequential Identifiers:

    -- Bad: Sequential IDs enable enumeration
    CREATE TABLE documents (
        id SERIAL PRIMARY KEY,
        content TEXT
    );
    
    -- Good: UUIDs prevent enumeration
    CREATE TABLE documents (
        id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
        content TEXT,
        owner_id UUID NOT NULL,
        created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
    );
    
  2. Implement Rate Limiting (see the application-level usage sketch after this list):

    CREATE TABLE api_rate_limits (
        id SERIAL PRIMARY KEY,
        user_id UUID,
        ip_address INET,
        endpoint VARCHAR(255),
        window_start TIMESTAMP WITH TIME ZONE,
        request_count INTEGER DEFAULT 1,
        UNIQUE(user_id, ip_address, endpoint, window_start)
    );
    
    -- Function to check rate limits
    CREATE OR REPLACE FUNCTION check_rate_limit(
        p_user_id UUID,
        p_ip INET,
        p_endpoint VARCHAR,
        p_limit INTEGER,
        p_window_minutes INTEGER
    ) RETURNS BOOLEAN AS $$
    DECLARE
        v_count INTEGER;
        v_window_start TIMESTAMP WITH TIME ZONE;
    BEGIN
        v_window_start := date_trunc('minute', NOW() - INTERVAL '1 minute' * (EXTRACT(MINUTE FROM NOW())::INTEGER % p_window_minutes));
        
        INSERT INTO api_rate_limits (user_id, ip_address, endpoint, window_start)
        VALUES (p_user_id, p_ip, p_endpoint, v_window_start)
        ON CONFLICT (user_id, ip_address, endpoint, window_start)
        DO UPDATE SET request_count = api_rate_limits.request_count + 1
        RETURNING request_count INTO v_count;
        
        RETURN v_count <= p_limit;
    END;
    $$ LANGUAGE plpgsql;
    
  3. Implement Authorization Checks (see the application-level usage sketch after this list):

    -- Always verify ownership
    CREATE OR REPLACE FUNCTION can_access_document(
        p_user_id UUID,
        p_document_id UUID
    ) RETURNS BOOLEAN AS $$
    BEGIN
        RETURN EXISTS (
            SELECT 1 FROM documents
            WHERE id = p_document_id
            AND (owner_id = p_user_id 
                 OR EXISTS (
                     SELECT 1 FROM document_shares
                     WHERE document_id = p_document_id
                     AND shared_with_user_id = p_user_id
                 ))
        );
    END;
    $$ LANGUAGE plpgsql;
    
  4. API Access Pattern Monitoring:

    CREATE TABLE api_access_patterns (
        id BIGSERIAL PRIMARY KEY,
        user_id UUID,
        ip_address INET,
        endpoint VARCHAR(255),
        accessed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
        resource_ids TEXT[],
        pattern_score INTEGER -- Higher = more suspicious
    );
    
    -- Detect enumeration patterns
    CREATE VIEW suspicious_api_activity AS
    SELECT
        p.user_id,
        p.ip_address,
        COUNT(DISTINCT p.id) as request_count,
        COUNT(DISTINCT r.resource_id) as unique_resources,
        MAX(p.pattern_score) as max_pattern_score
    FROM api_access_patterns p
    LEFT JOIN LATERAL unnest(p.resource_ids) AS r(resource_id) ON TRUE
    WHERE p.accessed_at > NOW() - INTERVAL '1 hour'
    GROUP BY p.user_id, p.ip_address
    HAVING COUNT(DISTINCT p.id) > 100 -- High volume
        OR COUNT(DISTINCT r.resource_id) > 50; -- Many different resources
    
  5. Additional Best Practices:

    • Implement CAPTCHA for high-volume requests
    • Use API keys with scope limitations
    • Deploy Web Application Firewalls (WAF)
    • Regular penetration testing focusing on IDOR
    • Implement circuit breakers for suspicious patterns
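
Tying items 2 and 3 together, the sketch below shows how an application-layer handler might call the check_rate_limit and can_access_document functions defined above before returning any data. It assumes PostgreSQL with psycopg2; the DSN, limits, and endpoint name are placeholders:

```python
# Application-level usage sketch for the check_rate_limit and
# can_access_document functions defined above (PostgreSQL + psycopg2 assumed).
import psycopg2

conn = psycopg2.connect("dbname=app user=api")  # placeholder DSN
conn.autocommit = True  # so check_rate_limit's INSERT/UPDATE persists

def get_document(user_id, client_ip, doc_id):
    with conn.cursor() as cur:
        # 1. Rate limit: e.g. 100 requests per 15-minute window for this endpoint.
        cur.execute(
            "SELECT check_rate_limit(%s, %s, %s, %s, %s)",
            (user_id, client_ip, "/api/documents", 100, 15),
        )
        if not cur.fetchone()[0]:
            return 429, {"error": "rate limit exceeded"}

        # 2. Authorization: ownership or an explicit share, never the ID alone.
        cur.execute("SELECT can_access_document(%s, %s)", (user_id, doc_id))
        if not cur.fetchone()[0]:
            return 404, {"error": "not found"}  # 404, not 403, to avoid an existence oracle

        # 3. Fetch the document only after both checks pass.
        cur.execute("SELECT id, content FROM documents WHERE id = %s", (doc_id,))
        row = cur.fetchone()
        return 200, {"id": str(row[0]), "content": row[1]}
```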

Real-World Examples in Detail

First American Financial (2019):

# The vulnerability:
GET https://docutech-prod.firstam.com/document/view/12345
Response: Full document with sensitive financial data

# How it was exploited:
- No authentication required
- Sequential IDs allowed enumeration
- Simple script could download millions:
for i in range(1, 1000000):
    url = f"https://docutech-prod.firstam.com/document/view/{i}"
    download(url)  # 885 million documents exposed

# Impact:
- Mortgage documents, bank statements, tax records
- Social Security numbers, wire transfer receipts
- Active from 2003 to 2019 (16 years!)

Facebook Contact Importer (2019):

# The vulnerability:
POST /v2.0/me/invitable_friends
Body: { "phone_numbers": ["+1234567890", "+1234567891", ...] }
Response: { "data": [{ "id": "100001234", "name": "John Doe" }, ...] }

# How it was exploited:
- No rate limiting on bulk queries
- Could submit millions of phone numbers
- Returned Facebook profiles for each number
- Scrapers harvested 533 million profiles

# Timeline:
- Vulnerability introduced: Unknown
- Discovered and exploited: 2019
- Patched: August 2019
- Data appeared on hacking forums: 2021
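
The underlying defense for this class of bulk-lookup abuse is the same as for single-ID enumeration: cap batch sizes and charge every looked-up identifier against a per-caller quota. A minimal, framework-agnostic sketch (the quota_store and resolve helpers are hypothetical):

```python
# Defense sketch for bulk-lookup endpoints (hypothetical helpers):
# reject oversized batches and count each identifier against the caller's quota.
MAX_BATCH = 50          # refuse oversized lookups outright
HOURLY_QUOTA = 1_000    # total identifiers one caller may resolve per hour

def lookup_profiles(caller_id, phone_numbers, quota_store, resolve):
    if len(phone_numbers) > MAX_BATCH:
        return 400, {"error": "batch too large"}

    # Charge the whole batch, not just one request, against the caller's quota.
    used = quota_store.add(caller_id, len(phone_numbers), window="1h")
    if used > HOURLY_QUOTA:
        return 429, {"error": "quota exceeded"}

    # Only resolve matches the caller is actually allowed to see.
    return 200, {"data": [resolve(n) for n in phone_numbers]}
```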

Remediation pattern (before/after):

# Before (vulnerable):
@app.route('/api/documents/<int:doc_id>')
def get_document(doc_id):
    doc = db.query(f"SELECT * FROM documents WHERE id = {doc_id}")
    return jsonify(doc)

# After (secure):
@app.route('/api/documents/<uuid:doc_id>')
@require_auth
@rate_limit(100, per='hour')
def get_document(doc_id):
    # Check ownership
    if not can_access_document(current_user.id, doc_id):
        abort(404)  # Always 404, never 403
    
    # Log access for pattern detection
    log_api_access(current_user.id, request.remote_addr, doc_id)
    
    # Check for enumeration patterns
    if detect_enumeration_pattern(current_user.id):
        alert_security_team()
        abort(429)  # Rate limit response
    
    doc = db.query(
        "SELECT * FROM documents WHERE id = %s AND owner_id = %s",
        (doc_id, current_user.id)
    )
    return jsonify(doc)
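
The secure handler above relies on log_api_access and detect_enumeration_pattern, which are not shown. A minimal sketch of what they might look like, backed by the api_access_patterns table from the prevention section (PostgreSQL + psycopg2 assumed; the threshold is illustrative):

```python
# Hypothetical helpers for the secure handler above, backed by the
# api_access_patterns table defined in the prevention section.
import psycopg2

conn = psycopg2.connect("dbname=app user=api")  # placeholder DSN
conn.autocommit = True

def log_api_access(user_id, ip_address, resource_id):
    # Record each access so enumeration patterns can be detected later.
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO api_access_patterns (user_id, ip_address, endpoint, resource_ids)
               VALUES (%s, %s, %s, ARRAY[%s])""",
            (user_id, ip_address, "/api/documents", str(resource_id)),
        )

def detect_enumeration_pattern(user_id, max_per_hour=100):
    # Flag callers whose request volume in the last hour exceeds the threshold.
    with conn.cursor() as cur:
        cur.execute(
            """SELECT count(*) FROM api_access_patterns
               WHERE user_id = %s AND accessed_at > NOW() - INTERVAL '1 hour'""",
            (user_id,),
        )
        return cur.fetchone()[0] > max_per_hour
```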

AI Coding Guidance/Prompt

Prompt: "When designing APIs that access data:"
Rules:
  - Never use sequential IDs for public APIs
  - Always implement rate limiting from day one
  - Require authorization checks for every data access
  - Log all API access with enough detail to detect patterns
  - Flag any API that returns data based solely on an ID parameter
  
Example:
  # Bad: Direct ID access without controls
  GET /api/documents/12345
  Response: { "id": 12345, "content": "sensitive data" }
  
  # Good: Proper API security
  GET /api/documents/550e8400-e29b-41d4-a716-446655440000
  Headers: 
    Authorization: Bearer <token>
    X-API-Key: <key>
  
  Response: 
    - 200 OK if user owns document
    - 404 Not Found if document doesn't exist OR user lacks access
    - 429 Too Many Requests if rate limit exceeded
  
  # Implementation example
  async function getDocument(req, res) {
    // Rate limiting
    if (!await checkRateLimit(req.user.id, req.ip)) {
      return res.status(429).json({ error: "Rate limit exceeded" });
    }
    
    // Authorization
    const documentId = req.params.id;
    if (!await canAccessDocument(req.user.id, documentId)) {
      return res.status(404).json({ error: "Not found" });
    }
    
    // Logging
    await logApiAccess(req.user.id, req.ip, 'GET /api/documents', documentId);
    
    // Return data
    const document = await fetchDocument(documentId);
    return res.json(document);
  }

Relevant Keywords

Keywords: rate limiting, enumeration, IDOR, API security
Symptoms: high-volume sequential ID requests, bulk data extraction, unexplained API traffic spikes
Preventive: non-sequential identifiers (UUIDs), rate limiting, per-request authorization, API access logging
Tech stack: PostgreSQL, MySQL, SQL Server, Oracle
Industry: all industries, enterprise, SaaS

Related Patterns

The Cantorian Technical Debt Magnitude scale gives developers an intuitive sense of magnitude beyond simple hour counts: some debts aren't just larger in scale, but qualitatively different in their complexity.

Cantor Points are critical decision junctures—or even moments of non-decision—where seemingly small choices can create drastically divergent futures for a system's integrity, security, and evolvability. These are the "forks in the road" where one path might lead to manageable complexity, while another veers towards systemic entanglement or even chaos. They often appear trivial at the time but can set in motion irreversible or costly-to-reverse consequences.

Applied Data Integrity (ADI) is a framework for understanding the far-reaching consequences of schema and data decisions that impact security and reliability, and that accumulate into ethical debt affecting real human lives. Built on research from real-world incidents, ADI uncovered 7 Principles that help identify when these decisions are being made and how to make them better, avoiding future technical debt and the potentially catastrophic "butterfly effects" of small decisions that ripple into chaotic technical and ethical debt.