Unbounded Growth (No Data Archiving)
Description
Allowing tables to grow indefinitely without archival or cleanup strategies. Common with logs, audit records, and historical data where the assumption is "we might need it someday." Results in ever-growing tables that degrade performance, complicate maintenance, and eventually threaten system stability.
Illustrative Cantor Point
The Cantor Point occurs during initial design when deciding "we'll keep everything forever" or "we'll implement cleanup later." This deferral of data lifecycle planning creates an ever-growing burden that becomes harder to address as data accumulates.
Real-World Examples / Observed In
- SaaS Platform: Audit log table grew to 500 million rows over 3 years, causing queries to time out
- E-commerce Site: Order history table became so large that backups could not complete within the maintenance window
- Monitoring System: Metrics table grew until INSERT performance degraded and incoming data points were dropped
Common Consequences & Impacts
Technical Impacts
- Query performance degradation
- Backup/restore times exceed windows
- Index maintenance becomes prohibitive
- Exploding storage costs
Human/Ethical Impacts
- User experience degradation
- Engineering burnout from operational issues
- Data hoarding without purpose
- Environmental impact of unnecessary storage
Business Impacts
- Increased operational costs
- Extended maintenance windows
- Feature development slowed
- Risk of data loss from failed backups
Recovery Difficulty & Escalation
ADI Principles & Axioms Violated
- Principle of Invisible Decay: Growth is gradual until it's catastrophic
- Principle of Fragile Stability: Works until it suddenly doesn't
- Principle of Consequential Stewardship: Keeping all data isn't always ethical
Detection / 60-Second Audit
-- PostgreSQL: find largest tables
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) as table_size,
pg_size_pretty(pg_indexes_size(schemaname||'.'||tablename)) as indexes_size,
(SELECT COUNT(*) FROM information_schema.columns
WHERE table_schema = schemaname AND table_name = tablename) as column_count
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
-- PostgreSQL: rough growth estimate (divides current size by 30 days; accurate tracking requires periodic size snapshots)
WITH table_sizes AS (
SELECT
current_date as measurement_date,
tablename,
pg_total_relation_size(schemaname||'.'||tablename) as size_bytes
FROM pg_tables
WHERE schemaname = 'public'
)
SELECT
tablename,
pg_size_pretty(size_bytes) as current_size,
pg_size_pretty(size_bytes::bigint / 30) as daily_growth_estimate
FROM table_sizes
WHERE size_bytes > 1073741824 -- Tables over 1GB
ORDER BY size_bytes DESC;
-- PostgreSQL: generate queries to check date ranges in candidate tables
SELECT
table_name,
column_name,
'SELECT MIN(' || column_name || '), MAX(' || column_name ||
') FROM ' || table_name || ';' as check_query
FROM information_schema.columns
WHERE table_schema = 'public'
AND (data_type LIKE 'timestamp%' OR data_type = 'date')
AND (column_name LIKE '%created%' OR column_name LIKE '%date%');
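-- Convenience note (assumes the psql client): after running the generator query
-- above, typing \gexec executes each produced check query directly.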
-- MySQL: find largest tables
SELECT
TABLE_SCHEMA,
TABLE_NAME,
ROUND(((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024), 2) AS total_size_mb,
ROUND((DATA_LENGTH / 1024 / 1024), 2) AS data_size_mb,
ROUND((INDEX_LENGTH / 1024 / 1024), 2) AS index_size_mb,
TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC
LIMIT 20;
-- MySQL: rough growth estimate (divides current size by 30 days; accurate tracking requires periodic size snapshots)
SELECT
TABLE_NAME,
TABLE_ROWS,
AUTO_INCREMENT,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 2) as size_mb,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 30, 2) as daily_growth_estimate_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND (DATA_LENGTH + INDEX_LENGTH) > 1073741824
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
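-- A minimal snapshot sketch for real growth tracking (the table_size_history
-- table is illustrative, not part of any existing schema):
CREATE TABLE IF NOT EXISTS table_size_history (
    measured_at DATE NOT NULL,
    table_name VARCHAR(64) NOT NULL,
    size_bytes BIGINT NOT NULL,
    PRIMARY KEY (measured_at, table_name)
);
-- Run once a day from a scheduled job, then diff consecutive days per table
INSERT INTO table_size_history (measured_at, table_name, size_bytes)
SELECT CURRENT_DATE, TABLE_NAME, DATA_LENGTH + INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_TYPE = 'BASE TABLE'
ON DUPLICATE KEY UPDATE size_bytes = VALUES(size_bytes);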
-- MySQL: generate queries to check date ranges
SELECT
TABLE_NAME,
COLUMN_NAME,
CONCAT('SELECT MIN(', COLUMN_NAME, '), MAX(', COLUMN_NAME,
') FROM ', TABLE_NAME, ';') as check_query
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
AND DATA_TYPE IN ('timestamp', 'datetime', 'date')
AND (COLUMN_NAME LIKE '%created%' OR COLUMN_NAME LIKE '%date%');
-- SQL Server: find largest tables
SELECT TOP 20
s.name AS SchemaName,
t.name AS TableName,
SUM(CASE WHEN i.index_id IN (0, 1) THEN p.rows ELSE 0 END) AS RowCounts,
SUM(a.total_pages) * 8 / 1024 AS TotalSpaceMB,
SUM(a.used_pages) * 8 / 1024 AS UsedSpaceMB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 / 1024 AS UnusedSpaceMB
FROM sys.tables t
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.is_ms_shipped = 0
GROUP BY s.name, t.name
ORDER BY TotalSpaceMB DESC;
-- SQL Server: generate queries to check date ranges
SELECT
TABLE_NAME,
COLUMN_NAME,
'SELECT MIN(' + COLUMN_NAME + ') AS OldestRecord, MAX(' +
COLUMN_NAME + ') AS NewestRecord FROM ' +
TABLE_SCHEMA + '.' + TABLE_NAME + ';' as check_query
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
AND DATA_TYPE IN ('datetime', 'datetime2', 'date')
AND (COLUMN_NAME LIKE '%created%' OR COLUMN_NAME LIKE '%date%');
Prevention & Mitigation Best Practices
- Define Retention Early: Establish data lifecycle policies from day one
- Partition by Time: Use date-based partitions for easy dropping
- Automated Archival: Regular jobs to move old data to cold storage
- Cascading Deletes: Ensure related data is cleaned up together
- Batch Deletions: Delete in small chunks to avoid lock issues
- Summary Tables: Keep aggregates instead of raw data (see the sketch after this list)
- Compliance Alignment: Match retention to legal requirements
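Where raw rows are kept only to answer aggregate questions, a summary table can stand in for them before old data is archived or dropped. Below is a minimal rollup sketch, reusing the audit_logs example from the next section; the audit_log_daily_counts table and its columns are illustrative, not part of any existing schema.
-- Illustrative rollup table for the audit_logs example
CREATE TABLE audit_log_daily_counts (
    activity_date DATE NOT NULL,
    action VARCHAR(100) NOT NULL,
    event_count BIGINT NOT NULL,
    PRIMARY KEY (activity_date, action)
);
-- Roll up yesterday's raw rows; ON CONFLICT makes the job safe to re-run
INSERT INTO audit_log_daily_counts (activity_date, action, event_count)
SELECT created_at::date, action, COUNT(*)
FROM audit_logs
WHERE created_at >= CURRENT_DATE - INTERVAL '1 day'
  AND created_at < CURRENT_DATE
GROUP BY created_at::date, action
ON CONFLICT (activity_date, action) DO UPDATE
    SET event_count = EXCLUDED.event_count;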
Real World Examples
-- Problem: Audit logs growing forever
CREATE TABLE audit_logs (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW(),
user_id BIGINT,
action VARCHAR(100),
details JSONB
);
-- After 3 years: 500M rows, 800GB, queries timeout
-- Backups take 6+ hours, blocking maintenance windows
-- Problem: User sessions never cleaned up
CREATE TABLE user_sessions (
session_id UUID PRIMARY KEY,
user_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
last_activity TIMESTAMP,
data JSONB
);
-- Millions of dead sessions from years ago
-- Active session queries scan through ancient data
-- Solution: Time-based partitions with automated cleanup
CREATE TABLE audit_logs (
id BIGSERIAL,
created_at TIMESTAMP DEFAULT NOW(),
user_id BIGINT,
action VARCHAR(100),
details JSONB,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Automated cleanup function
CREATE OR REPLACE FUNCTION cleanup_old_audit_logs()
RETURNS void AS $$
DECLARE
cutoff_date DATE;
partition_name TEXT;
BEGIN
cutoff_date := CURRENT_DATE - INTERVAL '90 days';
FOR partition_name IN
SELECT tablename
FROM pg_tables
WHERE schemaname = 'public'
AND tablename ~ '^audit_logs_\d{4}_\d{2}$'
AND tablename < 'audit_logs_' || TO_CHAR(cutoff_date, 'YYYY_MM')
LOOP
-- Archive to S3/cold storage first if needed
EXECUTE format('DROP TABLE %I CASCADE', partition_name);
RAISE NOTICE 'Dropped old partition: %', partition_name;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Schedule weekly: SELECT cleanup_old_audit_logs();
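-- Scheduling sketch, assuming the pg_cron extension is installed
-- (the cron expression below, Sundays at 03:00, is illustrative):
SELECT cron.schedule('0 3 * * 0', 'SELECT cleanup_old_audit_logs()');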
-- For session cleanup, index the activity column and run a periodic batched delete (PostgreSQL has no built-in row TTL)
CREATE INDEX idx_sessions_last_activity ON user_sessions(last_activity);
-- Regular cleanup job (PostgreSQL DELETE has no LIMIT clause, so batch via a subquery)
DELETE FROM user_sessions
WHERE session_id IN (
    SELECT session_id
    FROM user_sessions
    WHERE last_activity < NOW() - INTERVAL '30 days'
    LIMIT 10000  -- batch size keeps locks and WAL churn per run small
);
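-- A sketch for draining a large backlog in repeated batches (PostgreSQL 11+;
-- the procedure name and batch size are illustrative, not part of the original schema).
-- Using a procedure with COMMIT gives each batch its own transaction.
CREATE OR REPLACE PROCEDURE purge_stale_sessions()
LANGUAGE plpgsql AS $$
DECLARE
    rows_deleted INTEGER;
BEGIN
    LOOP
        DELETE FROM user_sessions
        WHERE session_id IN (
            SELECT session_id
            FROM user_sessions
            WHERE last_activity < NOW() - INTERVAL '30 days'
            LIMIT 10000
        );
        GET DIAGNOSTICS rows_deleted = ROW_COUNT;
        EXIT WHEN rows_deleted = 0;
        COMMIT;  -- release locks between batches
    END LOOP;
END;
$$;
-- Run from a maintenance job: CALL purge_stale_sessions();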
AI Coding Guidance/Prompt
Prompt: "When designing tables that accumulate data over time:"
Rules:
- Require retention policy definition
- Suggest partitioning for time-series data
- Flag tables without archival strategy
- Recommend cleanup job creation
Example:
-- Bad: infinite-growth table
CREATE TABLE audit_logs (
id BIGINT PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW(),
user_id BIGINT,
action TEXT,
details JSONB
);
-- No partition, no retention policy!
-- Good: partitioned with retention
CREATE TABLE audit_logs (
id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
user_id BIGINT,
action TEXT,
details JSONB
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE audit_logs_2024_01 PARTITION OF audit_logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Automated cleanup job
CREATE OR REPLACE FUNCTION drop_old_audit_partitions()
RETURNS void AS $$
DECLARE
partition_name TEXT;
BEGIN
FOR partition_name IN
SELECT tablename
FROM pg_tables
WHERE tablename LIKE 'audit_logs_%'
AND tablename < 'audit_logs_' ||
TO_CHAR(CURRENT_DATE - INTERVAL '6 months', 'YYYY_MM')
LOOP
EXECUTE format('DROP TABLE IF EXISTS %I', partition_name);
RAISE NOTICE 'Dropped partition: %', partition_name;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Schedule monthly: SELECT drop_old_audit_partitions();
Relevant Keywords
unbounded growth, data archiving, retention policy. Symptoms: slow queries, backups exceeding maintenance windows, growing storage costs. Preventive: retention policies, time-based partitioning, automated archival jobs. Tech stack: PostgreSQL, MySQL, SQL Server, Oracle. Industry: all industries, enterprise, SaaS