Under-Normalization (God Tables)
Description
Creating "God tables" with dozens or hundreds of columns that mix multiple entity types or business concepts. Includes anti-patterns like repeated columns (phone1, phone2, phone3) and catch-all tables serving multiple purposes. Often starts as a shortcut but evolves into unmaintainable structures with extensive NULL values and complex conditional logic.
Illustrative Cantor Point
The Cantor Point occurs when deciding to "just add another column" instead of creating a proper relationship. Each time this convenience is chosen over proper design, the table grows wider and more unwieldy, eventually becoming impossible to refactor without massive effort.
Real-World Examples / Observed In
- CRM System: Customers table with 120 columns including multiple addresses, contacts, preferences, and historical data
- E-commerce Platform: Products table with 500+ columns trying to accommodate every possible product attribute
- Event System: Single events table with generic columns (int_field1, text_field1) for different event types
Common Consequences & Impacts
Technical Impacts
- - Excessive NULL values wasting storage
- - Wide rows impacting query performance
- - Update anomalies from redundant data
- - Schema changes affect entire application
Human/Ethical Impacts
- - Data quality issues affecting users
- - Developer frustration with complexity
- - Increased bugs from conditional logic
- - Technical debt impeding innovation
Business Impacts
- - Slow feature development
- - High risk of data inconsistency
- - Difficult onboarding for new developers
- - Resistance to necessary changes
Recovery Difficulty & Escalation
ADI Principles & Axioms Violated
- Principle of Cascading Catastrophes: Small conveniences accumulate into massive problems
- Principle of Invisible Decay: Table complexity grows slowly until it's unmanageable
- Principle of Consequential Stewardship: Poor structure leads to poor data quality
Detection / 60-Second Audit
-- Find suspiciously wide tables
SELECT
table_name,
COUNT(*) as column_count
FROM information_schema.columns
WHERE table_schema = 'public'
GROUP BY table_name
HAVING COUNT(*) > 30
ORDER BY column_count DESC;
-- Detect repeated column patterns
SELECT
table_name,
array_agg(column_name ORDER BY column_name) as suspicious_columns
FROM information_schema.columns
WHERE table_schema = 'public'
AND column_name ~ '.*[0-9]+{{{detection}}}#039; -- Ends with number
GROUP BY table_name
HAVING COUNT(*) > 2;
-- Find tables with high NULL percentage
SELECT
schemaname,
tablename,
attname,
null_frac
FROM pg_stats
WHERE schemaname = 'public'
AND null_frac > 0.5
ORDER BY null_frac DESC;
-- Find suspiciously wide tables
SELECT
TABLE_NAME,
COUNT(*) as column_count
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
GROUP BY TABLE_NAME
HAVING COUNT(*) > 30
ORDER BY column_count DESC;
-- Detect repeated column patterns
SELECT
TABLE_NAME,
GROUP_CONCAT(COLUMN_NAME ORDER BY COLUMN_NAME) as suspicious_columns
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
AND COLUMN_NAME REGEXP '[0-9]+{{{detection}}}#039; -- Ends with number
GROUP BY TABLE_NAME
HAVING COUNT(*) > 2;
-- Check for sparse columns (requires analyzing specific table)
SELECT
'Run this for each wide table:' as instruction,
CONCAT('SELECT COUNT(*) as total_rows, ',
'SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count ',
'FROM table_name;') as query_template;
-- Find suspiciously wide tables
SELECT
t.name AS table_name,
COUNT(*) as column_count
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.type = 'U'
GROUP BY t.name
HAVING COUNT(*) > 30
ORDER BY column_count DESC;
-- Detect repeated column patterns
SELECT
t.name AS table_name,
STRING_AGG(c.name, ', ') as suspicious_columns
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.type = 'U'
AND c.name LIKE '%[0-9]'
GROUP BY t.name
HAVING COUNT(*) > 2;
Prevention & Mitigation Best Practices
- One Entity, One Table: Each table represents exactly one business concept
- Relationship Recognition: Use separate tables for 1-to-many relationships
- Column Naming Discipline: Numbered columns (field1, field2) indicate design issues
- NULL Analysis: Many NULLs suggest mixed entity types
- Incremental Refactoring: Extract cohesive column groups into new tables
- Domain Modeling: Define clear boundaries between entities
- Code Review Focus: Question every column addition to wide tables
Real World Examples
-- God table trying to handle all customer data
CREATE TABLE customers (
id BIGINT PRIMARY KEY,
-- Basic info
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(255),
-- Multiple phones (anti-pattern)
phone1 VARCHAR(20),
phone1_type VARCHAR(10),
phone2 VARCHAR(20),
phone2_type VARCHAR(10),
phone3 VARCHAR(20),
phone3_type VARCHAR(10),
-- Multiple addresses (anti-pattern)
home_address VARCHAR(500),
home_city VARCHAR(100),
home_state VARCHAR(2),
home_zip VARCHAR(10),
work_address VARCHAR(500),
work_city VARCHAR(100),
work_state VARCHAR(2),
work_zip VARCHAR(10),
-- Mixed concerns (preferences)
pref_language VARCHAR(10),
pref_currency VARCHAR(3),
pref_timezone VARCHAR(50),
pref_newsletter BOOLEAN,
pref_sms BOOLEAN,
-- Order history (wrong place)
first_order_date DATE,
last_order_date DATE,
total_orders INT,
total_spent DECIMAL(10,2),
average_order_value DECIMAL(10,2),
-- And 80 more columns...
);
-- Generic "data" table anti-pattern
CREATE TABLE entity_data (
id BIGINT PRIMARY KEY,
entity_type VARCHAR(50), -- 'customer', 'product', 'order'
entity_id BIGINT,
-- Generic columns for any data type
string_val1 VARCHAR(255),
string_val2 VARCHAR(255),
string_val3 TEXT,
int_val1 INT,
int_val2 INT,
decimal_val1 DECIMAL(15,2),
decimal_val2 DECIMAL(15,2),
date_val1 DATE,
date_val2 DATE,
json_data JSON,
-- Metadata
created_at TIMESTAMP,
updated_at TIMESTAMP
);
-- Queries become nightmares of conditional logic
-- Properly normalized schema
CREATE TABLE customers (
id BIGINT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE customer_phones (
id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
phone_type VARCHAR(20) NOT NULL,
phone_number VARCHAR(20) NOT NULL,
is_primary BOOLEAN DEFAULT FALSE,
UNIQUE(customer_id, phone_type)
);
CREATE TABLE customer_addresses (
id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
address_type VARCHAR(20) NOT NULL,
street_address VARCHAR(255) NOT NULL,
city VARCHAR(100) NOT NULL,
state_province VARCHAR(100),
postal_code VARCHAR(20),
country_code CHAR(2) NOT NULL,
is_primary BOOLEAN DEFAULT FALSE,
UNIQUE(customer_id, address_type)
);
CREATE TABLE customer_preferences (
customer_id BIGINT PRIMARY KEY REFERENCES customers(id),
language_code VARCHAR(10) DEFAULT 'en',
currency_code VARCHAR(3) DEFAULT 'USD',
timezone VARCHAR(50) DEFAULT 'UTC',
newsletter_enabled BOOLEAN DEFAULT TRUE,
sms_enabled BOOLEAN DEFAULT FALSE
);
-- Order summary as materialized view, not denormalized columns
CREATE MATERIALIZED VIEW customer_order_stats AS
SELECT
c.id as customer_id,
MIN(o.created_at) as first_order_date,
MAX(o.created_at) as last_order_date,
COUNT(o.id) as total_orders,
SUM(o.total_amount) as total_spent,
AVG(o.total_amount) as average_order_value
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id;
AI Coding Guidance/Prompt
Prompt: "When adding columns to existing tables:"
Rules:
- Flag tables with >50 columns for review
- Reject numbered column names (e.g., address1, address2)
- Suggest separate tables for optional column groups
- Warn about mixing different entity types
Example:
# Bad: God table with everything
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(255),
phone1 VARCHAR(20),
phone2 VARCHAR(20),
phone3 VARCHAR(20),
home_address VARCHAR(500),
work_address VARCHAR(500),
other_address VARCHAR(500),
preference_theme VARCHAR(50),
preference_language VARCHAR(10),
preference_notifications BOOLEAN,
last_order_id BIGINT,
last_order_date DATE,
total_spent DECIMAL(10,2),
-- ... 100 more columns
);
# Good: Properly separated entities
CREATE TABLE users (
id BIGINT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE user_phones (
user_id BIGINT REFERENCES users(id),
phone_type VARCHAR(20),
phone_number VARCHAR(20),
PRIMARY KEY (user_id, phone_type)
);
CREATE TABLE user_addresses (
id BIGINT PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
address_type VARCHAR(20),
street_address TEXT,
-- proper address fields
);
Relevant Keywords
under normalization (god tables) Symptoms: slow queries, data inconsistency, constraint violations Preventive: schema validation, constraint enforcement, proper typing Tech stack: PostgreSQL, MySQL, SQL Server, Oracle Industry: all industries, enterprise, SaaS