Database Indexing: The Performance Trick Nobody Explained Properly
Here’s a scenario I’ve seen dozens of times.
A developer builds an app. It works great in development with 100 users. Then it goes to production, gets 100,000 users, and suddenly everything is slow. Pages timeout. The database CPU is at 100%. Users are complaining.
The developer starts optimizing code. Caching everything. Adding more servers. Nothing helps.
Then someone looks at the database and says, Have you tried adding an index?
One line of SQL later, queries that took 30 seconds now take 30 milliseconds. Problem solved.
Database indexing is the single highest-leverage performance optimization most developers ignore. It’s not sexy. It’s not a new framework. But it can make your app 100x faster with almost no code changes.
This guide explains how indexes actually work, when to use them, and the mistakes that destroy performance. By the end, you’ll understand indexes well enough to speed up any slow query.
Let’s make your database fast.
What Is an Database Index?
An index is a data structure that makes finding data faster.
Think of it like a book index. If you want to find every mention of “authentication” in a 500-page book, you have two options:
- Without index: Read every page, looking for “authentication” (slow)
- With index: Look up “authentication” in the back, get page numbers, go directly there (fast)
Databases work the same way.
-- Without index: Database scans EVERY row
SELECT * FROM users WHERE email = 'john@example.com';
-- Checks row 1... nope
-- Checks row 2... nope
-- Checks row 3... nope
-- ... 999,997 more checks ...
-- Checks row 1,000,000... found it!
-- With index: Database jumps directly to matching rows
SELECT * FROM users WHERE email = 'john@example.com';
-- Look up 'john@example.com' in index → Row #847,293
-- Go directly to row #847,293 → Found it!
The difference? O(n) vs O(log n). For a million rows, that’s a million operations vs ~20 operations.
How Indexes Work (B-Tree)
Most database indexes use a B-tree (balanced tree) structure.
- How a B-tree lookup works:
- Start at root node
- Compare value, go left or right
- Repeat until you find the value
- Follow pointer to actual row data
Why it’s fast: Each step eliminates half the remaining data. With a million rows:
- Step 1: 1,000,000 → 500,000
- Step 2: 500,000 → 250,000
- …
- Step 20: Found it!
20 steps instead of 1,000,000. That’s the power of indexes.
Creating Indexes
Basic Index
-- Create index on single column
CREATE INDEX idx_users_email ON users(email);
-- Create index on multiple columns (composite)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Create unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
When Indexes Are Created Automatically
-- Primary keys automatically get an index
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- Index created automatically
email VARCHAR(255)
);
-- Unique constraints automatically get an index
ALTER TABLE users ADD CONSTRAINT unique_email UNIQUE (email);
-- Index created automatically
-- Foreign keys do NOT automatically get an index in PostgreSQL!
-- (They do in MySQL)
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id) -- NO automatic index!
);
-- You should create one manually
CREATE INDEX idx_orders_user_id ON orders(user_id);
Which Columns to Index
Always Index These
-- 1. Foreign keys (JOIN performance)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_comments_post_id ON comments(post_id);
-- 2. Columns in WHERE clauses (filter performance)
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_products_category ON products(category_id);
-- 3. Columns in ORDER BY (sort performance)
CREATE INDEX idx_posts_created_at ON posts(created_at);
-- 4. Columns in GROUP BY (aggregation performance)
CREATE INDEX idx_orders_product_id ON orders(product_id);
Don’t Index These
-- 1. Tiny tables (full scan is fast enough)
-- A table with 100 rows doesn't need indexes beyond PK
-- 2. Columns with low cardinality (few unique values)
-- Boolean columns, status with only 3 values
CREATE INDEX idx_users_is_active ON users(is_active); -- Usually not helpful
-- 3. Columns that change frequently
-- Index must be updated on every change
CREATE INDEX idx_users_last_seen ON users(last_seen_at); -- Expensive to maintain
-- 4. Columns rarely used in queries
-- Indexes have storage cost and write overhead
The Selectivity Rule
An index is most useful when it narrows down results significantly.
-- HIGH selectivity (good for indexing)
SELECT * FROM users WHERE email = 'john@example.com';
-- Returns 1 row out of 1,000,000 = 0.0001% of table
-- LOW selectivity (index might not help)
SELECT * FROM users WHERE is_active = true;
-- Returns 950,000 rows out of 1,000,000 = 95% of table
-- Full table scan might actually be faster!
Composite Indexes (Multi-Column)
Composite indexes cover multiple columns. Order matters!
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
How Composite Index Order Works
Which Queries Can Use This Index?
-- ✅ Uses index (matches left-to-right)
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2024-01-01';
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at;
-- ❌ Cannot use index (skips first column)
SELECT * FROM orders WHERE created_at > '2024-01-01';
-- Would need separate index: CREATE INDEX idx_orders_date ON orders(created_at);
The Left-Prefix Rule
- A composite index can be used for queries that filter on:
- First column only ✅
- First + second columns ✅
- First + second + third columns ✅
- Second column only ❌
- Third column only ❌
-- Index: (a, b, c)
WHERE a = 1 -- ✅ Uses index
WHERE a = 1 AND b = 2 -- ✅ Uses index
WHERE a = 1 AND b = 2 AND c = 3 -- ✅ Uses index
WHERE b = 2 -- ❌ Cannot use index
WHERE c = 3 -- ❌ Cannot use index
WHERE b = 2 AND c = 3 -- ❌ Cannot use index
WHERE a = 1 AND c = 3 -- ⚠️ Partial: only uses 'a' part
EXPLAIN: See How Queries Execute
The EXPLAIN command shows you exactly how the database executes a query.
Basic Usage
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
Output (PostgreSQL):
Seq Scan on users (cost=0.00..1834.00 rows=1 width=72)
Filter: (email = 'john@example.com')
Seq Scan = Sequential Scan = Full Table Scan = SLOW!
Now add an index:
CREATE INDEX idx_users_email ON users(email);
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
Output:
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=72)
Index Cond: (email = 'john@example.com')
Index Scan = Using the index = FAST!
EXPLAIN ANALYZE (With Actual Timing)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
Output:
Index Scan using idx_users_email on users
(cost=0.42..8.44 rows=1 width=72)
(actual time=0.025..0.026 rows=1 loops=1)
Planning Time: 0.085 ms
Execution Time: 0.043 ms
Now you can see actual milliseconds!
What to Look For
Common Index Problems
Problem 1: Index Not Being Used
-- You created an index
CREATE INDEX idx_users_email ON users(email);
-- But this query doesn't use it!
EXPLAIN SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Seq Scan!
Why? The index is on email, not LOWER(email).
Fix: Create a functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Now it works
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Index Scan ✅
Problem 2: Wrong Column Order in Composite Index
-- Index created
CREATE INDEX idx_orders_date_user ON orders(created_at, user_id);
-- This query is slow!
SELECT * FROM orders WHERE user_id = 42;
-- Seq Scan! The index can't help (wrong order)
-- Fix: Create index with correct order
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
Problem 3: Too Many Indexes
-- Every index has a cost:
-- • Storage space
-- • Slower INSERT/UPDATE/DELETE (must update all indexes)
-- • Query planner overhead
-- Don't do this!
CREATE INDEX idx_users_a ON users(a);
CREATE INDEX idx_users_b ON users(b);
CREATE INDEX idx_users_c ON users(c);
CREATE INDEX idx_users_ab ON users(a, b);
CREATE INDEX idx_users_bc ON users(b, c);
CREATE INDEX idx_users_abc ON users(a, b, c);
-- 6 indexes to maintain on every write!
-- Instead, analyze your queries and create minimal covering indexes
Problem 4: Type Mismatch
-- Column is VARCHAR
CREATE TABLE users (
id SERIAL PRIMARY KEY,
phone VARCHAR(20)
);
CREATE INDEX idx_users_phone ON users(phone);
-- But you query with INTEGER
SELECT * FROM users WHERE phone = 1234567890;
-- Index might not be used! (type conversion)
-- Fix: Use correct type
SELECT * FROM users WHERE phone = '1234567890';
Problem 5: LIKE with Leading Wildcard
CREATE INDEX idx_users_name ON users(name);
-- This uses the index ✅
SELECT * FROM users WHERE name LIKE 'John%';
-- This CANNOT use the index ❌
SELECT * FROM users WHERE name LIKE '%John%';
-- Must scan entire index/table because we don't know starting point
-- For full-text search, use proper full-text indexes instead
CREATE INDEX idx_users_name_gin ON users USING gin(to_tsvector('english', name));
Index Types
Different index types for different use cases:
B-Tree (Default)
CREATE INDEX idx_users_email ON users(email);
-- Same as:
CREATE INDEX idx_users_email ON users USING btree(email);
- Best for: Equality (=) and range queries (<, >, BETWEEN)
- Use when: Most cases — this is your default choice
Hash
CREATE INDEX idx_users_email_hash ON users USING hash(email);
- Best for: Equality only (=)
- Use when: You only do exact matches, never ranges
GIN (Generalized Inverted Index)
-- For array columns
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
-- For full-text search
CREATE INDEX idx_posts_content ON posts USING gin(to_tsvector('english', content));
-- For JSONB
CREATE INDEX idx_users_metadata ON users USING gin(metadata);
- Best for: Arrays, full-text search, JSONB
- Use when: You query inside complex data types
GiST (Generalized Search Tree)
-- For geometric data
CREATE INDEX idx_locations_coords ON locations USING gist(coordinates);
-- For range types
CREATE INDEX idx_events_duration ON events USING gist(duration);
- Best for: Geometric/spatial data, range types
- Use when: PostGIS, geometric queries
Partial Indexes
Index only rows that match a condition.
-- Full index: includes all 1,000,000 rows
CREATE INDEX idx_orders_status ON orders(status);
-- Partial index: only includes pending orders (maybe 1,000 rows)
CREATE INDEX idx_orders_pending ON orders(status)
WHERE status = 'pending';
-- Much smaller index, faster to search and maintain!
Use cases:
-- Only index active users
CREATE INDEX idx_users_active_email ON users(email) WHERE is_active = true;
-- Only index recent orders
CREATE INDEX idx_orders_recent ON orders(created_at)
WHERE created_at > '2024-01-01';
-- Only index unprocessed items
CREATE INDEX idx_jobs_unprocessed ON background_jobs(queue, priority)
WHERE processed_at IS NULL;
Covering Indexes (Index-Only Scans)
A covering index contains all columns needed by a query, so the database doesn’t need to access the table at all.
-- Query needs: email, name, created_at
SELECT email, name, created_at
FROM users
WHERE email = 'john@example.com';
-- Regular index (must access table for name, created_at)
CREATE INDEX idx_users_email ON users(email);
-- Index Scan → then fetch name, created_at from table
-- Covering index (includes all needed columns)
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name, created_at);
-- Index Only Scan → no table access needed!
PostgreSQL syntax:
CREATE INDEX idx_name ON table(search_columns) INCLUDE (extra_columns);
Real-World Examples
Example 1: E-Commerce Orders
-- Common queries:
-- 1. Get user's orders
-- 2. Get orders by status
-- 3. Get recent orders
-- 4. Get user's orders by status
-- Recommended indexes:
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status) WHERE status IN ('pending', 'processing');
CREATE INDEX idx_orders_created ON orders(created_at DESC);
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
Example 2: Blog Posts
-- Common queries:
-- 1. Get posts by author
-- 2. Get published posts by date
-- 3. Search posts by title
-- 4. Get posts by category
-- Recommended indexes:
CREATE INDEX idx_posts_author ON posts(author_id);
CREATE INDEX idx_posts_published ON posts(published_at DESC) WHERE is_published = true;
CREATE INDEX idx_posts_title_search ON posts USING gin(to_tsvector('english', title));
CREATE INDEX idx_posts_category ON posts(category_id) WHERE is_published = true;
Example 3: User Sessions
-- Common queries:
-- 1. Get session by token
-- 2. Get user's active sessions
-- 3. Delete expired sessions
-- Recommended indexes:
CREATE UNIQUE INDEX idx_sessions_token ON sessions(token);
CREATE INDEX idx_sessions_user_active ON sessions(user_id) WHERE expires_at > NOW();
CREATE INDEX idx_sessions_expires ON sessions(expires_at) WHERE expires_at < NOW();
Index Maintenance
Check Index Usage
-- PostgreSQL: See which indexes are being used
SELECT
schemaname,
tablename,
indexname,
idx_scan, -- Times index was used
idx_tup_read, -- Rows read from index
idx_tup_fetch -- Rows fetched from table
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Find unused indexes
SELECT indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE '%_pkey';
Check Index Size
-- PostgreSQL: Index sizes
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) as size
FROM pg_indexes
WHERE tablename = 'users'
ORDER BY pg_relation_size(indexname::regclass) DESC;
Rebuild Bloated Indexes
-- PostgreSQL: Rebuild index
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on a table
REINDEX TABLE users;
-- Concurrent rebuild (doesn't lock table)
REINDEX INDEX CONCURRENTLY idx_users_email;
Quick Reference
The Bottom Line
Database indexing is not optional for production systems. It’s the difference between “fast” and “unusable.”
Remember:
- Indexes make reads faster, writes slower — Balance is key
- Use EXPLAIN — Don’t guess, look at the query plan
- Index your WHERE clauses — Especially foreign keys
- Composite index order matters — Left-prefix rule
- Monitor index usage — Remove unused indexes
- Start simple — Add indexes based on actual slow queries
One well-placed index can do more for performance than weeks of code optimization. Learn to use them well.