Database Indexing: The 80/20 Guide
Database indexes are one of those topics that seems simple until you need them, then suddenly feels impossibly complicated. The good news is you don't need to understand B-trees or hash algorithms to make good indexing decisions. You just need to understand a few core concepts.
This guide covers the 20% of indexing knowledge that handles 80% of real-world situations. It's not everything, but it's enough to stop your queries from grinding to a halt.
What an Index Actually Does
Think of an index like the index at the back of a textbook. Instead of reading every page to find mentions of "photosynthesis," you look it up in the index and jump directly to the relevant pages.
Without an index, your database has to scan every single row to find what you're looking for. That's fine when you have 100 rows. When you have 10 million rows, it's a disaster.
An index creates a separate data structure that lets the database find rows quickly based on specific columns. The tradeoff is that indexes take up space and slow down writes, since every INSERT and UPDATE also needs to update the index.
When You Need an Index
The simplest rule: add indexes on columns you frequently query against. This means:
- Columns in WHERE clauses
- Columns used in JOIN conditions
- Columns used in ORDER BY
- Foreign key columns
If you're constantly querying SELECT * FROM users WHERE email = ?, you need an index on the email column. Without it, every lookup scans the entire users table.
When You Don't Need an Index
Not every column needs an index. Avoid indexing:
- Columns with very few unique values (like boolean flags)
- Columns you rarely query against
- Tables that are small enough that scans are fast anyway
- Columns that change constantly
A boolean "is_active" column with only two possible values won't benefit much from an index. The database will still need to scan a huge portion of the table either way.
Composite Indexes: Order Matters
When you query on multiple columns, you need composite indexes. But the column order in a composite index matters a lot.
If you create an index on (country, city), the database can use it for:
- Queries filtering by country only
- Queries filtering by country AND city
But it can't use it for queries filtering by city alone. The index is organized by country first, so finding all rows with a specific city requires scanning the whole index anyway.
The general rule: put the most selective column first, and put columns used in equality conditions before those used in ranges.
The EXPLAIN Command
Don't guess whether your index is working. Check it. Every major database has an EXPLAIN command that shows you exactly how a query will execute.
In PostgreSQL: EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com'
This shows you whether the database is using your index or doing a sequential scan. It also shows the actual execution time and row counts, so you can see the real impact of your indexing decisions.
Make EXPLAIN your first tool when debugging slow queries. It tells you exactly what's happening instead of making you guess.
Common Indexing Mistakes
Over-indexing. Every index slows down writes and takes up disk space. Don't add indexes "just in case." Add them when you have a performance problem or know you'll query that column frequently.
Ignoring query patterns. Index based on how you actually query the data, not how the schema looks. Your most important queries should drive your indexing strategy.
Forgetting about updates. If a column changes frequently and has an index, every update has extra overhead. Sometimes it's better to skip the index on volatile columns.
Not using covering indexes. A covering index includes all the columns a query needs, so the database can answer the query from the index alone without touching the table. This can massively speed up read-heavy queries.
Partial Indexes
Sometimes you only care about a subset of rows. If you're always querying active users, a partial index on WHERE is_active = true is smaller and faster than indexing all users.
In PostgreSQL: CREATE INDEX idx_active_users ON users(email) WHERE is_active = true
This index is smaller because it only includes active users, and queries against active users will be faster because there's less data to search.
Monitoring Index Usage
Most databases track which indexes are actually being used. In PostgreSQL, check pg_stat_user_indexes to see scan counts for each index. If an index has zero scans, it's just wasting space and slowing down writes.
Periodically audit your indexes. Remove ones that aren't being used. Add ones for slow queries you discover in production.
Start Simple
Don't over-engineer your indexing strategy upfront. Start with the obvious indexes: primary keys, foreign keys, and columns you know you'll query frequently. Then monitor your slow queries in production and add indexes as needed.
The best indexing strategy is one that evolves with your actual usage patterns. Let real performance data guide your decisions instead of trying to predict every possible query upfront.