PostgreSQL is a powerful relational database, but getting peak performance requires understanding indexing, query optimization, and configuration tuning. This comprehensive guide covers practical techniques for optimizing your database.
## Query Analysis Tools

### EXPLAIN and EXPLAIN ANALYZE
```sql
-- Show the estimated query plan
EXPLAIN SELECT * FROM users WHERE age > 30;

-- Run the query and show actual execution stats
EXPLAIN ANALYZE SELECT * FROM users WHERE age > 30;
```
### Understanding Output

```
Seq Scan on users  (cost=0.00..25.00 rows=500 width=100) (actual time=0.015..0.310 rows=250 loops=1)
  Filter: (age > 30)
```

- **cost**: estimated startup..total cost, in arbitrary planner units combining disk I/O and CPU
- **rows**: estimated row count; the `actual` section (from ANALYZE) shows the real count
- **width**: average row width in bytes
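Beyond plain ANALYZE, EXPLAIN accepts options; BUFFERS is especially useful because it shows how much of each plan node's work was served from cache versus disk. A quick sketch using the same query:

```sql
-- BUFFERS adds shared-buffer hit/read counts to each plan node
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE age > 30;
```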
## Indexing Strategies

### Single Column Index
```sql
-- Basic index
CREATE INDEX idx_users_age ON users(age);

-- For range queries and descending sorts
CREATE INDEX idx_users_created ON users(created_at DESC);
```
### Composite Index

```sql
-- Index on multiple columns for common query patterns
CREATE INDEX idx_users_status_created
    ON users(status, created_at DESC);

-- Efficient for: WHERE status = 'active' ORDER BY created_at DESC
```
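Column order matters: a composite index is only useful when the query constrains its leading column(s). It is worth checking both cases with EXPLAIN; a sketch using the index above:

```sql
-- Can use idx_users_status_created: filters on the leading column (status)
EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY created_at DESC;

-- Typically cannot use it: the leading column is not constrained,
-- so the index does not provide rows in created_at order
EXPLAIN SELECT * FROM users
ORDER BY created_at DESC;
```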
### Partial Index

```sql
-- Index only active users (saves space)
CREATE INDEX idx_active_users ON users(id)
    WHERE status = 'active';

-- Reduces index size significantly
```
### BRIN Index

```sql
-- Block-range index: suited to very large, naturally ordered tables
CREATE INDEX idx_events_timestamp
    ON events USING BRIN (timestamp);

-- Typically far smaller than a B-tree for time-series data
```
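To see how much space an index actually occupies, `pg_relation_size` works on indexes too. A sketch for comparing all indexes on a table:

```sql
-- Compare on-disk size of each index on the events table
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'events';
```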
## Query Optimization

### Problem: N+1 Queries
```sql
-- Bad: one query per user
SELECT * FROM users;
-- Then for each user: SELECT * FROM posts WHERE user_id = ?

-- Good: a single query with a JOIN
SELECT u.*, p.*
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;
```
### Problem: Unnecessary Columns

```sql
-- Bad: selecting everything
SELECT * FROM large_table WHERE id = 1;

-- Good: select only the columns you need
SELECT id, name, email FROM large_table WHERE id = 1;
```
### Problem: Missing WHERE Clause

```sql
-- Bad: counts every row in the table
SELECT COUNT(*) FROM orders;

-- Better: bound the scan (assumes an index on created_at)
SELECT COUNT(*) FROM orders
WHERE created_at > NOW() - INTERVAL '30 days';
```
## Aggregation Optimization

### Using GROUP BY Efficiently

```sql
-- For a single column, DISTINCT and GROUP BY are usually planned identically
SELECT DISTINCT user_id FROM orders;
SELECT user_id FROM orders GROUP BY user_id;

-- GROUP BY pays off when you also need aggregates
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id;
```
### Materialized Views

```sql
-- Pre-compute an expensive aggregate
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT DATE(created_at) AS day, SUM(amount) AS revenue
FROM orders
GROUP BY DATE(created_at);

-- Refresh periodically (e.g. from a cron job)
REFRESH MATERIALIZED VIEW daily_revenue;
```
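A plain REFRESH locks the view against reads while it rebuilds. With a unique index in place, the CONCURRENTLY option avoids that. A sketch, assuming the view exposes a unique date column (here called `day`):

```sql
-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_daily_revenue_day ON daily_revenue (day);

-- Readers are not blocked while the refresh runs
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
```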
## Table Design

### Normalization
```sql
-- Bad: denormalized (customer and product data repeated on every order)
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_phone VARCHAR(20),
    product_name VARCHAR(100),
    product_price DECIMAL(10, 2)
);

-- Good: normalized into separate tables
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    phone VARCHAR(20)
);

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10, 2)
);

CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    product_id INT REFERENCES products(id),
    quantity INT
);
```
### Partitioning Large Tables

```sql
-- Range partitioning by date
CREATE TABLE events (
    id BIGSERIAL,
    user_id INT,
    event_type VARCHAR(50),
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
```
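Two practical additions: a DEFAULT partition catches rows that match no defined range (otherwise the insert fails), and EXPLAIN confirms that partition pruning is working. A sketch using the events table above:

```sql
-- Catch-all for rows outside every defined range
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- The plan should show only events_2026_01 being scanned
EXPLAIN SELECT * FROM events
WHERE created_at >= '2026-01-05' AND created_at < '2026-01-06';
```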
## Configuration Tuning

### Memory Settings

```ini
# postgresql.conf

# Shared buffer cache; a common starting point is ~25% of system RAM
shared_buffers = 4GB

# Working memory per sort/hash operation (allocated per operation, per connection)
work_mem = 16MB

# Memory for maintenance operations (VACUUM, CREATE INDEX, ...)
maintenance_work_mem = 1GB

# Planner's estimate of OS + PostgreSQL cache; 50-75% of system RAM
effective_cache_size = 12GB
```
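One way to sanity-check memory settings is the buffer cache hit ratio; on a hot working set, values well below ~0.99 suggest the cache is undersized. A sketch:

```sql
-- Fraction of block reads served from shared_buffers
SELECT datname,
       round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```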
### Connection Pool

```ini
# Maximum concurrent connections
max_connections = 100

# Reserve connections for superuser access
superuser_reserved_connections = 3
```

For many short-lived clients, an external pooler such as PgBouncer is usually more effective than raising max_connections.
### Query Planner

```ini
# Parallel query execution
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

# Random page cost (the 4.0 default assumes spinning disks; lower it for SSDs)
random_page_cost = 1.1
```
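Planner settings can be tried out per session before committing them to postgresql.conf, which makes it easy to see whether a change actually flips a query plan. A sketch:

```sql
-- Experiment in the current session only
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM users WHERE age > 30;

-- Revert to the server's configured value
RESET random_page_cost;
```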
## Monitoring

### Check Index Usage

```sql
-- Find indexes that have never been scanned
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```
### Monitor Query Performance

```sql
-- Requires pg_stat_statements in shared_preload_libraries (server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by mean execution time
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```
### Table Statistics

```sql
-- Check table sizes (including indexes and TOAST data)
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```
## Vacuum and Analyze

### Regular Maintenance

```sql
-- Update planner statistics
ANALYZE table_name;

-- Reclaim space from dead rows
VACUUM table_name;

-- Full vacuum: rewrites the table and takes an exclusive lock
VACUUM FULL table_name;

-- Combined
VACUUM ANALYZE table_name;
```
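To see which tables actually need attention, pg_stat_user_tables tracks dead tuples and the last (auto)vacuum time. A sketch:

```sql
-- Tables with the most dead rows awaiting cleanup
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```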
### Auto-vacuum Configuration

```ini
# postgresql.conf
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 10s    # default is 1min; shorter means more aggressive
```
## Replication & Backups

### Logical Replication

```sql
-- On the primary (publisher)
CREATE PUBLICATION primary_pub FOR ALL TABLES;

-- On the replica (subscriber); the tables must already exist there
CREATE SUBSCRIPTION primary_sub
    CONNECTION 'postgresql://user@primary_host/db'
    PUBLICATION primary_pub;
```

Note that publications and subscriptions are logical replication; streaming physical replication is configured separately via primary_conninfo and replication slots.
### Backup Strategy

```bash
#!/bin/bash
# Daily custom-format (compressed) backup
pg_dump -Fc -U postgres dbname > /backups/db_$(date +%Y%m%d).dump

# Restore
pg_restore -d dbname /backups/db_20260110.dump
```
## Common Performance Issues
| Issue | Symptom | Solution |
|---|---|---|
| Missing indexes | High seq scans | Add indexes on WHERE/JOIN columns |
| Too many indexes | Slow writes | Drop unused indexes |
| Stale stats | Bad query plans | Run ANALYZE |
| Memory pressure | High swap usage | Lower work_mem or connection count; add RAM |
| Bloated tables | Slow full scans | Run VACUUM FULL |
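The "high seq scans" symptom in the table can be checked directly from the statistics collector. A sketch:

```sql
-- Tables scanned sequentially most often (candidates for new indexes)
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;
```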
## Conclusion
PostgreSQL performance optimization is an ongoing process. Regular monitoring, strategic indexing, and thoughtful configuration tuning will keep your database running smoothly.
**Key Takeaways:**

- Use EXPLAIN ANALYZE before and after changes
- Index columns used in WHERE, JOIN, and ORDER BY
- Monitor slow queries regularly
- Keep statistics up to date with ANALYZE
- Partition large tables appropriately
- Start shared_buffers at around 25% of system RAM

Start optimizing today!