PostgreSQL Is the Only Database You Need

Most solo builder stacks have four or five databases pretending to be necessary. PostgreSQL for relational data, Redis for caching and queues, Elasticsearch for search, Pinecone for embeddings, maybe RabbitMQ for pub/sub. That's five processes to monitor, five sets of credentials, five things that can fail at 2 AM.

PostgreSQL does all of it. Not with janky workarounds or half-baked extensions. With production-grade features that have been shipping for years.

The Database That Keeps Absorbing Its Neighbors

PostgreSQL has been quietly eating the features of every specialized database around it. JSONB landed in version 9.4, back in 2014. Full-text search has been built in since version 8.3. The LISTEN/NOTIFY pub/sub system has been around since the late 1990s. And the pgvector extension added native vector similarity search, with fast HNSW indexing landing in 2023.

Each of these features, on its own, is a solid replacement for a standalone service. Combined, they turn PostgreSQL into a platform that handles document storage, search, message passing, and AI embeddings alongside your relational data. Same process. Same backups. Same connection string.

JSONB Replaces Your Document Store

The most common reason people add MongoDB or DynamoDB alongside Postgres is schema flexibility. They want to store data that doesn't fit neatly into rows and columns. Product metadata, user preferences, event payloads, webhook bodies.

JSONB handles all of it. And unlike a text column storing JSON strings, JSONB is a binary format that Postgres can index and query natively.

CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  event_type TEXT NOT NULL,
  payload JSONB NOT NULL DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Insert a structured event with nested data
INSERT INTO events (event_type, payload) VALUES (
  'signup',
  '{"user": "ember", "plan": "pro", "referrer": {"source": "blog", "campaign": "spring"}}'
);

-- Query nested fields directly
SELECT payload->'referrer'->>'source' AS source,
       count(*)
FROM events
WHERE event_type = 'signup'
  AND payload->'referrer'->>'source' IS NOT NULL
GROUP BY 1;

The -> operator traverses JSON objects; ->> extracts the value as text. You can filter, aggregate, and join on JSONB fields the same way you would on regular columns. For containment and existence filters, add a GIN index and those queries stay fast at scale:

CREATE INDEX idx_events_payload ON events USING gin (payload);

That single index makes any @> containment query fast. Need all signups from the blog referrer?

SELECT * FROM events
WHERE payload @> '{"referrer": {"source": "blog"}}';

The GIN index handles it. No table scan.
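
If you want to confirm the index is doing the work, EXPLAIN will show it (on a near-empty table the planner may still prefer a sequential scan, so expect this on real data):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE payload @> '{"referrer": {"source": "blog"}}';
-- Look for "Bitmap Index Scan on idx_events_payload" in the plan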

I store webhook payloads, API responses, and configuration blobs in JSONB columns. The schema stays clean — a few typed columns for the things I query constantly, and a JSONB column for everything else. When the shape of the data changes (and it always changes), I don't run a migration. The new fields just show up in the JSON.
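
A sketch of what that looks like in practice — the utm_term field here is purely illustrative:

-- A field that didn't exist yesterday starts arriving; no ALTER TABLE required
INSERT INTO events (event_type, payload)
VALUES ('signup', '{"user": "kai", "plan": "pro", "utm_term": "postgres"}');

-- The ? operator tests for key existence, and the same GIN index covers it
SELECT payload->>'utm_term' AS term
FROM events
WHERE payload ? 'utm_term';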

Full-Text Search Without Elasticsearch

Elasticsearch is a powerful piece of software that you almost certainly don't need. It's designed for petabyte-scale search across distributed clusters. If you're a solo builder searching thousands or even millions of rows, PostgreSQL's built-in search is more than enough.

The core of it is tsvector (the indexed document) and tsquery (the search input):

-- Add a search column to an existing table
ALTER TABLE articles ADD COLUMN search_tsv tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

CREATE INDEX idx_articles_search ON articles USING gin (search_tsv);

That generated column automatically updates whenever title or body changes. The setweight calls make title matches rank higher than body matches. The GIN index makes the search fast.

Querying is straightforward:

SELECT title,
       ts_rank(search_tsv, websearch_to_tsquery('english', 'postgresql jsonb')) AS rank
FROM articles
WHERE search_tsv @@ websearch_to_tsquery('english', 'postgresql jsonb')
ORDER BY rank DESC
LIMIT 20;

websearch_to_tsquery accepts the kind of search strings your users actually type — quoted phrases, minus signs for exclusion, implicit AND between terms. No special syntax needed.
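
For example, a raw search-box string passes straight through (the query string is illustrative):

-- Quoted phrase plus exclusion, exactly as a user would type it
SELECT title
FROM articles
WHERE search_tsv @@ websearch_to_tsquery('english', '"connection pooling" tuning -pgbouncer');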

For fuzzy matching (typo tolerance), add the pg_trgm extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_articles_title_trgm ON articles USING gin (title gin_trgm_ops);

-- Find articles even with misspellings
SELECT title, similarity(title, 'postgressql') AS sim
FROM articles
WHERE title % 'postgressql'
ORDER BY sim DESC;

The % operator matches strings within a configurable similarity threshold. Combined with tsvector, you get ranked full-text search with fuzzy matching. That covers ~95% of what solo builders use Elasticsearch for, without running a separate JVM process that idles at 500MB of RAM.
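
The threshold is a session-level setting you can tighten or loosen to taste; a minimal sketch:

-- pg_trgm's % operator defaults to a similarity threshold of 0.3
SET pg_trgm.similarity_threshold = 0.45;  -- stricter: fewer, closer matches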

pgvector Keeps Embeddings Next to Your Data

The vector database market wants you to believe that embeddings require a specialized system. Pinecone, Weaviate, Qdrant, Milvus — all pitched as essential infrastructure for anything involving AI.

For a solo builder, pgvector does the job and eliminates an entire category of infrastructure.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1024),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Create an HNSW index for fast approximate nearest neighbor search
CREATE INDEX idx_documents_embedding ON documents
  USING hnsw (embedding vector_cosine_ops);

The vector(1024) type stores a 1024-dimensional embedding. HNSW (Hierarchical Navigable Small World) indexing gives you approximate nearest neighbor search that's fast enough for real-time queries up to a few million vectors.

Semantic search across your documents becomes a single query:

SELECT content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE 1 - (embedding <=> $1::vector) > 0.55
ORDER BY embedding <=> $1::vector
LIMIT 10;

The <=> operator computes cosine distance. The HNSW index keeps this fast without scanning every row.
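
HNSW also exposes a query-time knob for the speed/recall trade-off — a one-liner, assuming pgvector 0.5 or later:

-- Raise ef_search for better recall at some cost in latency (pgvector default: 40)
SET hnsw.ef_search = 100;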

The real advantage isn't performance. It's locality. Your embeddings live in the same database as your relational data. You can join them. A single query can find semantically similar documents, filter by category, check permissions, and return the result — no cross-service calls, no sync pipelines, no eventual consistency headaches.

-- Semantic search with relational filters in one query
SELECT d.content,
       1 - (d.embedding <=> $1::vector) AS similarity,
       c.name AS category
FROM documents d
JOIN categories c ON c.id = d.category_id
WHERE d.archived_at IS NULL
  AND c.name = 'technical'
  AND 1 - (d.embedding <=> $1::vector) > 0.55
ORDER BY d.embedding <=> $1::vector
LIMIT 10;

Try doing that across Pinecone and Postgres. You'd need two queries, application-level joining, and a prayer that the IDs stay in sync.

LISTEN/NOTIFY for Lightweight Pub/Sub

Redis pub/sub and RabbitMQ exist for a reason. If you need guaranteed delivery, persistent queues, or message routing across distributed systems, use them.

For a solo builder who needs to notify a worker process when a new row appears, PostgreSQL's LISTEN/NOTIFY is enough:

-- In your worker process
LISTEN new_task;

-- In your application, after inserting a task
INSERT INTO tasks (title, status) VALUES ('Process upload', 'pending');
NOTIFY new_task, 'task_id:42';

The worker receives the notification instantly over its existing database connection. No broker. No additional port. No serialization format to agree on.
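
If you'd rather not trust application code to remember the NOTIFY, a trigger can fire it on every insert. A minimal sketch — the function and trigger names are illustrative:

CREATE OR REPLACE FUNCTION notify_new_task() RETURNS trigger AS $$
BEGIN
  -- pg_notify is the function form of NOTIFY; the payload is the new row's id
  PERFORM pg_notify('new_task', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tasks_notify
  AFTER INSERT ON tasks
  FOR EACH ROW EXECUTE FUNCTION notify_new_task();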

For simple job queues, combine NOTIFY with FOR UPDATE SKIP LOCKED:

-- Worker claims the next available task atomically
UPDATE tasks SET status = 'processing', claimed_at = now()
WHERE id = (
  SELECT id FROM tasks
  WHERE status = 'pending'
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING *;

SKIP LOCKED means multiple workers can pull from the same queue without conflicts. Each worker gets a different task. No duplicate processing. No distributed locking service. Postgres handles the concurrency.
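
The rest of the lifecycle is equally plain SQL — a sketch, assuming 'done' is a valid status in the same tasks table:

-- On success, mark the claimed task finished
UPDATE tasks SET status = 'done' WHERE id = $1;

-- On failure, put it back so another worker can retry
UPDATE tasks SET status = 'pending', claimed_at = NULL WHERE id = $1;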

The Gotchas

PostgreSQL is not the best at everything it does. It's good enough at everything, which is a different and more useful property for solo builders.

Full-text search works well up to a few million rows with proper indexing. Beyond that, or if you need features like faceted search, autocomplete suggestions, or custom analyzers, Elasticsearch starts to earn its complexity. I've run tsvector search across ~13,000 rows with sub-20ms response times. At 10 million rows with complex ranking, you'd want to benchmark.

pgvector's HNSW index is fast for collections up to a few million vectors. Pinecone and purpose-built vector databases optimize for billions. If your embedding collection is measured in thousands or low millions, pgvector is the right choice. If you're building a search engine over the entire internet, it's not.

JSONB queries are fast with GIN indexes, but they'll never match a dedicated document database for write-heavy workloads with complex nested updates. If you're doing thousands of partial JSON updates per second on deeply nested structures, that's a real use case for MongoDB. Most solo builders aren't doing that.

LISTEN/NOTIFY is fire-and-forget. If your worker is disconnected when the notification fires, it's gone. No replay, no dead letter queue. For critical job processing, you'd want the SKIP LOCKED polling pattern as the primary mechanism, with NOTIFY as a latency optimization on top.

The Operational Case

The technical arguments matter less than the operational ones. Every additional service in your stack is a process to monitor, a version to upgrade, a config to secure, a backup to run, and a failure mode to debug at 2 AM.

Here's what a typical "polyglot persistence" stack looks like for a solo builder versus consolidating on PostgreSQL:

| Concern | Multi-Database Stack | PostgreSQL Only |
|---|---|---|
| Processes to monitor | 4-5 | 1 |
| Backup strategy | Per-service (different tools) | pg_dump |
| Connection management | Multiple drivers, multiple pools | One connection string |
| Upgrade path | Staggered, version matrix | One major version per year |
| Failure blast radius | Cascading cross-service failures | One process, one failure mode |
| Data consistency | Eventual (sync pipelines) | Transactional |
| Disk footprint | ~2-4 GB | ~200-400 MB |

One backup command captures everything — your relational data, your JSON documents, your search indexes, your embeddings. One pg_dump and you can restore your entire application state on a fresh machine in minutes.
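
In practice that's two commands — shown here against the myapp database from the setup below:

pg_dump --format=custom myapp > myapp.dump
pg_restore --create --dbname=postgres myapp.dump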

That simplicity compounds. Every hour you don't spend debugging Redis connection drops or Elasticsearch cluster health is an hour spent on the product your customers actually pay for.

The Setup

Getting all of this running takes about five minutes:

brew install postgresql@16
brew services start postgresql@16

# pgvector ships separately from core Postgres; make sure the build
# matches your Postgres version
brew install pgvector

psql postgres -c "CREATE DATABASE myapp;"
psql myapp -c "CREATE EXTENSION IF NOT EXISTS vector;"
psql myapp -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"

Two extensions. One database. You now have relational storage, document storage, full-text search, fuzzy matching, vector similarity search, and pub/sub. Total idle memory: ~30-50MB.

Compare that to spinning up Redis (docker run redis), Elasticsearch (docker run elasticsearch plus the JVM tuning dance), and Pinecone (SaaS signup, API keys, network calls). Each one individually is easy. Managing all of them together is where the complexity lives, and it's the kind of complexity that doesn't show up until something breaks.

When to Graduate

PostgreSQL stops being enough when your scale demands it, and you'll know when that happens because you'll have the metrics to prove it. Query latency climbing past your SLA on a properly indexed table. Write throughput hitting the ceiling of a single node. Search result quality that needs custom ML ranking models.

These are good problems. They mean your product is working. And by the time you hit them, you'll have the revenue and the context to choose the right specialized tool — not because a blog post told you to, but because your own data shows you need it.

Until then, run one database. Back up one thing. Monitor one process. Spend your time on the product.

The best infrastructure is the infrastructure you forget is there. PostgreSQL has been that for me. One connection string, one backup script, one process quietly handling six jobs that used to be six services.