April 24, 2026·Cody Feda

Vector Databases

Load-Bearing

The AI Database

tl;dr

$ Everything AI has done to impress you is based on vector data. These vectors need to be stored in vector databases.

Buzzword Rating: Load Bearing

Vector Databases are very important in the current AI landscape. Everything that ChatGPT or Claude Code has done to impress you is based on vectors.

I rate vector databases as Load Bearing because they are foundational to how AI works right now. We need to store vectors somewhere. That need brought a wave of new vector database companies onto the scene. Meanwhile, every existing database company has started marketing itself as a vector database.

There are four concepts you need in order to understand what a vector database is and why it matters:

  1. Vector
  2. Semantic meaning
  3. Embedding
  4. Database

Vector

A vector is a list of numbers. That's it.

[0.4, 0.1]             -- two dimensions
[0.2, 0.8, 0.4, 0.1]  -- four dimensions

The embedding vectors used with modern LLMs are much longer, commonly in the 1,536 to 4,096 dimension range. So more like:

[0.3, 0.9, 0.1, ...]
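In code, that really is all there is to it. A minimal Python sketch, with variable names made up for illustration:

```python
# A vector is a plain list of floats; its length is its number of dimensions.
two_d = [0.4, 0.1]
four_d = [0.2, 0.8, 0.4, 0.1]

print(len(two_d))   # number of dimensions in the first vector
print(len(four_d))  # number of dimensions in the second vector
```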

Semantic Meaning

[Image: Different types of chips]

Semantics is the meaning and interpretation of words in context.

Take the word "chip." Put it next to different words and it becomes a completely different object:

  • Nacho chip
  • Fish and chips
  • Chip on your shoulder
  • Golf chip shot
  • Poker chip
  • NVIDIA chip

Same word. Completely different meaning. That's semantics.


Embedding

The embedding process takes a word and converts it into a vector that captures its semantic meaning. Once you have vectors, you can run math on them.

The classic example: king - man + woman ≈ queen (the result lands near queen, not exactly on it)

A better one: chips + cheese = nachos

[Image: Vector math with food]
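To make the food math concrete, here is a small sketch. The three-dimensional vectors are hand-made toys, not output from any real embedding model, and the helper names `add` and `cosine` are made up for this example. The idea is to add two vectors and then ask which known word points in the most similar direction.

```python
import math

# Hand-made 3-dimensional toy vectors (made up for illustration, not real embeddings).
toy = {
    "chips":  [0.9, 0.1, 0.0],
    "cheese": [0.1, 0.9, 0.0],
    "nachos": [0.9, 0.9, 0.1],
    "donut":  [0.1, 0.0, 0.9],
}

def add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

combined = add(toy["chips"], toy["cheese"])
# Pick the word whose vector points in the most similar direction.
closest = max(toy, key=lambda word: cosine(combined, toy[word]))
print(closest)
```

The sum does not equal the nachos vector exactly. It just lands closer to nachos than to anything else in this tiny vocabulary, which is what the "=" in these word equations really means.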

Words with similar meanings end up near each other in vector space. Empanadas, calzones, samosas, bao, pasties, pierogies... they all cluster together. They're all variations on the same idea: dough wrapped around filling.

This is extremely powerful. If we generated embeddings for every word in the English language, we could simulate a high degree of intelligence. But once we've embedded everything ever posted to Reddit, we can start to see much deeper hidden meanings that tie the entire world together.


Vector Databases

Now we have trillions of vectors and more being generated every second. They represent Reddit comments, product descriptions, support tickets, legal documents, quite literally everything you've ever heard of. You need somewhere to put them.

That's the vector database.

The Bolt-On Approach

Existing databases only need two things to handle vectors:

  1. A vector data type
  2. An index optimized for vector search

Here's what a standard table looks like:

id (integer)  food (string)
1             chips
2             cheese
3             nachos
4             empanada
5             calzone
6             samosa
7             bao
8             pot pie

Here's what the same kind of table looks like with a bolt-on vector column:

id (int)  food (string)  vector (vector)
1         chips          [0.57, 2.10, -0.83, 1.44, 0.06, -1.72, 0.39, ...]
2         cheese         [-0.34, 1.87, 0.72, -1.19, 2.41, -0.08, 1.03, ...]
3         nachos         [2.06, -1.53, -0.27, 0.88, -0.61, 1.95, 0.44, ...]
4         empanada       [-1.78, 0.33, 1.60, -0.94, 0.17, 2.29, -0.70, ...]
5         calzone        [-1.81, 0.29, 1.55, -0.98, 0.21, 2.31, -0.67, ...]
6         samosa         [-1.74, 0.37, 1.63, -0.89, 0.14, 2.25, -0.73, ...]
7         bao            [-1.83, 0.31, 1.58, -0.97, 0.20, 2.34, -0.65, ...]
8         pot pie        [-1.76, 0.35, 1.57, -0.91, 0.19, 2.27, -0.72, ...]
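Here is a rough sketch of the first requirement, a column that holds a vector next to ordinary fields. It uses Python's built-in `sqlite3` purely as a stand-in for "an existing database". SQLite has no native vector type, so storing the vector as JSON text is my assumption for the demo, and the table name `foods` is made up. A real bolt-on such as pgvector gives you a proper vector column type plus an index. Only the components visible in the table above are used; the real vectors are longer.

```python
import json
import sqlite3

# In-memory SQLite database as a stand-in for "an existing database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foods (id INTEGER PRIMARY KEY, food TEXT, vector TEXT)")

# Only the components visible in the table above; the real vectors are longer.
rows = [
    (1, "chips", [0.57, 2.10, -0.83, 1.44, 0.06, -1.72, 0.39]),
    (4, "empanada", [-1.78, 0.33, 1.60, -0.94, 0.17, 2.29, -0.70]),
    (5, "calzone", [-1.81, 0.29, 1.55, -0.98, 0.21, 2.31, -0.67]),
]
conn.executemany(
    "INSERT INTO foods (id, food, vector) VALUES (?, ?, ?)",
    [(row_id, food, json.dumps(vec)) for row_id, food, vec in rows],
)

# Read a vector back and decode it into a list of floats.
(raw,) = conn.execute(
    "SELECT vector FROM foods WHERE food = ?", ("empanada",)
).fetchone()
empanada_vector = json.loads(raw)
print(len(empanada_vector), empanada_vector[:3])
```

This only covers storage. Without the second requirement, an index built for vector search, every query would have to scan and decode every row.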

The databases that have gone this route:

Database    Their branding
PostgreSQL  pgvector
MySQL       VECTOR
Redis       Vector Search
Oracle      Vector
MongoDB     Atlas Vector Search

Vector-First Databases

Then there are databases built for vectors from the start:

Database  Model
Pinecone  Fully managed
Milvus    Open source + cloud
Weaviate  Open source + cloud
ChromaDB  Lightweight / local

The Main Query: Nearest Neighbor Search

[Image: Dough-wrapped foods]

The primary thing you do with a vector database is ask "what's closest to this?"

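# index: a vector index holding the food table; empanada.vector: the stored vector for "empanada"
results = index.query(
    vector=empanada.vector,
    top_k=3  # return only the 3 nearest neighbors
)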

top_k means: only give me the k most similar results.

Query for empanada, get back:

  1. Calzone
  2. Samosa
  3. Bao

This is called a nearest neighbor search, and it's how semantic search works under the hood.
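If you want to see what that query is doing conceptually, here is a brute-force sketch: score every stored vector against the query and keep the best `top_k`. The vectors are made-up three-dimensional toys, cosine similarity is my choice of "closeness" (databases typically let you pick a metric), and the `query` function and its `exclude` parameter are hypothetical, not any particular product's API.

```python
import math

# Made-up 3-dimensional toy vectors, not real embeddings.
foods = {
    "chips":    [0.90, 0.10, 0.00],
    "cheese":   [0.10, 0.90, 0.00],
    "empanada": [0.10, 0.20, 0.90],
    "calzone":  [0.12, 0.22, 0.88],
    "samosa":   [0.15, 0.15, 0.85],
    "bao":      [0.20, 0.10, 0.80],
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query(vector, top_k, exclude=None):
    """Brute-force nearest neighbor search: score everything, keep the best top_k."""
    scored = [
        (cosine(vector, vec), name)
        for name, vec in foods.items()
        if name != exclude  # skip the item we are searching from
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

results = query(foods["empanada"], top_k=3, exclude="empanada")
print(results)
```

Note the `exclude` argument: if the query vector is itself stored in the index, it is its own nearest neighbor, so you usually filter it out or ask for one extra result.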

Search Types

  • Semantic search: find things with similar meaning, not just matching words
  • Lexical search: traditional keyword matching
  • Hybrid: both, blended together
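One way to picture "blended together" is a weighted sum of a keyword score and a vector score. This is a sketch under loud assumptions: the documents and two-dimensional vectors are made up, the lexical score is a crude word-overlap fraction, and the `alpha` weight is hypothetical. Production systems tend to use stronger lexical scoring such as BM25 and may fuse rankings rather than raw scores.

```python
import math

# Made-up documents with made-up 2-dimensional toy vectors (not real embeddings).
docs = {
    "baked empanada recipe": [0.9, 0.1],
    "folded calzone with cheese": [0.8, 0.2],
    "empanada festival parking rules": [0.1, 0.9],
}

def lexical_score(query, text):
    """Keyword matching: fraction of query words that appear in the text."""
    words = query.lower().split()
    text_words = set(text.lower().split())
    return sum(w in text_words for w in words) / len(words)

def semantic_score(query_vec, doc_vec):
    """Cosine similarity between the query vector and a document vector."""
    dot = sum(x * y for x, y in zip(query_vec, doc_vec))
    norm_q = math.sqrt(sum(x * x for x in query_vec))
    norm_d = math.sqrt(sum(y * y for y in doc_vec))
    return dot / (norm_q * norm_d)

def hybrid_score(query, query_vec, text, doc_vec, alpha=0.5):
    """Weighted blend: alpha weights the semantic score, (1 - alpha) the lexical one."""
    semantic = semantic_score(query_vec, doc_vec)
    lexical = lexical_score(query, text)
    return alpha * semantic + (1 - alpha) * lexical

query = "empanada recipe"
query_vec = [1.0, 0.0]  # hypothetical embedding of the query

ranked = sorted(
    docs,
    key=lambda text: hybrid_score(query, query_vec, text, docs[text]),
    reverse=True,
)
print(ranked)
```

The calzone document shares no keywords with the query, so lexical search alone would miss it. The parking rules document shares a keyword but not the meaning. Blending lets the calzone outrank the parking rules while the exact match still wins.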

The Indexing Problem

Indexing makes database queries efficient. For standard text and numeric fields, many fiscal years, life pursuits, and dissertations have been dedicated to improving the quality and speed of queries.

Efficiently indexing a multi-dimensional vector space makes all of that earlier work look easier than baking a frozen pizza.

The good news is that vector indexing is already quite efficient and, from your perspective and mine, very easy to use. Using vectors in your product today is cheap compared to what it would have cost even a year ago.
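To give a flavor of how an index can avoid comparing against every stored vector, here is a sketch of one approximate technique, random-hyperplane locality-sensitive hashing. This article does not say which method any particular database uses, so treat this as my pick for brevity; real products often rely on more sophisticated structures, such as graph-based indexes like HNSW. The function names and parameters are made up.

```python
import random

def make_hyperplanes(num_planes, dims, seed=0):
    """Generate random hyperplanes (as normal vectors) with a fixed seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(num_planes)]

def bucket_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(x * p for x, p in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

planes = make_hyperplanes(num_planes=8, dims=4)

a = [0.2, 0.8, 0.4, 0.1]
scaled_a = [2 * x for x in a]  # same direction, twice the length
opposite = [-x for x in a]     # opposite direction

key_a = bucket_key(a, planes)
key_scaled = bucket_key(scaled_a, planes)
key_opposite = bucket_key(opposite, planes)
print(key_a, key_scaled, key_opposite)
```

Vectors pointing in similar directions tend to share a bucket key, so a query only has to be compared against the vectors in its own bucket instead of the whole table. The trade-off is that results are approximate: a true neighbor can occasionally land in a different bucket.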


So Which One Do You Use?

The vector-first databases offer incredible scaling capabilities. The bolt-on options are plenty capable if you're just looking to get off the ground. If you have no idea where to start, start with pgvector. It's open source, well documented, and well resourced!

