A vector database is a type of database that indexes and stores vector embeddings for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling.
Information comes in many forms:
- Some information is unstructured—like text documents, rich media, and audio.
- Some information is structured—like application logs, tables, and graphs.
Innovations in artificial intelligence and machine learning (AI/ML) have produced a type of ML model called an embedding model:
- Embedding models encode all types of data into vectors that capture the meaning and context of an asset.
- Assets with similar meaning map to nearby vectors, so we can find similar assets by searching for neighboring data points.
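To make "searching for neighboring data points" concrete, here is a minimal sketch using cosine similarity, one common way to compare embeddings. The three-dimensional vectors and the helper names (`cosine_similarity`, `most_similar`) are hypothetical and hand-written for illustration; a real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example (not model output).
embeddings = {
    "kitten": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "truck": [0.1, 0.9, 0.3],
}

def most_similar(query_key, k=1):
    """Return the k keys whose vectors are most similar to query_key's vector."""
    query = embeddings[query_key]
    scored = [
        (cosine_similarity(query, vec), key)
        for key, vec in embeddings.items()
        if key != query_key
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]

print(most_similar("kitten"))
```

With these toy values, "kitten" sits much closer to "cat" than to "truck", which is the property similarity search relies on.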
Vector search methods enable experiences such as taking a photograph with your smartphone and searching for similar images.
Vector databases provide:
- The ability to store and retrieve vectors as high-dimensional points.
- Efficient, fast lookup of nearest neighbors in the N-dimensional space, typically powered by k-nearest neighbor (k-NN) indexes built with algorithms such as Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF).
- Additional capabilities like data management, fault tolerance, authentication and access control, and a query engine.
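The capabilities listed above can be sketched as a toy in-memory store. This is an illustration under stated assumptions, not how any particular vector database is implemented: the class `ToyVectorStore` and its methods are hypothetical names, and the query does an exact brute-force scan with Euclidean distance, where a production system would typically use an approximate index such as HNSW or IVF.

```python
import heapq
import math

class ToyVectorStore:
    """Hypothetical in-memory sketch: CRUD plus exact k-NN with metadata filtering."""

    def __init__(self):
        self._items = {}  # item_id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        """Create a record, or overwrite it if the id already exists."""
        self._items[item_id] = (list(vector), dict(metadata or {}))

    def get(self, item_id):
        """Return (vector, metadata), or None if the id is absent."""
        return self._items.get(item_id)

    def delete(self, item_id):
        """Remove a record; a missing id is ignored."""
        self._items.pop(item_id, None)

    def query(self, vector, k=3, where=None):
        """Return up to k (item_id, distance) pairs, nearest first.

        `where` is an optional dict; a record matches only if every
        key/value pair equals the record's metadata exactly.
        """
        where = where or {}
        candidates = (
            (math.dist(vector, vec), item_id)
            for item_id, (vec, meta) in self._items.items()
            if all(meta.get(key) == value for key, value in where.items())
        )
        # Brute-force scan: every matching record is compared to the query.
        return [(item_id, d) for d, item_id in heapq.nsmallest(k, candidates)]

store = ToyVectorStore()
store.upsert("a", [0.0, 0.0], {"type": "image"})
store.upsert("b", [1.0, 0.0], {"type": "text"})
store.upsert("c", [0.0, 2.0], {"type": "image"})

print(store.query([0.1, 0.0], k=2))
print(store.query([0.1, 0.0], k=2, where={"type": "image"}))
```

The brute-force scan costs time proportional to the number of stored vectors per query, which is the cost that k-NN indexes are designed to avoid at scale.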