VectorDB Learning Resources Roundup - A High-Performance Vector Database Management System

Ray


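Introduction to VectorDB

VectorDB is an open-source vector database. Its main features include:

High-performance, production-scale similarity search
A full-fledged database management system with familiar concepts such as databases, tables, and fields
Metadata filtering
Hybrid search that fuses dense and sparse vectors
Built-in embedding support, enabling a natural-language-in, natural-language-out search experience
Cloud-native architecture with compute-storage separation, serverless operation, and multi-tenancy
Rich ecosystem integrations, including LangChain and LlamaIndex
Python/JavaScript/Ruby clients and a REST API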
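
To make the hybrid search item more concrete, here is a minimal Python sketch of one common way to fuse a dense ranking with a sparse ranking: reciprocal rank fusion (RRF). It is an illustration only. The source does not say which fusion method VectorDB uses, and the function name `rrf_fuse`, the constant `k=60`, and the example rankings are assumptions made for this sketch.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. A higher fused score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, descending; break ties by ID so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings (not real VectorDB output).
dense_ranking = [1, 5, 3]   # IDs ordered by dense-vector similarity
sparse_ranking = [5, 4, 1]  # IDs ordered by sparse (keyword) score
fused = rrf_fuse([dense_ranking, sparse_ranking])
print(fused)
```

IDs that rank well in both lists rise to the top, while an ID found by only one retriever still appears further down the fused list.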

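VectorDB's core is written in C++. It indexes vectors with an advanced parallel graph traversal technique from academic research and, according to the project, delivers vector search 10x faster than HNSW while keeping precision above 99.9%.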
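
The source does not describe how its accuracy figure was measured. A common approach for approximate indexes is to compare their results against an exact brute-force search and report the overlap (recall at k). The sketch below shows that idea in plain Python; the vectors, the `approx` result, and all function names are hypothetical and are not taken from VectorDB.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exact_top_k(query, vectors, k):
    """Brute-force search: score every vector, keep the k most similar IDs."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate result also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
query = [1.0, 0.05]

exact = exact_top_k(query, vectors, k=2)
approx = ["a", "d"]  # hypothetical output of an approximate index
recall = recall_at_k(approx, exact)
print(exact, recall)
```

Brute force scales linearly with the number of vectors, which is why graph-based indexes such as HNSW trade a small amount of accuracy for speed.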

Quick Start

Docker offers a quick way to start VectorDB:

Run the backend Docker container:

docker pull epsilla/vectordb
docker run --pull=always -d -p 8888:8888 -v /data:/data epsilla/vectordb
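
Because the container starts in detached mode (`-d`), a script that connects immediately may run before the port is reachable. The following is a small sketch that polls the published port 8888 with a plain TCP connection. The helper name `wait_for_port` and its timeout values are assumptions for illustration, and an open port only shows that something is listening, not that the database has finished starting.

```python
import socket
import time


def wait_for_port(host="localhost", port=8888, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection as a reachability probe.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A script could call `wait_for_port()` once before creating the Python client and stop early if it returns `False`.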

Interact with it using the Python client:

pip install pyepsilla

from pyepsilla import vectordb

client = vectordb.Client(host='localhost', port='8888')
client.load_db(db_name="MyDB", db_path="/data/epsilla")
client.use_db(db_name="MyDB")

client.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
    ],
    indices=[
        {"name": "Index", "field": "Doc"},
    ]
)

client.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Jupiter is the largest planet in our solar system."},
        {"ID": 2, "Doc": "Cheetahs are the fastest land animals, reaching speeds over 60 mph."},
        {"ID": 3, "Doc": "Vincent van Gogh painted the famous work \"Starry Night.\""},
        {"ID": 4, "Doc": "The Amazon River is the longest river in the world."},
        {"ID": 5, "Doc": "The Moon completes one orbit around Earth every 27 days."},
    ],
)

client.query(
    table_name="MyTable",
    query_text="Celestial bodies and their characteristics",
    limit=2
)
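
The quick-start calls above discard their return values. As a hedged sketch, the wrapper below checks the outcome of a query before using it. It assumes that `client.query` returns a `(status_code, response)` pair and that 200 indicates success; that return shape is an assumption not shown in this document, so verify it against the official client documentation. The helper name `search` is hypothetical.

```python
def search(client, table_name, query_text, limit=2):
    """Run a text query and return the response body, raising on failure.

    Assumption: client.query returns a (status_code, response) pair,
    with status_code 200 on success.
    """
    status_code, response = client.query(
        table_name=table_name,
        query_text=query_text,
        limit=limit,
    )
    if status_code != 200:
        raise RuntimeError(f"query failed with status {status_code}: {response}")
    return response
```

With the client from the quick start, `search(client, "MyTable", "Celestial bodies and their characteristics")` would return the response body or raise if the call did not succeed.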

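Learning Resources

Official documentation - a detailed guide to using VectorDB and its API
GitHub repository - source code and the latest updates
Blog - technical articles and use cases
YouTube channel - video tutorials and demos
Discord community - a place to talk with other users and developers
Twitter - the latest news
Epsilla Cloud - a hosted VectorDB service to try out

Together, these resources cover VectorDB's features and usage in enough depth to help readers apply it to real projects quickly. As a high-performance vector database, VectorDB is worth following and learning.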