init seasearch docs

2024-09-05 13:59:42 +08:00 · 2024-09-05 13:59:42 +08:00 · 1088265817
commit 1088265817
15 changed files with 1378 additions and 0 deletions
--- a/.github/workflows/deploy.yml
+++ b/.github/workflows/deploy.yml
@ -0,0 +1,18 @@
+name: Deploy CI
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: 3.x
+      - run: pip install mkdocs-material mkdocs-awesome-pages-plugin mkdocs-material-extensions
+      - run: cd $GITHUB_WORKSPACE
+      - run: mkdocs gh-deploy --force
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,2 @@
+*~
+/.idea
--- a/LICENSE.txt
+++ b/LICENSE.txt
@ -0,0 +1,13 @@
+Copyright (c) 2016 Seafile Ltd.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
--- a/README.md
+++ b/README.md
@ -0,0 +1,15 @@
+# SeaSearch Docs
+
+Manual for SeaSearch
+
+The web site: https://haiwen.github.io/seasearch-docs/
+
+## Serve docs locally
+
+These docs are built using 'mkdocs'.  Install the tooling by running:
+
+```
+pip3 install mkdocs-material mkdocs-awesome-pages-plugin mkdocs-material-extensions
+```
+
+Start up the development server by running `mkdocs serve` in the project root directory.  Browse at `http://127.0.0.1:8000/seasearch-docs/`.
--- a/manual/CNAME
+++ b/manual/CNAME
@ -0,0 +1 @@
+manual.seafile.com
--- a/manual/README.md
+++ b/manual/README.md
@ -0,0 +1,3 @@
+# Introduction
+
+ZincSearch 是一个 Go 语言实现的全文检索服务器，提供了兼容 ElasticSearch DSL 的 API。它采用了 Bluge 作为索引引擎。Bluge 是一个广泛使用的 Go 语言全文索引库  Bleve（由 CouchBase 公司开发）的 fork 版本，对代码进行重构改造，使得它更加现代化和灵活。
--- a/manual/api/seasearch_api.md
+++ b/manual/api/seasearch_api.md
@ -0,0 +1,684 @@
+
+
+# API 介绍
+
+
+
+SeaSearch 通过 Http Basic Auth 进行权限校验，API 请求需要在 header 中携带对应的 token。
+
+
+
+生成 basic auth 可以通过这个工具: [http://web.chacuo.net/safebasicauth](http://web.chacuo.net/safebasicauth)
+
+
+
+## 用户管理
+
+
+
+### 管理员用户
+
+
+
+SeaSearch 通过账户来管理API权限等，程序在第一次启动时，需要通过环境变量配置一个管理员帐号
+
+
+
+以下是 管理员帐号示例：
+
+```plaintext
+set ZINC_FIRST_ADMIN_USER=admin
+set ZINC_FIRST_ADMIN_PASSWORD=Complexpass#123
+```
+
+
+### 普通用户
+
+
+
+可以通过API来创建/更新用户：
+
+```plaintext
+[POST] /api/user
+
+{ 
+    "_id": "prabhat",
+    "name": "Prabhat Sharma",
+    "role": "admin", // or user
+    "password": "Complexpass#123"
+}
+```
+
+
+获取所有用户：
+
+```plaintext
+[GET] /api/user
+```
+
+
+删除用户：
+
+```plaintext
+[DELETE] /api/user/${userId}
+```
+
+
+
+## 索引相关
+
+
+
+### 创建索引
+
+
+
+创建一个 SeaSearch 索引，并且在此时可以同时设置 mappings 以及 settings。
+
+
+
+我们也可以直接通过其他请求设置 settings 或者 mapping，如果 index不存在，则会自动创建。
+
+
+
+SeaSearch 文档：[https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index](https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index)
+
+
+
+参考 ES api文档：[https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)
+
+
+
+### 配置 mappings
+
+
+
+mappings 定义了 document 中，字段的规则，例如类型，格式等。
+
+
+
+可以通过单独的 API 来配置 mapping:
+
+
+
+SeaSearch api: [https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-mapping/](https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-mapping/)
+
+
+
+ES 相关说明：[https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html)
+
+
+
+### 配置 settings
+
+
+
+settings 设置了 index 的 analyzer 分片等相关设置。
+
+
+
+SeaSearch api: [https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-settings/](https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-settings/)
+
+
+
+ES 相关说明：
+
+
+
+  * analyzer 相关概念：[https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html)
+
+  * 如何指定 analyzer：[https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html)
+
+
+
+
+
+
+### analyzer 支持
+
+
+
+analzyer 可以在创建索引索引时配置 default ，也可以针对某个字段进行设置。（参考上一节中 settings ES 的文档了解相关概念。）
+
+
+
+SeaSearch 支持的 analyzer可以在这个页面中找到：[https://zincsearch-docs.zinc.dev/api/index/analyze/](https://zincsearch-docs.zinc.dev/api/index/analyze/) 里面的 tokenize, token filter 等概念和 ES 是一致的，且支持 ES 大部分常用的 analyzer 和 tokenizer 等。
+
+
+
+支持的常规analyzer
+
+
+
+  * standard 默认的 analyzer，如果没有指定，则采用此 analyzer，按词切分，小写处理
+
+  * simple 按照非字母切分（符号被过滤），小写处理
+
+  * keyword 不分词，直接将输入当作输出 
+
+  * stop 小写处理，停用词过滤器 (the、a、is等）
+
+  * web 由 buluge 实现，匹配 邮箱、url 等。处理小写，使用停用词过滤器
+
+  * regexp/pattern 正则表达式，默认\W+(非字符分割)，支持设置 小写、停用词
+
+  * whitespace 按照空格切分，不转小写
+
+
+
+
+
+
+多语言 analzyer：
+
+语言| analyzer  
+---|---  
+阿拉伯语| ar  
+丹麦语| da  
+德语| de  
+英语| english  
+西班牙语| es  
+波斯语| fa  
+亚洲地区国家| cjk  
+芬兰语| fi  
+法语| fr  
+印地语| hi  
+匈牙利语| hu  
+意大利语| it  
+荷兰语| nl  
+挪威语| no  
+葡萄牙语| pt  
+罗马尼亚语| ro  
+俄语| ru  
+瑞典语| sv  
+土耳其语| tr  
+索拉尼| ckb
+
+  
+  
+中文 analzyer：
+
+
+
+  * gse_standard 使用最短路径算法来分词
+
+  * gse_search 搜索引擎的分词模式，提供尽可能多的关键词
+
+
+
+
+
+
+中文 analyzer 使用的是 [gse](https://github.com/go-ego/gse) 这个库实现分词，是 python 结巴库的 Golang 实现，默认是没有启用的，需要通过环境变量来启用
+
+```plaintext
+ZINC_PLUGIN_GSE_ENABLE=true
+# true 启用中文分词支持,默认false
+
+ZINC_PLUGIN_GSE_DICT_EMBED=BIG 
+# BIG：使用gse内置词库与停用词；否则，使用 SeaSearch 内置的简单词库，默认 small
+
+ZINC_PLUGIN_GSE_ENABLE_STOP=true
+# true 使用停用词，默认 true
+
+ZINC_PLUGIN_GSE_ENABLE_HMM=true
+# 使用 HMM 模式用于搜素分词，默认为 true
+
+ZINC_PLUGIN_GSE_DICT_PATH=./plugins/gse/dict
+# 使用用户自定义词库与停用词，需要将内容放在配置的这个路径下，并且词库命名为 user.txt
+停用词命名为 stop.txt
+```
+
+
+## 全文检索
+
+
+
+### document CRUD
+
+
+
+创建 document：
+
+
+
+SeaSearch ：[https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/)
+
+
+
+ES api 说明：[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html)
+
+
+
+更新 document ：
+
+
+
+SeaSearch：[https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/)
+
+
+
+ES api 说明：[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html)
+
+
+
+删除 document：
+
+
+
+SeaSearch： [https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/)
+
+
+
+ES api 说明：[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html)
+
+
+
+根据 id获取 document：
+
+```plaintext
+[GET] /api/${indexName}/_doc/${docId}
+```
+
+
+### 批量进行操作
+
+
+
+应该尽量使用批量操作更新索引
+
+
+
+SeaSearch文档： [https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request](https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request)
+
+
+
+ES api说明：[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
+
+
+
+### 搜索
+
+
+
+api示例：
+
+
+
+[https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/)
+
+
+
+全文搜索使用 DSL，使用方法可以参考：
+
+
+
+[https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html)
+
+
+
+delete-by-query：根据 query进行删除：
+
+```plaintext
+[POST] /es/${indexName}/_delete_by_query
+
+{
+  "query": {
+    "match": {
+      "name": "jack"
+    }
+  }
+}
+```
+
+
+ES api 文档：[https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html)
+
+
+
+multi-search，支持对不同 index 执行不同的 query：
+
+
+
+SeaSearch 文档：[https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/)
+
+
+
+ES api 文档：[https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html)
+
+
+
+我们对 multi-search 做了扩展，使它支持在搜索不同的索引时，使用相同的统计信息，以使得得分计算更加精确，在请求中设置 query：unify_score=true 即可开启。
+
+```plaintext
+[POST] /es/
+
+{"index": "t1"}
+{"query": {"bool": {"should": [{"match": {"filename": {"query": "数据库", "minimum_should_match": "-25%"}}}, {"match": {"filename.ngram": {"query": "数据库", "minimum_should_match": "80%"}}}], "minimum_should_match": 1}}, "from": 0, "size": 10, "_source": ["path", "repo_id", "filename", "is_dir"], "sort": ["_score"]}
+{"index": "t2"}
+{"query": {"bool": {"should": [{"match": {"filename": {"query": "数据库", "minimum_should_match": "-25%"}}}, {"match": {"filename.ngram": {"query": "数据库", "minimum_should_match": "80%"}}}], "minimum_should_match": 1}}, "from": 0, "size": 10, "_source": ["path", "repo_id", "filename", "is_dir"], "sort": ["_score"]}
+```
+
+
+## 向量检索
+
+
+
+我们为 SeaSearch 扩展开发了向量检索的功能，以下是相关API介绍。
+
+
+
+### 创建向量索引
+
+
+
+使用向量检索功能，需要提前创建向量索引，可以通过 mapping 的方式建立。
+
+
+
+我们创建一个索引，设置写入的文档数据的向量字段叫 "vec"，索引类型是 flat, 向量维度是 768
+
+```plaintext
+[PUT] /es/${indexName}/_mapping
+
+{
+"properties":{
+        "vec":{
+            "type":"vector", 
+            "dims":768,
+            "m":64,
+            "nbits":8,
+            "vec_index_type":"flat"
+        }
+    }
+}
+```
+
+
+参数说明：
+
+```plaintext
+${indexName} zincIndex 索引名称
+
+type  固定为 vector，表示向量索引
+dims  向量维度
+m     ivf_pq 索引所需参数，需要能被 dims整除
+nbits ivf_pq 索引所需参数，默认为 8
+vec_index_type 索引类型，支持 flat, ivf_pq 两种
+```
+
+
+### 写入包含向量的document
+
+
+
+写入包含向量 document 与写入普通document 在 API层面并无差异，可自行选择合适的方式。
+
+
+
+下面以 bluk API 为例
+
+```plaintext
+[POST] /es/_bulk
+
+body:
+
+{ "index" : { "_index" : "index1" } } 
+{"name": "jack1","vec":[10.2,10.41,9.5,22.2]}
+{ "index" : { "_index" : "index1" } } 
+{"name": "jack2","vec":[10.2,11.41,9.5,22.2]}
+{ "index" : { "_index" : "index1" } } 
+{"name": "jack3","vec":[10.2,12.41,9.5,22.2]}
+```
+
+
+注意 _bulk API 严格要求每一行的格式，数据不能超过一行，详细请参考 [ES bulk](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
+
+
+
+修改和删除，也可以使用 bulk，删除 document 之后，其对应的向量数据同样会被删除
+
+
+
+### 检索向量
+
+
+
+通过传入一个 向量，搜索系统中N个相似的向量，并返回对应文档信息：
+
+```plaintext
+[POST] /api/${indexName}/_search/vector
+
+body:
+{
+    {
+    "query_field":"vec",
+    "k":7,
+    "return_fields":["name"],
+    "vector":[10.2,10.40,9.5,22.2.......],
+    "_source":false
+    }
+}
+```
+
+
+API 响应格式与 全文检索格式相同。
+
+
+
+以下是参数说明：
+
+```plaintext
+${indexName} zincIndex 索引名称
+
+query_field 要检索 index 中的哪个字段，字段必须为 vector 类型
+k 要返回的 K 个最相似的向量数量
+return_fields 单独返回的字段名称
+vector 用于查询的向量
+nprobe 仅对 ivf_pq 索引类型生效，要查询的聚蔟数量，数量越高，越精确
+_source 用于控制是否返回 _source 字段，支持 bool或者一个数组，描述需要返回哪些字段
+```
+
+
+### 重建索引
+
+
+
+立即对索引进行重建，适用于不等待后台自动检测的情况
+
+```plaintext
+[POST] /api/:target/:field/_rebuild
+```
+
+
+### 查询 recall
+
+
+
+对于 ivf_pq 类型的向量，可以对其数据进行 recall 检查
+
+```plaintext
+[POST] /api/:target/_recall
+{
+    "field":"vec_001", # 要测试的字段
+    "k":10, 
+    "nprobe":5, # nprobe 数量
+    "query_count":1000 # 进行测试的次数
+}
+```
+
+
+# 向量检索使用示例
+
+
+
+接下来实际演示如何 索引一批 papers，每个 paper 可能包含多个需要被索引的向量，我们希望通过 向量检索，得到最相似的 N 个向量，从而得到其对应的 paper-id。
+
+
+
+## 创建 SeaSearch 索引与向量索引
+
+
+
+首先是设定 向量索引的 mapping，在设定mapping时，index 和向量索引 会自动创建
+
+
+
+由于 paper-id 只是一个普通的字符串，我们无需进行 analyze, 所以我们设置其类型为 keyword：
+
+```plaintext
+[PUT] /es/paper/_mapping
+
+{
+"properties":{
+        "title-vec":{
+            "type":"vector", 
+            "dims":768,
+            "vec_index_type":"flat",
+            "m":1
+        },
+        "paper-id":{
+            "type":"keyword"
+        }
+    }
+}
+```
+
+
+通过以上请求，我们创建了一个名为 paper 的 index，并为索引的 title-vec 字段，建立了 flat 类型的向量索引。
+
+
+
+## 索引数据
+
+
+
+我们通过 _bulk API 批量向 SeaSearch 写入这些 paper 数据
+
+```plaintext
+[POST] /es/_bulk
+
+{ "index" : {"_index" : "paper" } } 
+{"paper-id": "001","
+{ "
+{"paper-id": "002","title-vec":[10.2,11.40,9.5,22.2....]}
+{ "
+{"paper-id": "003","title-vec":[10.2,12.40,9.5,22.2....]}
+....
+```
+
+
+## 检索数据
+
+
+
+现在我们可以用向量检索：
+
+```plaintext
+[POST] /api/paper/_search/vector
+
+{
+    "query_field":"title-vec",
+    "k":10,
+    "return_fields":["paper-id"],
+    "vector":[10.2,10.40,9.5,22.2....]
+}
+```
+
+
+可以检索出最相似的向量对应的 document，并得到 paper-id。由于一个 paper 可能包含多个 向量，如果某个 paper 的多个向量都与查询的 向量 非常相似，那么这个 paper-id 可能出现在结果中多次。
+
+
+
+## 维护向量数据
+
+
+
+### 直接更新document
+
+
+
+在一个 document 成功导入之后，SeaSearch会返回其 doc id，我们可以根据 doc id 直接更新一个document：
+
+```plaintext
+[POST] /es/_bulk
+
+{ "update" : {"_id":"23gZX9eT6QM","_index" : "paper" } } 
+{"paper-id": "005","vec":[10.2,1.43,9.5,22.2...]}
+```
+
+
+### 先查询再更新
+
+
+
+如果没有保存返回的 doc id，可以先利用 SeaSearch 的全文检索功能，查询 paper-id 对应的docuemnts：
+
+```plaintext
+[POST] /es/paper/_search
+
+{
+    "query": {
+        "bool": {
+            "must": [
+                {
+                    "term": {"paper-id":"003"}
+                }
+            ]
+        }
+    }
+}
+```
+
+
+通过 DSL，我们可以直接检索到 paper-id 对应的 document 以及其 doc id。
+
+
+
+### 全量更新 paper
+
+
+
+一个 paper 包含多个向量，如果某个向量需要更新，那么我们直接更新这个向量对应的 document即可，但是在实际应用中，区分一个 paper的内容哪些是新增的，哪些是更新的，是不太容易的。
+
+
+
+我们可以采用全量更新的方式：
+
+
+
+  * 首先通过 DSL 查询出一个 paper 所有的 document
+
+  * 删除所有的 document
+
+  * 导入最新的 paper 数据
+
+
+
+
+
+
+第2和第3步，可以在一个 批量 操作中进行。
+
+
+
+下面的例子将演示删除 paper 001 的 document，并重新导入；同时，直接更新 paper 005 和 paper 006，因为它们只有一个向量：
+
+```plaintext
+[POST] /es/_bulk
+
+{ "delete" : {"_id":"23gZX9eT6Q8","_index" : "paper" } } 
+{ "delete" : {"_id":"23gZX9eT6Q0","_index" : "
+{ "delete" : {"_id":"23gZX9eT6Q3","_index" : "
+{ "index" : {"_index" : "
+{"paper-id": "001","vec":[10.2,1.41,9.5,22.2...]}
+{ "
+{"
+{ "
+{"
+{ "update" : {"_id":"23gZX9eT6QM","_index" : "paper" } } 
+{"paper-id": "005","vec":[10.2,1.43,9.5,22.2...]}
+{ "update" : {"_id":"23gZX9eT6QY","_index" : "paper" } } 
+{"paper-id": "006","vec":[10.2,1.43,9.5,22.2...]}
+```
+
--- a/manual/config/README.md
+++ b/manual/config/README.md
@ -0,0 +1,69 @@
+# SeaSearch 配置项目
+
+官方配置可以参考：[https://zincsearch-docs.zinc.dev/environment-variables/](https://zincsearch-docs.zinc.dev/environment-variables/)
+
+以下配置说明，为我们扩展的配置项，所有配置，都是以环境变量的方式设置的。
+
+## 扩展配置
+
+```
+GIN_MODE gin框架的日志模式，默认为 release
+ZINC_WAL_ENABLE 是否启用 WAL，默认启用
+ZINC_STORAGE_TYPE
+ZINC_MAX_OBJ_CACHE_SIZE 启用 s3，oss时，本地最大缓存文件大小
+ZINC_SHARD_LOAD_OBJS_GOROUTINE_NUM 索引加载并行度，在启用s3和Oss时，能提升索引载速度
+
+ZINC_SHARD_NUM zincsearch 原有默认为 3，由于 seaseach 都是每个资料库一个索引，为了提升加载效率，改为默认为 1
+
+s3相关，仅在 ZINC_STORAGE_TYPE=s3 时生效
+ZINC_S3_ACCESS_ID
+ZINC_S3_USE_V4_SIGNATURE
+ZINC_S3_ACCESS_SECRET
+ZINC_S3_ENDPOINT
+ZINC_S3_USE_HTTPS
+ZINC_S3_PATH_STYLE_REQUEST
+ZINC_S3_AWS_REGION
+
+oss相关，仅在 ZINC_STORAGE_TYPE=oss 时生效
+ZINC_OSS_ACCESS_ID
+ZINC_OSS_ACCESS_SECRET
+ZINC_OSS_BUCKET
+ZINC_OSS_ENDPOINT
+
+集群相关
+ZINC_SERVER_MODE 默认 none 为单机部署，可选 cluster,集群时必须为 cluster
+ZINC_CLUSTER_ID 集群id，需要全局唯一
+ZINC_ETCD_ENDPOINTS etcd 地址
+ZINC_ETCD_ENDPOINTS etcd key前缀 默认 /zinc
+ZINC_ETCD_USERNAME  etcd 用户名
+ZINC_ETCD_PASSWORD  etcd 密码
+
+日志相关
+ZINC_LOG_OUTPUT 是否将日志输出到文件，默认 是
+ZINC_LOG_DIR 日志目录，建议配置，默认为当前目录下的 log 子目录
+ZINC_LOG_LEVEL 日志级别，默认 debug
+
+```
+
+## proxy 配置
+
+```
+ZINC_CLUSTER_PROXY_LOG_DIR=./log 
+ZINC_CLUSTER_PROXY_HOST=0.0.0.0
+ZINC_CLUSTER_PROXY_PORT=4082
+ZINC_SERVER_MODE=proxy #必须为proxy
+ZINC_ETCD_ENDPOINTS=127.0.0.1:2379
+ZINC_ETCD_PREFIX=/zinc
+ZINC_MAX_DOCUMENT_SIZE=1m #bulk和multisearch 对单个最大document的限制，默认1m
+ZINC_CLUSTER_MANAGER_ADDR=127.0.0.1:4081 #manager 地址
+```
+
+## cluster-manger 配置
+
+```
+ZINC_CLUSTER_MANAGER_LOG_DIR=./log
+ZINC_CLUSTER_MANAGER_HOST=0.0.0.0
+ZINC_CLUSTER_MANAGER_PORT=4081
+ZINC_CLUSTER_MANAGER_ETCD_ENDPOINTS=127.0.0.1:2379
+ZINC_CLUSTER_MANAGER_ETCD_PREFIX=/zinc
+```
--- a/manual/deploy/README.md
+++ b/manual/deploy/README.md
@ -0,0 +1,30 @@
+# 启动 SeaSearch
+
+## 启动单机 SeaSearch
+
+对于开发环境而言，只需要按照官方说明，配置 启动帐号和启动密码两个 环境变量即可。
+
+编译 SeaSearch 参考： [Setup](../setup/README.md)
+
+对于开发环境，直接配置环境变量，并启动二进制文件即可；
+
+以下命令会首先创建一个 data文件夹，作为默认的存储路径，之后以 admin 以及 Complexpass#123作为初始用户，启动一个 SeaSearch 程序，并默认监听4080端口：
+
+```
+mkdir data
+ZINC_FIRST_ADMIN_USER=admin ZINC_FIRST_ADMIN_PASSWORD=Complexpass#123 GIN_MODE=release ./SeaSearch
+```
+
+如果需要重置数据，删除整个 data 目录再重启即可，这会清理所有元数据以及索引数据。
+
+## 启动集群
+
+# 集群部署
+
+1. 启动 etcd
+
+2. 启动 SeaSearch 节点，节点会自动向 etcd 注册心跳。
+
+3. 启动 cluster-manager，然后通过 API 或者 直接向 etcd 设置 cluster-info，设置SeaSearch 节点的地址。并且同时，cluster-manager 开始根据节点心跳对分片进行分配。
+
+4. 启动 SeaSearch-proxy，此时就可以对外提供服务了。
--- a/manual/media/favicon.ico
+++ b/manual/media/favicon.ico
--- a/manual/media/seafile-transparent-1024.png
+++ b/manual/media/seafile-transparent-1024.png
--- a/manual/setup/README.md
+++ b/manual/setup/README.md
@ -0,0 +1,247 @@
+# 安装 SeaSearch
+
+原版的 SeaSearch 采用纯 go 语言编写，直接通过 Go 编译工具即可编译。在我们引入向量检索功能时，用到了 faiss 库，这个库需要以 CGO 的方式调用，所以对 SeaSearch 的编译会产生影响。
+
+## 安装 faiss
+
+要在一台机器上编译或者运行 SeaSearch，需要这台机器安装 faiss 库。下面是具体安装步骤，适用于 x86 linux 机器，流程采用的操作系统为 debian 12，使用 apt 作为包管理器
+
+### 前提条件
+
+通过包管理器安装，如果连接速度慢，可以尝试更换源
+
+ubuntu 参考：[https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)
+
+debian 参考：[https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/)
+
+换源之后，执行
+
+```
+sudo apt update
+```
+
+C++ 编译器，支持C++17及以上
+
+可以通过 apt 安装
+
+```
+sudo apt install -y gcc
+```
+
+Cmake，3.23.1 以上，如果源不是最新，可以从 ppa 或者源码安装
+
+```
+sudo apt install -y cmake
+```
+
+wget swig gnupg libomp
+
+```
+sudo apt install -y wget swig gnupg libomp-dev
+```
+
+nodeJs;
+
+```
+sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
+curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
+NODE_MAJOR=20
+echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
+sudo apt update && sudo apt install nodejs -y
+```
+
+### 安装 Intel MKL库 （可选，仅支持x86 cpu）
+
+faiss 依赖 BLAS，并且推荐使用 intel MKL性能最佳
+
+```
+wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
+| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
+
+echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" |sudo tee /etc/apt/sources.list.d/oneAPI.list
+
+sudo apt update
+
+sudo apt install -y intel-oneapi-mkl-devel
+```
+
+执行完毕之后，MKL库就安装完毕了，再配置一个环境变量：
+
+```
+export MKL_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64
+```
+
+### 非 x86 cpu 安装BLAS
+
+非x86  cpu无法安装 MKL，可安装 OpenBLAS 实现
+
+```
+sudo apt install -y libatlas-base-dev libatlas3-base
+```
+
+### 编译 faiss
+
+下载 faiss 源码，ssh方式：
+
+```
+git clone git@github.com:facebookresearch/faiss.git
+```
+
+或者 http方式：
+
+```
+git clone https://github.com/facebookresearch/faiss.git
+```
+
+进入 faiss 目录，如果安装了 MKL，执行：
+
+```
+cmake -B build -DFAISS_ENABLE_GPU=OFF \
+    -DFAISS_ENABLE_C_API=ON \
+    -DFAISS_ENABLE_PYTHON=OFF \
+    -DBLA_VENDOR=Intel10_64_dyn  \
+    -DBUILD_SHARED_LIBS=ON \
+    "-DMKL_LIBRARIES=-Wl,--start-group;${MKL_PATH}/libmkl_intel_lp64.a;${MKL_PATH}/libmkl_gnu_thread.a;${MKL_PATH}/libmkl_core.a;-Wl,--end-group" \
+    .
+```
+
+如果未安装 MKL，执行：
+
+```
+cmake -B build -DFAISS_ENABLE_GPU=OFF \
+    -DFAISS_ENABLE_C_API=ON \
+    -DFAISS_ENABLE_PYTHON=OFF \
+    -DBUILD_SHARED_LIBS=ON=ON \
+    -DBUILD_TESTING=OFF \    
+    .
+```
+
+执行编译
+
+```
+make -C build
+```
+
+安装头文件：
+
+```
+sudo make -C build install
+```
+
+将编译好的动态链接库，拷贝到系统路径，这里的 /tmp/faiss 是faiss源码路径，替换为真正的路径即可：
+
+```
+sudo cp /tmp/faiss/build/c_api/libfaiss_c.so /usr/lib
+```
+
+完整安装脚本可以参考 SeaSearch 项目目录下的 /ci/install\_faiss.sh
+
+## 编译 SeaSearch
+
+faiss 已经安装完毕，可以开始编译 SeaSearch了
+
+首先下载 SeaSearch源码：
+
+```
+git clone git@github.com:seafileltd/seasearch.git
+```
+
+或者 http方式：
+
+```
+git clone https://github.com/seafileltd/seasearch.git
+```
+
+编译前端静态文件
+
+```
+cd web
+npm config set registry https://registry.npmmirror.com
+npm install
+npm run build
+```
+
+安装 go 语言环境 Go 1.20 以上
+
+参考 [https://go.dev/doc/install](https://go.dev/doc/install)
+
+需要确保启用了 CGO
+
+```
+export CGO_ENABLED=1
+```
+
+可选，更换 go 源：
+
+```
+go env -w  GOPROXY=https://goproxy.cn,direct
+```
+
+之后在项目根目录执行：
+
+```
+go build -o seasearch ./cmd/zincsearch/
+```
+
+以上步骤执行完毕，可以在项目的根目录下面得到最终的 seasearch 二进制文件了。
+
+一般来说无需手动指定头文件和动态链接库位置，如果编译提示找不到头文件，或者找不到动态运行库，可以在编译时通过环境变量指定位置：
+
+```
+CGO_CFLAGS=-I /usr/local/include #你的C
+CGO_LDFLAGS=-I /usr/lib  
+```
+
+如果运行时，提示找不到 动态链接库，可以通过：
+
+```
+LD_LIBRARY_PATH=/usr/lib #指定动态链接库目录
+```
+
+## 编译 seasearch proxy 和 cluster manger
+
+在集群下，需要编译部署 seasearch proxy 和 cluster manager
+
+编译 proxy:
+
+```
+go build -o seasearch-proxy ./cmd/zinc-proxy/main.go
+```
+
+编译 cluster manager：
+
+```
+go build -o cluster-manager ./cmd/cluster-manager/main.go
+```
+
+
+## 发布
+
+项目根目录下有 Dokcerfile 文件，可以根据此文件构建 docker 镜像
+
+注意：构建此 docker 镜像，需要确保能正常访问 github，否则无法下载 faiss 源码会导致构建失败, 并且仅支持 x86 cpu，arm 需要设置 platform 参数模拟 x86
+
+```
+docker build -f ./Dockerfile .
+```
+
+## Mac 中存在的安装问题
+
+### faiss 安装
+
+faiss 可通过 brew install faiss 安装
+
+### fatal error: 'faiss/c\_api/AutoTune\_c.h' file not found
+
+执行如下命令解决：
+
+source: [https://github.com/DataIntelligenceCrew/go-faiss/issues/7](https://github.com/DataIntelligenceCrew/go-faiss/issues/7)
+
+```
+cd faiss
+export CMAKE_PREFIX_PATH=/opt/homebrew/opt/openblas:/opt/homebrew/opt/libomp:/opt/homebrew
+cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF .
+make -C build
+sudo make -C build install
+sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib/libfaiss_c.dylib
+```
--- a/manual/setup/compile_seasearch.md
+++ b/manual/setup/compile_seasearch.md
@ -0,0 +1,110 @@
+
+# 编译 SeaSearch
+
+faiss 已经安装完毕，可以开始编译 SeaSearch了
+
+首先下载 SeaSearch源码：
+
+```
+git clone git@github.com:seafileltd/seasearch.git
+```
+
+或者 http方式：
+
+```
+git clone https://github.com/seafileltd/seasearch.git
+```
+
+编译前端静态文件
+
+```
+cd web
+npm config set registry https://registry.npmmirror.com
+npm install
+npm run build
+```
+
+安装 go 语言环境 Go 1.20 以上
+
+参考 [https://go.dev/doc/install](https://go.dev/doc/install)
+
+需要确保启用了 CGO
+
+```
+export CGO_ENABLED=1
+```
+
+可选，更换 go 源：
+
+```
+go env -w  GOPROXY=https://goproxy.cn,direct
+```
+
+之后在项目根目录执行：
+
+```
+go build -o seasearch ./cmd/zincsearch/
+```
+
+以上步骤执行完毕，可以在项目的根目录下面得到最终的 seasearch 二进制文件了。
+
+一般来说无需手动指定头文件和动态链接库位置，如果编译提示找不到头文件，或者找不到动态运行库，可以在编译时通过环境变量指定位置：
+
+```
+CGO_CFLAGS=-I /usr/local/include #你的C
+CGO_LDFLAGS=-I /usr/lib  
+```
+
+如果运行时，提示找不到 动态链接库，可以通过：
+
+```
+LD_LIBRARY_PATH=/usr/lib #指定动态链接库目录
+```
+
+# 编译 seasearch proxy 和 cluster manger
+
+在集群下，需要编译部署 seasearch proxy 和 cluster manager
+
+编译 proxy:
+
+```
+go build -o seasearch-proxy ./cmd/zinc-proxy/main.go
+```
+
+编译 cluster manager：
+
+```
+go build -o cluster-manager ./cmd/cluster-manager/main.go
+```
+
+
+# 发布
+
+项目根目录下有 Dokcerfile 文件，可以根据此文件构建 docker 镜像
+
+注意：构建此 docker 镜像，需要确保能正常访问 github，否则无法下载 faiss 源码会导致构建失败, 并且仅支持 x86 cpu，arm 需要设置 platform 参数模拟 x86
+
+```
+docker build -f ./Dockerfile .
+```
+
+# Mac 中存在的安装问题
+
+## faiss 安装
+
+faiss 可通过 brew install faiss 安装
+
+## fatal error: 'faiss/c\_api/AutoTune\_c.h' file not found
+
+执行如下命令解决：
+
+source: [https://github.com/DataIntelligenceCrew/go-faiss/issues/7](https://github.com/DataIntelligenceCrew/go-faiss/issues/7)
+
+```
+cd faiss
+export CMAKE_PREFIX_PATH=/opt/homebrew/opt/openblas:/opt/homebrew/opt/libomp:/opt/homebrew
+cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF .
+make -C build
+sudo make -C build install
+sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib/libfaiss_c.dylib
+```
--- a/manual/setup/install_faiss.md
+++ b/manual/setup/install_faiss.md
@ -0,0 +1,133 @@
+# 安装 faiss
+
+要在一台机器上编译或者运行 SeaSearch，需要这台机器安装 faiss 库。下面是具体安装步骤，适用于 x86 linux 机器，流程采用的操作系统为 debian 12，使用 apt 作为包管理器
+
+## 前提条件
+
+通过包管理器安装，如果连接速度慢，可以尝试更换源
+
+ubuntu 参考：[https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)
+
+debian 参考：[https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/)
+
+换源之后，执行
+
+```
+sudo apt update
+```
+
+C++ 编译器，支持C++17及以上
+
+可以通过 apt 安装
+
+```
+sudo apt install -y gcc
+```
+
+Cmake，3.23.1 以上，如果源不是最新，可以从 ppa 或者源码安装
+
+```
+sudo apt install -y cmake
+```
+
+wget swig gnupg libomp
+
+```
+sudo apt install -y wget swig gnupg libomp-dev
+```
+
+nodeJs;
+
+```
+sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
+curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
+NODE_MAJOR=20
+echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
+sudo apt update && sudo apt install nodejs -y
+```
+
+## 安装 Intel MKL库 （可选，仅支持x86 cpu）
+
+faiss 依赖 BLAS，并且推荐使用 intel MKL性能最佳
+
+```
+wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
+| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
+
+echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" |sudo tee /etc/apt/sources.list.d/oneAPI.list
+
+sudo apt update
+
+sudo apt install -y intel-oneapi-mkl-devel
+```
+
+执行完毕之后，MKL库就安装完毕了，再配置一个环境变量：
+
+```
+export MKL_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64
+```
+
+## 非 x86 cpu 安装BLAS
+
+非x86  cpu无法安装 MKL，可安装 OpenBLAS 实现
+
+```
+sudo apt install -y libatlas-base-dev libatlas3-base
+```
+
+## 编译 faiss
+
+下载 faiss 源码，ssh方式：
+
+```
+git clone git@github.com:facebookresearch/faiss.git
+```
+
+或者 http方式：
+
+```
+git clone https://github.com/facebookresearch/faiss.git
+```
+
+进入 faiss 目录，如果安装了 MKL，执行：
+
+```
+cmake -B build -DFAISS_ENABLE_GPU=OFF \
+    -DFAISS_ENABLE_C_API=ON \
+    -DFAISS_ENABLE_PYTHON=OFF \
+    -DBLA_VENDOR=Intel10_64_dyn  \
+    -DBUILD_SHARED_LIBS=ON \
+    "-DMKL_LIBRARIES=-Wl,--start-group;${MKL_PATH}/libmkl_intel_lp64.a;${MKL_PATH}/libmkl_gnu_thread.a;${MKL_PATH}/libmkl_core.a;-Wl,--end-group" \
+    .
+```
+
+如果未安装 MKL，执行：
+
+```
+cmake -B build -DFAISS_ENABLE_GPU=OFF \
+    -DFAISS_ENABLE_C_API=ON \
+    -DFAISS_ENABLE_PYTHON=OFF \
+    -DBUILD_SHARED_LIBS=ON=ON \
+    -DBUILD_TESTING=OFF \    
+    .
+```
+
+执行编译
+
+```
+make -C build
+```
+
+安装头文件：
+
+```
+sudo make -C build install
+```
+
+将编译好的动态链接库，拷贝到系统路径，这里的 /tmp/faiss 是faiss源码路径，替换为真正的路径即可：
+
+```
+sudo cp /tmp/faiss/build/c_api/libfaiss_c.so /usr/lib
+```
+
+完整安装脚本可以参考 SeaSearch 项目目录下的 /ci/install\_faiss.sh
--- a/mkdocs.yml
+++ b/mkdocs.yml
@ -0,0 +1,53 @@
+site_name: SeaSearch Manual
+site_author: seafile
+docs_dir: ./manual
+site_url: https://haiwen.github.io/seasearch-docs/
+
+repo_name: haiwen/seasearch-docs
+repo_url: https://github.com/haiwen/seasearch-docs/
+edit_uri: blob/master/manual
+
+copyright: Copyright &copy; 2023 Seafile Ltd.
+
+theme:
+  name: material
+  logo: media/seafile-transparent-1024.png
+  favicon: media/favicon.ico
+  palette:
+    primary: white
+    accent:
+
+plugins:
+  - search
+  - awesome-pages
+
+# Customization
+extra:
+  social:
+    - icon: fontawesome/brands/github
+      link: https://github.com/haiwen/seasearch-docs/
+
+extra_css:
+  - stylesheets/extra.css
+
+# Extensions
+markdown_extensions:
+  - markdown.extensions.admonition
+  - markdown.extensions.attr_list
+  - markdown.extensions.codehilite:
+      guess_lang: true
+  - markdown.extensions.def_list
+  - markdown.extensions.footnotes
+  - markdown.extensions.meta
+  - markdown.extensions.toc:
+      permalink: true
+      toc_depth: "1-4"
+
+# Page tree
+nav:
+  - Setup:
+    - Installation of SeaSearch: setup/README.md
+  - Deploy:
+      - Deploy SeaSearch: deploy/README.md
+  - Configuration: config/README.md
+  - SeaSearch API: api/seasearch_api.md