init seasearch docs
15 changed files with 1378 additions and 0 deletions
18 .github/workflows/deploy.yml vendored Normal file
@@ -0,0 +1,18 @@
name: Deploy CI

on:
  push:
    branches:
      - master

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install mkdocs-material mkdocs-awesome-pages-plugin mkdocs-material-extensions
      - run: cd $GITHUB_WORKSPACE
      - run: mkdocs gh-deploy --force
2 .gitignore vendored Normal file
@@ -0,0 +1,2 @@
*~
/.idea
13 LICENSE.txt Normal file
@@ -0,0 +1,13 @@
Copyright (c) 2016 Seafile Ltd.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
15 README.md Normal file
@@ -0,0 +1,15 @@
# SeaSearch Docs

Manual for SeaSearch

The website: https://haiwen.github.io/seasearch-docs/

## Serve docs locally

These docs are built using `mkdocs`. Install the tooling by running:

```
pip3 install mkdocs-material mkdocs-awesome-pages-plugin mkdocs-material-extensions
```

Start up the development server by running `mkdocs serve` in the project root directory. Browse at `http://127.0.0.1:8000/seasearch-docs/`.
1 manual/CNAME Normal file
@@ -0,0 +1 @@
manual.seafile.com
3 manual/README.md Normal file
@@ -0,0 +1,3 @@
# Introduction

ZincSearch is a full-text search server written in Go that provides an API compatible with the ElasticSearch DSL. It uses Bluge as its indexing engine. Bluge is a fork of Bleve, a widely used Go full-text indexing library developed by CouchBase, with the code refactored to make it more modern and flexible.
684 manual/api/seasearch_api.md Normal file
@@ -0,0 +1,684 @@

# API Introduction

SeaSearch authenticates requests with HTTP Basic Auth; API requests must carry the corresponding token in the request header.

A basic auth value can be generated with this tool: [http://web.chacuo.net/safebasicauth](http://web.chacuo.net/safebasicauth)
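
For example, with curl the credentials can be passed directly and the Basic Auth header is added automatically. This is a minimal sketch; the host, port and the `/api/index` index-listing endpoint are assumptions based on the default single-node setup described in this manual:

```plaintext
curl -u admin:Complexpass#123 http://localhost:4080/api/index
```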

## User Management

### Admin user

SeaSearch manages API permissions through user accounts. When the program starts for the first time, an administrator account must be configured through environment variables.

Example of an admin account:

```plaintext
set ZINC_FIRST_ADMIN_USER=admin
set ZINC_FIRST_ADMIN_PASSWORD=Complexpass#123
```

### Regular users

Users can be created or updated through the API:

```plaintext
[POST] /api/user

{
    "_id": "prabhat",
    "name": "Prabhat Sharma",
    "role": "admin", // or user
    "password": "Complexpass#123"
}
```

List all users:

```plaintext
[GET] /api/user
```

Delete a user:

```plaintext
[DELETE] /api/user/${userId}
```

## Index Management

### Create an index

Create a SeaSearch index; mappings and settings can be provided in the same request.

Settings and mappings can also be set through their own dedicated requests; if the index does not exist yet, it is created automatically.

SeaSearch documentation: [https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index](https://zincsearch-docs.zinc.dev/api/index/create/#update-a-exists-index)

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)
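
A minimal sketch of creating an index with both settings and mappings, assuming the ES-compatible create-index endpoint; the field name and analyzer choice are illustrative only:

```plaintext
[PUT] /es/${indexName}

{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "standard"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text"
            }
        }
    }
}
```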

### Configure mappings

Mappings define the rules for the fields of a document, such as their type and format.

A mapping can be configured through a dedicated API:

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-mapping/](https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-mapping/)

ES reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html)
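
A minimal sketch of such a request, using the same update-mapping endpoint that the vector examples later in this document rely on; the field names are illustrative:

```plaintext
[PUT] /es/${indexName}/_mapping

{
    "properties": {
        "filename": {
            "type": "text"
        },
        "created": {
            "type": "date"
        }
    }
}
```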

### Configure settings

Settings control index-level options such as analyzers and sharding; a minimal example is sketched after the reference links below.

SeaSearch API: [https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-settings/](https://zincsearch-docs.zinc.dev/api-es-compatible/index/update-settings/)

ES reference:

* analyzer concepts: [https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-concepts.html)
* how to specify an analyzer: [https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html)
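
The sketch below sets a default analyzer through the ES-compatible update-settings endpoint. Treat the exact body as an assumption to verify against the linked documentation:

```plaintext
[PUT] /es/${indexName}/_settings

{
    "analysis": {
        "analyzer": {
            "default": {
                "type": "standard"
            }
        }
    }
}
```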

### Analyzer support

An analyzer can be set as the index default when the index is created, or configured per field. (See the settings and ES documentation in the previous section for the underlying concepts.)

The analyzers supported by SeaSearch are listed here: [https://zincsearch-docs.zinc.dev/api/index/analyze/](https://zincsearch-docs.zinc.dev/api/index/analyze/) The tokenizer and token filter concepts there are the same as in ES, and most of the commonly used ES analyzers and tokenizers are supported.

Supported general-purpose analyzers:

* standard: the default analyzer, used when none is specified; splits on word boundaries and lowercases
* simple: splits on non-letter characters (symbols are dropped) and lowercases
* keyword: no tokenization; the whole input becomes a single token
* stop: lowercases and applies a stop-word filter (the, a, is, ...)
* web: implemented by bluge; recognizes e-mail addresses, URLs, etc.; lowercases and applies a stop-word filter
* regexp/pattern: regular-expression based, \W+ (split on non-word characters) by default; lowercasing and stop words can be configured
* whitespace: splits on whitespace and does not lowercase

Language analyzers:

| Language | analyzer |
| --- | --- |
| Arabic | ar |
| Danish | da |
| German | de |
| English | english |
| Spanish | es |
| Persian | fa |
| CJK (East Asian languages) | cjk |
| Finnish | fi |
| French | fr |
| Hindi | hi |
| Hungarian | hu |
| Italian | it |
| Dutch | nl |
| Norwegian | no |
| Portuguese | pt |
| Romanian | ro |
| Russian | ru |
| Swedish | sv |
| Turkish | tr |
| Sorani | ckb |

Chinese analyzers:

* gse_standard: segments text with the shortest-path algorithm
* gse_search: search-engine segmentation mode, producing as many keywords as possible

The Chinese analyzers are implemented with the [gse](https://github.com/go-ego/gse) library, a Golang implementation of the Python jieba segmenter. They are disabled by default and are enabled through environment variables:

```plaintext
ZINC_PLUGIN_GSE_ENABLE=true
# true enables Chinese segmentation support, default false

ZINC_PLUGIN_GSE_DICT_EMBED=BIG
# BIG: use the dictionary and stop words embedded in gse; otherwise a simple dictionary built into SeaSearch is used. Default: small

ZINC_PLUGIN_GSE_ENABLE_STOP=true
# true enables stop words, default true

ZINC_PLUGIN_GSE_ENABLE_HMM=true
# use HMM mode for search segmentation, default true

ZINC_PLUGIN_GSE_DICT_PATH=./plugins/gse/dict
# use a user-defined dictionary and stop-word list; place the files under this path,
# name the dictionary user.txt and the stop-word list stop.txt
```
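
To check how an analyzer tokenizes a piece of text, the analyze endpoint from the SeaSearch documentation linked above can be used. A minimal sketch, assuming gse has been enabled with the variables above:

```plaintext
[POST] /api/_analyze

{
    "analyzer": "gse_standard",
    "text": "Seafile 是一个开源的文件同步和共享平台"
}
```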

## Full-text Search

### Document CRUD

Create a document:

SeaSearch: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/create/)

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html)

Update a document:

SeaSearch: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/update/)

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html)

Delete a document:

SeaSearch: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/](https://zincsearch-docs.zinc.dev/api-es-compatible/document/delete/)

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete.html)

Get a document by id:

```plaintext
[GET] /api/${indexName}/_doc/${docId}
```
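
A minimal sketch of creating a document with an explicit id, assuming the ES-compatible document endpoint described in the links above; the field values are illustrative:

```plaintext
[PUT] /es/${indexName}/_doc/${docId}

{
    "name": "jack",
    "filename": "report.pdf"
}
```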

### Bulk operations

Bulk operations should be used for index updates whenever possible.

SeaSearch documentation: [https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request](https://zincsearch-docs.zinc.dev/api-es-compatible/document/bulk/#request)

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
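
A minimal sketch of a bulk request; the complete format, including update and delete actions, is shown in the vector search examples later in this document:

```plaintext
[POST] /es/_bulk

{ "index" : { "_index" : "index1" } }
{"name": "jack1"}
{ "index" : { "_index" : "index1" } }
{"name": "jack2"}
```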

### Search

API example:

[https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/search/)

Full-text search uses the ES query DSL; see:

[https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html)
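
A minimal sketch of a search request; the field name and query text are illustrative, and the query body follows the same DSL as the multi-search example below:

```plaintext
[POST] /es/${indexName}/_search

{
    "query": {
        "match": {
            "filename": "数据库"
        }
    },
    "from": 0,
    "size": 10
}
```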

delete-by-query, delete all documents matching a query:

```plaintext
[POST] /es/${indexName}/_delete_by_query

{
    "query": {
        "match": {
            "name": "jack"
        }
    }
}
```

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html)

multi-search, run a different query against each index:

SeaSearch documentation: [https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/](https://zincsearch-docs.zinc.dev/api-es-compatible/search/msearch/)

ES API reference: [https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html)

We extended multi-search so that the same collection statistics are used when searching across different indexes, which makes score calculation more accurate. Set the query parameter unify_score=true on the request to enable it:

```plaintext
[POST] /es/_msearch?unify_score=true

{"index": "t1"}
{"query": {"bool": {"should": [{"match": {"filename": {"query": "数据库", "minimum_should_match": "-25%"}}}, {"match": {"filename.ngram": {"query": "数据库", "minimum_should_match": "80%"}}}], "minimum_should_match": 1}}, "from": 0, "size": 10, "_source": ["path", "repo_id", "filename", "is_dir"], "sort": ["_score"]}
{"index": "t2"}
{"query": {"bool": {"should": [{"match": {"filename": {"query": "数据库", "minimum_should_match": "-25%"}}}, {"match": {"filename.ngram": {"query": "数据库", "minimum_should_match": "80%"}}}], "minimum_should_match": 1}}, "from": 0, "size": 10, "_source": ["path", "repo_id", "filename", "is_dir"], "sort": ["_score"]}
```

## Vector Search

We extended SeaSearch with vector search; the related APIs are described below.

### Create a vector index

Vector search requires a vector index to be created in advance, which can be done through a mapping.

The following creates an index whose documents carry their vector in a field named "vec", with index type flat and 768 dimensions:

```plaintext
[PUT] /es/${indexName}/_mapping

{
    "properties":{
        "vec":{
            "type":"vector",
            "dims":768,
            "m":64,
            "nbits":8,
            "vec_index_type":"flat"
        }
    }
}
```

Parameters:

```plaintext
${indexName}     name of the zincIndex
type             fixed to vector, marking a vector index
dims             vector dimension
m                parameter required by the ivf_pq index type; must divide dims evenly
nbits            parameter required by the ivf_pq index type, default 8
vec_index_type   index type, either flat or ivf_pq
```

### Write documents containing vectors

At the API level, writing a document that contains a vector is no different from writing a regular document; choose whichever method fits best.

The bulk API is used as an example below:

```plaintext
[POST] /es/_bulk

body:

{ "index" : { "_index" : "index1" } }
{"name": "jack1","vec":[10.2,10.41,9.5,22.2]}
{ "index" : { "_index" : "index1" } }
{"name": "jack2","vec":[10.2,11.41,9.5,22.2]}
{ "index" : { "_index" : "index1" } }
{"name": "jack3","vec":[10.2,12.41,9.5,22.2]}
```

Note that the _bulk API is strict about the format of each line; a document must not span more than one line. See [ES bulk](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) for details.

Updates and deletes can also be done through bulk; when a document is deleted, its vector data is deleted as well.

### Search vectors

Pass in a vector to find the N most similar vectors in the system and return the corresponding documents:

```plaintext
[POST] /api/${indexName}/_search/vector

body:

{
    "query_field":"vec",
    "k":7,
    "return_fields":["name"],
    "vector":[10.2,10.40,9.5,22.2.......],
    "_source":false
}
```

The response format is the same as for full-text search.

Parameters:

```plaintext
${indexName}     name of the zincIndex
query_field      the field of the index to search; it must be of type vector
k                the number of most similar vectors to return
return_fields    names of the fields to return separately
vector           the query vector
nprobe           only effective for the ivf_pq index type; the number of clusters to probe, higher values are more accurate
_source          controls whether the _source field is returned; either a bool or an array listing the fields to return
```

### Rebuild an index

Rebuild an index immediately, for cases where you do not want to wait for the automatic background check:

```plaintext
[POST] /api/:target/:field/_rebuild
```

### Query recall

For vectors indexed with ivf_pq, a recall check can be run on the data:

```plaintext
[POST] /api/:target/_recall

{
    "field":"vec_001",     # field to test
    "k":10,
    "nprobe":5,            # nprobe value
    "query_count":1000     # number of test queries
}
```

# Vector Search Usage Example

The following walks through indexing a batch of papers. Each paper may contain several vectors that need to be indexed; through vector search we want to find the N most similar vectors and, from them, the corresponding paper-id.

## Create the SeaSearch index and the vector index

First, define the mapping of the vector index; the index and the vector index are created automatically when the mapping is set.

Since paper-id is just a plain string that does not need to be analyzed, we give it the keyword type:

```plaintext
[PUT] /es/paper/_mapping

{
    "properties":{
        "title-vec":{
            "type":"vector",
            "dims":768,
            "vec_index_type":"flat",
            "m":1
        },
        "paper-id":{
            "type":"keyword"
        }
    }
}
```

With the request above we created an index named paper and built a flat vector index for its title-vec field.

## Index the data

We write the paper data into SeaSearch in batches through the _bulk API:

```plaintext
[POST] /es/_bulk

{ "index" : {"_index" : "paper" } }
{"paper-id": "001","title-vec":[10.2,10.40,9.5,22.2....]}
{ "index" : {"_index" : "paper" } }
{"paper-id": "002","title-vec":[10.2,11.40,9.5,22.2....]}
{ "index" : {"_index" : "paper" } }
{"paper-id": "003","title-vec":[10.2,12.40,9.5,22.2....]}
....
```

## Search the data

Now we can search by vector:

```plaintext
[POST] /api/paper/_search/vector

{
    "query_field":"title-vec",
    "k":10,
    "return_fields":["paper-id"],
    "vector":[10.2,10.40,9.5,22.2....]
}
```

This retrieves the documents whose vectors are most similar to the query vector, and with them the paper-id. Since a paper may contain several vectors, a paper-id can appear in the results more than once if several of its vectors are very similar to the query vector.

## Maintain the vector data

### Update a document directly

When a document is imported successfully, SeaSearch returns its doc id; with that doc id a document can be updated directly:

```plaintext
[POST] /es/_bulk

{ "update" : {"_id":"23gZX9eT6QM","_index" : "paper" } }
{"paper-id": "005","vec":[10.2,1.43,9.5,22.2...]}
```

### Query first, then update

If the returned doc id was not saved, the full-text search API can be used first to find the documents that belong to a paper-id:

```plaintext
[POST] /es/paper/_search

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {"paper-id":"003"}
                }
            ]
        }
    }
}
```

With the DSL we can directly retrieve the documents of a paper-id together with their doc ids.

### Fully update a paper

A paper contains several vectors. If a single vector needs to be updated, we can simply update the document that holds it; in practice, however, it is not easy to tell which parts of a paper are new and which have changed.

Instead, the paper can be replaced as a whole:

* first query all documents of the paper with the DSL
* delete all of those documents
* import the latest paper data

Steps 2 and 3 can be done in a single bulk operation.

The example below deletes the documents of paper 001 and re-imports them, and at the same time directly updates paper 005 and paper 006, which each have only one vector:

```plaintext
[POST] /es/_bulk

{ "delete" : {"_id":"23gZX9eT6Q8","_index" : "paper" } }
{ "delete" : {"_id":"23gZX9eT6Q0","_index" : "paper" } }
{ "delete" : {"_id":"23gZX9eT6Q3","_index" : "paper" } }
{ "index" : {"_index" : "paper" } }
{"paper-id": "001","vec":[10.2,1.41,9.5,22.2...]}
{ "index" : {"_index" : "paper" } }
{"paper-id": "001","vec":[10.2,1.42,9.5,22.2...]}
{ "index" : {"_index" : "paper" } }
{"paper-id": "001","vec":[10.2,1.43,9.5,22.2...]}
{ "update" : {"_id":"23gZX9eT6QM","_index" : "paper" } }
{"paper-id": "005","vec":[10.2,1.43,9.5,22.2...]}
{ "update" : {"_id":"23gZX9eT6QY","_index" : "paper" } }
{"paper-id": "006","vec":[10.2,1.43,9.5,22.2...]}
```
69 manual/config/README.md Normal file
@@ -0,0 +1,69 @@
# SeaSearch Configuration

For the upstream configuration options see: [https://zincsearch-docs.zinc.dev/environment-variables/](https://zincsearch-docs.zinc.dev/environment-variables/)

The options described below are our extensions; all of them are set as environment variables.

## Extended options

```
GIN_MODE                            log mode of the gin framework, default release
ZINC_WAL_ENABLE                     whether to enable the WAL, enabled by default
ZINC_STORAGE_TYPE                   storage backend (e.g. s3 or oss)
ZINC_MAX_OBJ_CACHE_SIZE             maximum size of locally cached files when s3/oss is enabled
ZINC_SHARD_LOAD_OBJS_GOROUTINE_NUM  parallelism when loading indexes; speeds up index loading when s3/oss is enabled

ZINC_SHARD_NUM                      zincsearch defaults to 3; since SeaSearch uses one index per library, the default is changed to 1 to speed up loading

S3 options, only effective when ZINC_STORAGE_TYPE=s3
ZINC_S3_ACCESS_ID
ZINC_S3_USE_V4_SIGNATURE
ZINC_S3_ACCESS_SECRET
ZINC_S3_ENDPOINT
ZINC_S3_USE_HTTPS
ZINC_S3_PATH_STYLE_REQUEST
ZINC_S3_AWS_REGION

OSS options, only effective when ZINC_STORAGE_TYPE=oss
ZINC_OSS_ACCESS_ID
ZINC_OSS_ACCESS_SECRET
ZINC_OSS_BUCKET
ZINC_OSS_ENDPOINT

Cluster options
ZINC_SERVER_MODE     default none (single-node deployment); must be set to cluster in a cluster
ZINC_CLUSTER_ID      cluster id, must be globally unique
ZINC_ETCD_ENDPOINTS  etcd addresses
ZINC_ETCD_PREFIX     etcd key prefix, default /zinc
ZINC_ETCD_USERNAME   etcd user name
ZINC_ETCD_PASSWORD   etcd password

Logging
ZINC_LOG_OUTPUT      whether to write logs to a file, default yes
ZINC_LOG_DIR         log directory; recommended to set, defaults to the log subdirectory of the current directory
ZINC_LOG_LEVEL       log level, default debug
```
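
As an illustration, a minimal sketch of an S3-backed configuration; the endpoint, region and credential values are placeholders, not defaults:

```
ZINC_STORAGE_TYPE=s3
ZINC_S3_ENDPOINT=s3.us-east-1.amazonaws.com
ZINC_S3_AWS_REGION=us-east-1
ZINC_S3_ACCESS_ID=<access-key-id>
ZINC_S3_ACCESS_SECRET=<secret-access-key>
ZINC_S3_USE_HTTPS=true
ZINC_S3_USE_V4_SIGNATURE=true
ZINC_S3_PATH_STYLE_REQUEST=false
```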

## Proxy configuration

```
ZINC_CLUSTER_PROXY_LOG_DIR=./log
ZINC_CLUSTER_PROXY_HOST=0.0.0.0
ZINC_CLUSTER_PROXY_PORT=4082
ZINC_SERVER_MODE=proxy                     # must be proxy
ZINC_ETCD_ENDPOINTS=127.0.0.1:2379
ZINC_ETCD_PREFIX=/zinc
ZINC_MAX_DOCUMENT_SIZE=1m                  # maximum size of a single document for bulk and multisearch, default 1m
ZINC_CLUSTER_MANAGER_ADDR=127.0.0.1:4081   # cluster manager address
```

## Cluster-manager configuration

```
ZINC_CLUSTER_MANAGER_LOG_DIR=./log
ZINC_CLUSTER_MANAGER_HOST=0.0.0.0
ZINC_CLUSTER_MANAGER_PORT=4081
ZINC_CLUSTER_MANAGER_ETCD_ENDPOINTS=127.0.0.1:2379
ZINC_CLUSTER_MANAGER_ETCD_PREFIX=/zinc
```
30 manual/deploy/README.md Normal file
@@ -0,0 +1,30 @@
# Start SeaSearch

## Start a single-node SeaSearch

For a development environment it is enough to follow the upstream instructions and set the two environment variables for the initial admin user and password.

For building SeaSearch see: [Setup](../setup/README.md)

In a development environment, simply set the environment variables and start the binary.

The following commands first create a data folder, which is used as the default storage path, and then start SeaSearch with admin / Complexpass#123 as the initial user, listening on port 4080 by default:

```
mkdir data
ZINC_FIRST_ADMIN_USER=admin ZINC_FIRST_ADMIN_PASSWORD=Complexpass#123 GIN_MODE=release ./SeaSearch
```

To reset the data, delete the whole data directory and restart; this removes all metadata and index data.

## Start a cluster

1. Start etcd.

2. Start the SeaSearch nodes; each node automatically registers its heartbeat with etcd (a sketch of the node environment variables follows this list).

3. Start the cluster-manager, then set the cluster-info (the addresses of the SeaSearch nodes) through its API or directly in etcd. From that point on, the cluster-manager assigns shards based on the node heartbeats.

4. Start SeaSearch-proxy; the cluster can now serve requests.
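
A minimal sketch of starting one cluster node, assuming the variables described in the configuration chapter; the cluster id and etcd address shown are placeholders:

```
ZINC_SERVER_MODE=cluster \
ZINC_CLUSTER_ID=seasearch-cluster-1 \
ZINC_ETCD_ENDPOINTS=127.0.0.1:2379 \
ZINC_FIRST_ADMIN_USER=admin \
ZINC_FIRST_ADMIN_PASSWORD=Complexpass#123 \
./SeaSearch
```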
BIN manual/media/favicon.ico Normal file
Binary file not shown. After Width: | Height: | Size: 4.2 KiB

BIN manual/media/seafile-transparent-1024.png Normal file
Binary file not shown. After Width: | Height: | Size: 46 KiB
247 manual/setup/README.md Normal file
@@ -0,0 +1,247 @@
# Install SeaSearch

The original SeaSearch code base is written in pure Go and can be compiled with the Go toolchain alone. The vector search feature introduces the faiss library, which has to be called through CGO, so it affects how SeaSearch is compiled.

## Install faiss

To compile or run SeaSearch on a machine, the faiss library must be installed on it. The steps below apply to x86 Linux machines; they were written for Debian 12 with apt as the package manager.

### Prerequisites

Install the prerequisites with the package manager. If the connection is slow, consider switching to a mirror:

Ubuntu: [https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)

Debian: [https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/)

After switching mirrors, run:

```
sudo apt update
```

A C++ compiler with C++17 support or later, installable via apt:

```
sudo apt install -y gcc
```

CMake 3.23.1 or later; if the packaged version is too old, install it from a PPA or from source:

```
sudo apt install -y cmake
```

wget, swig, gnupg and libomp:

```
sudo apt install -y wget swig gnupg libomp-dev
```

Node.js:

```
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
NODE_MAJOR=20
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
sudo apt update && sudo apt install nodejs -y
```

### Install the Intel MKL library (optional, x86 CPUs only)

faiss depends on BLAS; Intel MKL is recommended for the best performance:

```
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

sudo apt update

sudo apt install -y intel-oneapi-mkl-devel
```

MKL is now installed; set one more environment variable:

```
export MKL_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64
```

### Install BLAS on non-x86 CPUs

MKL cannot be installed on non-x86 CPUs; install another BLAS implementation instead (the packages below provide ATLAS):

```
sudo apt install -y libatlas-base-dev libatlas3-base
```

### Compile faiss

Download the faiss source via SSH:

```
git clone git@github.com:facebookresearch/faiss.git
```

or via HTTPS:

```
git clone https://github.com/facebookresearch/faiss.git
```

Enter the faiss directory. If MKL is installed, run:

```
cmake -B build -DFAISS_ENABLE_GPU=OFF \
  -DFAISS_ENABLE_C_API=ON \
  -DFAISS_ENABLE_PYTHON=OFF \
  -DBLA_VENDOR=Intel10_64_dyn \
  -DBUILD_SHARED_LIBS=ON \
  "-DMKL_LIBRARIES=-Wl,--start-group;${MKL_PATH}/libmkl_intel_lp64.a;${MKL_PATH}/libmkl_gnu_thread.a;${MKL_PATH}/libmkl_core.a;-Wl,--end-group" \
  .
```

If MKL is not installed, run:

```
cmake -B build -DFAISS_ENABLE_GPU=OFF \
  -DFAISS_ENABLE_C_API=ON \
  -DFAISS_ENABLE_PYTHON=OFF \
  -DBUILD_SHARED_LIBS=ON \
  -DBUILD_TESTING=OFF \
  .
```

Build:

```
make -C build
```

Install the header files:

```
sudo make -C build install
```

Copy the compiled shared library into a system path. /tmp/faiss here is the faiss source directory; replace it with the actual path:

```
sudo cp /tmp/faiss/build/c_api/libfaiss_c.so /usr/lib
```

A complete installation script is available at /ci/install\_faiss.sh in the SeaSearch project directory.

## Compile SeaSearch

With faiss installed, SeaSearch can be compiled.

First download the SeaSearch source via SSH:

```
git clone git@github.com:seafileltd/seasearch.git
```

or via HTTPS:

```
git clone https://github.com/seafileltd/seasearch.git
```

Build the frontend static files:

```
cd web
npm config set registry https://registry.npmmirror.com
npm install
npm run build
```

Install the Go toolchain, Go 1.20 or later:

See [https://go.dev/doc/install](https://go.dev/doc/install)

Make sure CGO is enabled:

```
export CGO_ENABLED=1
```

Optionally switch the Go module proxy:

```
go env -w GOPROXY=https://goproxy.cn,direct
```

Then run in the project root:

```
go build -o seasearch ./cmd/zincsearch/
```

After these steps the final seasearch binary is available in the project root directory.

Normally there is no need to specify the locations of the header files and shared libraries by hand. If the compiler cannot find the headers or the shared library, point to them through environment variables at build time:

```
CGO_CFLAGS=-I/usr/local/include   # C header search path
CGO_LDFLAGS=-L/usr/lib            # library search path
```

If the shared library cannot be found at runtime, set:

```
LD_LIBRARY_PATH=/usr/lib   # directory containing the shared library
```

## Compile seasearch proxy and cluster manager

For a cluster deployment, seasearch proxy and cluster manager also have to be compiled and deployed.

Compile the proxy:

```
go build -o seasearch-proxy ./cmd/zinc-proxy/main.go
```

Compile the cluster manager:

```
go build -o cluster-manager ./cmd/cluster-manager/main.go
```

## Release

The project root contains a Dockerfile from which a docker image can be built.

Note: building this docker image requires working access to github, otherwise downloading the faiss source fails and the build aborts. The image only supports x86 CPUs; on arm, set the platform parameter to emulate x86.

```
docker build -f ./Dockerfile .
```

## Installation issues on Mac

### faiss installation

faiss can be installed with `brew install faiss`

### fatal error: 'faiss/c\_api/AutoTune\_c.h' file not found

Run the following commands to fix it.

Source: [https://github.com/DataIntelligenceCrew/go-faiss/issues/7](https://github.com/DataIntelligenceCrew/go-faiss/issues/7)

```
cd faiss
export CMAKE_PREFIX_PATH=/opt/homebrew/opt/openblas:/opt/homebrew/opt/libomp:/opt/homebrew
cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF .
make -C build
sudo make -C build install
sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib/libfaiss_c.dylib
```
110 manual/setup/compile_seasearch.md Normal file
@@ -0,0 +1,110 @@

# Compile SeaSearch

With faiss installed, SeaSearch can be compiled.

First download the SeaSearch source via SSH:

```
git clone git@github.com:seafileltd/seasearch.git
```

or via HTTPS:

```
git clone https://github.com/seafileltd/seasearch.git
```

Build the frontend static files:

```
cd web
npm config set registry https://registry.npmmirror.com
npm install
npm run build
```

Install the Go toolchain, Go 1.20 or later:

See [https://go.dev/doc/install](https://go.dev/doc/install)

Make sure CGO is enabled:

```
export CGO_ENABLED=1
```

Optionally switch the Go module proxy:

```
go env -w GOPROXY=https://goproxy.cn,direct
```

Then run in the project root:

```
go build -o seasearch ./cmd/zincsearch/
```

After these steps the final seasearch binary is available in the project root directory.

Normally there is no need to specify the locations of the header files and shared libraries by hand. If the compiler cannot find the headers or the shared library, point to them through environment variables at build time:

```
CGO_CFLAGS=-I/usr/local/include   # C header search path
CGO_LDFLAGS=-L/usr/lib            # library search path
```

If the shared library cannot be found at runtime, set:

```
LD_LIBRARY_PATH=/usr/lib   # directory containing the shared library
```

# Compile seasearch proxy and cluster manager

For a cluster deployment, seasearch proxy and cluster manager also have to be compiled and deployed.

Compile the proxy:

```
go build -o seasearch-proxy ./cmd/zinc-proxy/main.go
```

Compile the cluster manager:

```
go build -o cluster-manager ./cmd/cluster-manager/main.go
```

# Release

The project root contains a Dockerfile from which a docker image can be built.

Note: building this docker image requires working access to github, otherwise downloading the faiss source fails and the build aborts. The image only supports x86 CPUs; on arm, set the platform parameter to emulate x86.

```
docker build -f ./Dockerfile .
```

# Installation issues on Mac

## faiss installation

faiss can be installed with `brew install faiss`

## fatal error: 'faiss/c\_api/AutoTune\_c.h' file not found

Run the following commands to fix it.

Source: [https://github.com/DataIntelligenceCrew/go-faiss/issues/7](https://github.com/DataIntelligenceCrew/go-faiss/issues/7)

```
cd faiss
export CMAKE_PREFIX_PATH=/opt/homebrew/opt/openblas:/opt/homebrew/opt/libomp:/opt/homebrew
cmake -B build -DFAISS_ENABLE_GPU=OFF -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DFAISS_ENABLE_PYTHON=OFF .
make -C build
sudo make -C build install
sudo cp build/c_api/libfaiss_c.dylib /usr/local/lib/libfaiss_c.dylib
```
133 manual/setup/install_faiss.md Normal file
@@ -0,0 +1,133 @@
# Install faiss

To compile or run SeaSearch on a machine, the faiss library must be installed on it. The steps below apply to x86 Linux machines; they were written for Debian 12 with apt as the package manager.

## Prerequisites

Install the prerequisites with the package manager. If the connection is slow, consider switching to a mirror:

Ubuntu: [https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/)

Debian: [https://mirrors.tuna.tsinghua.edu.cn/help/debian/](https://mirrors.tuna.tsinghua.edu.cn/help/debian/)

After switching mirrors, run:

```
sudo apt update
```

A C++ compiler with C++17 support or later, installable via apt:

```
sudo apt install -y gcc
```

CMake 3.23.1 or later; if the packaged version is too old, install it from a PPA or from source:

```
sudo apt install -y cmake
```

wget, swig, gnupg and libomp:

```
sudo apt install -y wget swig gnupg libomp-dev
```

Node.js:

```
sudo apt-get update && sudo apt-get install -y ca-certificates curl gnupg
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
NODE_MAJOR=20
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list
sudo apt update && sudo apt install nodejs -y
```

## Install the Intel MKL library (optional, x86 CPUs only)

faiss depends on BLAS; Intel MKL is recommended for the best performance:

```
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

sudo apt update

sudo apt install -y intel-oneapi-mkl-devel
```

MKL is now installed; set one more environment variable:

```
export MKL_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64
```

## Install BLAS on non-x86 CPUs

MKL cannot be installed on non-x86 CPUs; install another BLAS implementation instead (the packages below provide ATLAS):

```
sudo apt install -y libatlas-base-dev libatlas3-base
```

## Compile faiss

Download the faiss source via SSH:

```
git clone git@github.com:facebookresearch/faiss.git
```

or via HTTPS:

```
git clone https://github.com/facebookresearch/faiss.git
```

Enter the faiss directory. If MKL is installed, run:

```
cmake -B build -DFAISS_ENABLE_GPU=OFF \
  -DFAISS_ENABLE_C_API=ON \
  -DFAISS_ENABLE_PYTHON=OFF \
  -DBLA_VENDOR=Intel10_64_dyn \
  -DBUILD_SHARED_LIBS=ON \
  "-DMKL_LIBRARIES=-Wl,--start-group;${MKL_PATH}/libmkl_intel_lp64.a;${MKL_PATH}/libmkl_gnu_thread.a;${MKL_PATH}/libmkl_core.a;-Wl,--end-group" \
  .
```

If MKL is not installed, run:

```
cmake -B build -DFAISS_ENABLE_GPU=OFF \
  -DFAISS_ENABLE_C_API=ON \
  -DFAISS_ENABLE_PYTHON=OFF \
  -DBUILD_SHARED_LIBS=ON \
  -DBUILD_TESTING=OFF \
  .
```

Build:

```
make -C build
```

Install the header files:

```
sudo make -C build install
```

Copy the compiled shared library into a system path. /tmp/faiss here is the faiss source directory; replace it with the actual path:

```
sudo cp /tmp/faiss/build/c_api/libfaiss_c.so /usr/lib
```

A complete installation script is available at /ci/install\_faiss.sh in the SeaSearch project directory.
53 mkdocs.yml Normal file
@@ -0,0 +1,53 @@
site_name: SeaSearch Manual
site_author: seafile
docs_dir: ./manual
site_url: https://haiwen.github.io/seasearch-docs/

repo_name: haiwen/seasearch-docs
repo_url: https://github.com/haiwen/seasearch-docs/
edit_uri: blob/master/manual

copyright: Copyright © 2023 Seafile Ltd.

theme:
  name: material
  logo: media/seafile-transparent-1024.png
  favicon: media/favicon.ico
  palette:
    primary: white
    accent:

plugins:
  - search
  - awesome-pages

# Customization
extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/haiwen/seasearch-docs/

extra_css:
  - stylesheets/extra.css

# Extensions
markdown_extensions:
  - markdown.extensions.admonition
  - markdown.extensions.attr_list
  - markdown.extensions.codehilite:
      guess_lang: true
  - markdown.extensions.def_list
  - markdown.extensions.footnotes
  - markdown.extensions.meta
  - markdown.extensions.toc:
      permalink: true
      toc_depth: "1-4"

# Page tree
nav:
  - Setup:
    - Installation of SeaSearch: setup/README.md
  - Deploy:
    - Deploy SeaSearch: deploy/README.md
  - Configuration: config/README.md
  - SeaSearch API: api/seasearch_api.md