add dataset api doc

pull/22/head
jyong 2023-10-10 17:19:09 +08:00
parent d290a34650
commit 3200020df9
6 changed files with 309 additions and 1 deletions

Binary file not shown (size: 292 KiB).

@@ -40,6 +40,7 @@
* [Datasets\&Index](advanced/datasets/README.md)
* [Sync from Notion](advanced/datasets/sync-from-notion.md)
* [Maintain Datasets via API](advanced/datasets/maintain-dataset-via-api.md)
* [Plugins](advanced/ai-plugins.md)
* [Based on WebApp Template](advanced/based-on-frontend-templates.md)
* [Model Configuration](advanced/model-configuration/README.md)

@@ -0,0 +1,152 @@
# Maintain Datasets via API
> Authentication and invocation work the same way as the application Service API. The difference is that a single dataset API token can operate on all datasets.
### Benefits of Using the Dataset API
* Sync your data systems to Dify datasets to create powerful workflows.
* Dataset list, document list, and detail query APIs make it easier to build your own data management page.
* Create and update documents from either plain text or file uploads, with batch additions and modifications at the segment level, to simplify your sync process.
* Reduce the time spent manually handling and syncing documents, and improve your visibility into Dify's software and services.
### How to Use
Go to the dataset page and switch to the **API** page in the left navigation. On this page, you can view the dataset API documentation provided by Dify and manage the credentials for accessing the dataset API.
<figure><img src="../../.gitbook/assets/dataset-api-token.png" alt=""><figcaption><p>Dataset API Document</p></figcaption></figure>
### API Call Examples
#### **Create Empty Dataset**
**`POST /datasets`**
{% hint style="warning" %}
This endpoint only creates an empty dataset.
{% endhint %}
```
curl --location --request POST 'https://api.dify.ai/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "name"}'
```
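The same request can be prepared from a script. The following is a minimal Python sketch using only the standard library; the helper name `build_create_dataset_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.dify.ai/v1"


def build_create_dataset_request(api_key: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST /datasets request."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/datasets",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_create_dataset_request("your-api-key", "my-dataset")
print(req.get_method(), req.full_url)
# To actually send it, pass the request to urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body before any network call is made.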
#### **List of Datasets**
```
curl --location --request GET 'https://api.dify.ai/v1/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
```
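The `page` and `limit` query parameters can be assembled programmatically. This is a small sketch; the function name `dataset_list_url` is hypothetical, and treating `page` as 1-based and `limit` as the page size is an assumption drawn from the parameter names in the curl example.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example above.
API_BASE = "https://api.dify.ai/v1"


def dataset_list_url(page: int = 1, limit: int = 20) -> str:
    """Return the URL for one page of the dataset list."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    query = urlencode({"page": page, "limit": limit})
    return f"{API_BASE}/datasets?{query}"


print(dataset_list_url(page=2))
```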
#### **Create a Document from Text**
```
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "Dify",
"text": "Dify means Do it for you...",
"indexing_technique": "high_quality",
"process_rule": {
"rules": {
"pre_processing_rules": [{
"id": "remove_extra_spaces",
"enabled": true
}, {
"id": "remove_urls_emails",
"enabled": true
}],
"segmentation": {
"separator": "###",
"max_tokens": 500
}
},
"mode": "custom"
}
}'
```
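When the body is generated from your own data, building it as a dictionary avoids hand-written JSON quoting mistakes. A minimal sketch, assuming the field names shown in the curl example; the helper `build_text_document_payload` and its default arguments are hypothetical.

```python
import json


def build_text_document_payload(name, text, separator="###", max_tokens=500):
    """Return the JSON body for create_by_text with custom processing rules."""
    return {
        "name": name,
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": {
            "mode": "custom",
            "rules": {
                # Same two pre-processing rules as the curl example.
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": True},
                ],
                "segmentation": {"separator": separator, "max_tokens": max_tokens},
            },
        },
    }


payload = build_text_document_payload("Dify", "Dify means Do it for you...")
print(json.dumps(payload, indent=2))
```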
#### **Create a Document from File**
```
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{
\"name\": \"Dify\",
\"indexing_technique\": \"high_quality\",
\"process_rule\": {
\"rules\": {
\"pre_processing_rules\": [{
\"id\": \"remove_extra_spaces\",
\"enabled\": true
}, {
\"id\": \"remove_urls_emails\",
\"enabled\": true
}],
\"segmentation\": {
\"separator\": \"###\",
\"max_tokens\": 500
}
},
\"mode\": \"custom\"
}
}";type=text/plain' \
--form 'file=@"/path/to/file"'
```
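The upload is a `multipart/form-data` request with a `data` part (the JSON settings, sent as `text/plain`) and a `file` part. The sketch below shows one way to encode such a body with the Python standard library. The function name `encode_multipart` is hypothetical, and the `application/octet-stream` type on the file part is an assumption, not something this page specifies.

```python
import json
import uuid


def encode_multipart(data: dict, filename: str, file_bytes: bytes):
    """Encode a 'data' text part and a 'file' part as multipart/form-data.

    Returns a (content_type_header, body_bytes) tuple.
    """
    boundary = uuid.uuid4().hex
    lines = [
        f"--{boundary}",
        'Content-Disposition: form-data; name="data"',
        "Content-Type: text/plain",
        "",
        json.dumps(data),
        f"--{boundary}",
        f'Content-Disposition: form-data; name="file"; filename="{filename}"',
        "Content-Type: application/octet-stream",  # assumed file part type
        "",
    ]
    # The joined header block ends with CRLF; one more CRLF starts the file content.
    head = "\r\n".join(lines).encode("utf-8") + b"\r\n"
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + file_bytes + tail


content_type, body = encode_multipart(
    {"name": "Dify", "indexing_technique": "high_quality"},
    "notes.txt",
    b"Dify means Do it for you...",
)
print(content_type)
print(len(body), "bytes")
```

The returned content type would go in the request's `Content-Type` header, alongside the same `Authorization` header used in the curl example.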
#### **Get Document Embedding Status**
```
curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}'
```
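Since indexing takes time, a client usually polls this endpoint until processing finishes. The following is a generic polling sketch. The helper `wait_until` is hypothetical, and because this page does not describe the response body, the decision of when indexing counts as finished is left to a caller-supplied `check` function.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() every `interval` seconds until it returns True.

    Returns False if `timeout` seconds of waiting elapse first.
    """
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


# Demo with a stand-in check that succeeds on the third call.
attempts = []


def fake_check() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


print(wait_until(fake_check, interval=0.01, timeout=1.0))
```

In real use, `check` would issue the GET request shown above and inspect the response in whatever way the API documentation page specifies.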
#### **Delete Document**
```
curl --location --request DELETE 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}'
```
#### **Get Document List**
```
curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}'
```
#### **Add New Segment**
```
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/segment' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--compressed \
--data-raw '{"segments": [
{"content": "Dify means Do it for you",
"keywords": ["Dify", "Do"]
}
]}'
```
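Because `segments` is a list, several segments can be sent in one request. A short sketch of building that body as a JSON object with a `segments` key, assuming the `content` and `keywords` fields from the curl example; the helper name `build_segments_payload` is hypothetical.

```python
import json


def build_segments_payload(segments):
    """Wrap (content, keywords) pairs in a {"segments": [...]} body."""
    return {
        "segments": [
            {"content": content, "keywords": list(keywords)}
            for content, keywords in segments
        ]
    }


body = json.dumps(
    build_segments_payload([("Dify means Do it for you", ["Dify", "Do"])])
)
print(body)
```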
### Error Messages
- `document_indexing`: the document is currently being indexed
- `provider_not_initialize`: no embedding model is configured
- `not_found`: the document does not exist
- `dataset_name_duplicate`: a dataset with this name already exists
- `provider_quota_exceeded`: the model quota has been exceeded
- `dataset_not_initialized`: the dataset has not been initialized
- `unsupported_file_type`: unsupported file type
  - Supported file types: txt, markdown, md, pdf, html, htm, xlsx, docx, csv
- `too_many_files`: too many files; only single-file upload is supported for now
- `file_too_large`: the file is too large; only files under 15 MB are supported
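
A client can map these codes to readable messages. The sketch below assumes the error code has already been extracted from a failed response; how the code is delivered in the response is not described on this page, and the names `ERROR_MESSAGES` and `describe_error` are hypothetical.

```python
# Error codes and meanings as listed above.
ERROR_MESSAGES = {
    "document_indexing": "The document is currently being indexed",
    "provider_not_initialize": "No embedding model is configured",
    "not_found": "The document does not exist",
    "dataset_name_duplicate": "A dataset with this name already exists",
    "provider_quota_exceeded": "The model quota has been exceeded",
    "dataset_not_initialized": "The dataset has not been initialized",
    "unsupported_file_type": "Unsupported file type",
    "too_many_files": "Too many files; only single-file upload is supported",
    "file_too_large": "The file is too large; the limit is 15 MB",
}


def describe_error(code: str) -> str:
    """Return a readable message for a dataset API error code."""
    return ERROR_MESSAGES.get(code, f"Unknown error code: {code}")


print(describe_error("dataset_name_duplicate"))
```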

@@ -1,4 +1,4 @@
# Table of contents
# Table of contents
## Getting Started <a href="#getting-started" id="getting-started"></a>

@@ -0,0 +1,155 @@
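# Maintain Datasets via API
> Authentication and invocation work the same way as the application Service API. The difference is that a single dataset API token can operate on all datasets.
### Benefits of Using the Dataset API
* Sync your data systems to Dify datasets to create powerful workflows.
* Dataset list, document list, and detail query APIs make it easier to build your own data management page.
* Create and update documents from either plain text or files, with batch additions and modifications at the segment level, to simplify your sync process.
* Reduce the time spent manually handling and syncing documents, and improve your visibility into Dify's software and services.
### How to Use
Go to the dataset page and switch to the **API** page in the left navigation. On this page, you can view the dataset API documentation provided by Dify and manage the credentials for accessing the dataset API under **API Keys**.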
<figure><img src="../../.gitbook/assets/dataset-api-token.png" alt=""><figcaption><p>Dataset API Document</p></figcaption></figure>
### API Call Examples
#### **Create Empty Dataset**
{% hint style="warning" %}
This endpoint only creates an empty dataset.
{% endhint %}
```
curl --location --request POST 'https://api.dify.ai/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "name"}'
```
#### **List of Datasets**
```
curl --location --request GET 'https://api.dify.ai/v1/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
```
#### **Create a Document from Text**
```
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "Dify",
"text": "Dify means Do it for you...",
"indexing_technique": "high_quality",
"process_rule": {
"rules": {
"pre_processing_rules": [{
"id": "remove_extra_spaces",
"enabled": true
}, {
"id": "remove_urls_emails",
"enabled": true
}],
"segmentation": {
"separator": "###",
"max_tokens": 500
}
},
"mode": "custom"
}
}'
```
#### **Create a Document from File**
```
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{
\"name\": \"Dify\",
\"indexing_technique\": \"high_quality\",
\"process_rule\": {
\"rules\": {
\"pre_processing_rules\": [{
\"id\": \"remove_extra_spaces\",
\"enabled\": true
}, {
\"id\": \"remove_urls_emails\",
\"enabled\": true
}],
\"segmentation\": {
\"separator\": \"###\",
\"max_tokens\": 500
}
},
\"mode\": \"custom\"
}
}";type=text/plain' \
--form 'file=@"/path/to/file"'
```
#### **Get Document Embedding Status (Progress)**
```
curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}'
```
#### **Delete Document**
```
curl --location --request DELETE 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}'
```
#### **Get Document List**
```
curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}'
```
#### **Add New Segment**
```
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/segment' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--compressed \
--data-raw '{"segments": [
{"content": "Dify means Do it for you",
"keywords": ["Dify", "Do"]
}
]}'
```
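### Error Messages
- `document_indexing`: the document is currently being indexed
- `provider_not_initialize`: no embedding model is configured
- `not_found`: the document does not exist
- `dataset_name_duplicate`: a dataset with this name already exists
- `provider_quota_exceeded`: the model quota has been exceeded
- `dataset_not_initialized`: the dataset has not been initialized
- `unsupported_file_type`: unsupported file type
  - Currently supported file types: txt, markdown, md, pdf, html, htm, xlsx, docx, csv
- `too_many_files`: too many files; only single-file upload is supported for now
- `file_too_large`: the file is too large; only files under 15 MB are supported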