Maintaining Datasets via API
Authentication and invocation methods are consistent with the application Service API. The difference is that a single dataset API token can operate on all datasets.
Advantages of Using Dataset API
- Synchronize your data system with Dify datasets to create powerful workflows.
- Provide dataset list, document list, and detail queries to facilitate building your own data management page.
- Support both plain text and file uploads and updates for documents, and support batch addition and modification at the segment level to streamline your synchronization process.
- Reduce the time spent on manual document processing and synchronization, and improve your visibility into Dify's software and services.
How to Use
Go to the dataset page and switch to the API page from the left navigation. There you can view the dataset API documentation provided by Dify and manage the credentials for accessing the dataset API under API Keys.

Knowledge API Document
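Since every call below sends the same `Authorization: Bearer {api_key}` header, it can help to keep the key out of the command line history. The following is a minimal sketch, assuming the key is stored in an environment variable; `DIFY_API_KEY`, `DIFY_API_BASE`, and `dify_auth_header` are names chosen for this example, not part of Dify.

```shell
# Hypothetical names for this sketch: DIFY_API_KEY holds a key created under API Keys,
# DIFY_API_BASE defaults to the base URL used in the examples on this page.
DIFY_API_BASE="${DIFY_API_BASE:-https://api.dify.ai/v1}"

# Print the Authorization header; fail with a message if no key is set.
dify_auth_header() {
  if [ -z "${DIFY_API_KEY:-}" ]; then
    echo "DIFY_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$DIFY_API_KEY"
}

# Usage (not executed here):
#   export DIFY_API_KEY='...'
#   curl --location --request GET "$DIFY_API_BASE/datasets?page=1&limit=20" \
#     --header "$(dify_auth_header)"
```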
API Call Examples
Create an Empty Dataset
{% hint style="warning" %} This endpoint only creates an empty dataset. {% endhint %}
curl --location --request POST 'https://api.dify.ai/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "name"}'
Dataset List
curl --location --request GET 'https://api.dify.ai/v1/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
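The `page` and `limit` query parameters in the request above can be generated rather than typed by hand. This is a small sketch that only builds the URLs; `dataset_list_url` and `dataset_list_urls` are hypothetical helper names, and how many pages exist depends on the response, which is not shown on this page.

```shell
# Print the dataset list URL for a page number (default 1) and page size (default 20).
# DIFY_API_BASE is a variable name chosen for this sketch.
dataset_list_url() {
  local page="${1:-1}"
  local limit="${2:-20}"
  printf '%s/datasets?page=%s&limit=%s\n' \
    "${DIFY_API_BASE:-https://api.dify.ai/v1}" "$page" "$limit"
}

# Print the URLs of the first N pages, one per line.
dataset_list_urls() {
  local pages="$1"
  local limit="${2:-20}"
  local i=1
  while [ "$i" -le "$pages" ]; do
    dataset_list_url "$i" "$limit"
    i=$((i + 1))
  done
}

# Usage (not executed here):
#   curl --location --request GET "$(dataset_list_url 2 20)" \
#     --header 'Authorization: Bearer {api_key}'
```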
Create Document by Text
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "Dify",
"text": "Dify means Do it for you...",
"indexing_technique": "high_quality",
"process_rule": {
"rules": {
"pre_processing_rules": [{
"id": "remove_extra_spaces",
"enabled": true
}, {
"id": "remove_urls_emails",
"enabled": true
}],
"segmentation": {
"separator": "###",
"max_tokens": 500
}
},
"mode": "custom"
}
}'
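For longer request bodies, writing the JSON to a file and passing it with curl's `--data @file` form can be easier to maintain than an inline string. The sketch below reuses the body from the example above; `write_text_payload` and `payload.json` are hypothetical names, and no JSON escaping is performed on the arguments.

```shell
# write_text_payload NAME TEXT OUTFILE
# Write a create_by_text request body to OUTFILE.
# NAME and TEXT are inserted verbatim, so in this sketch they must not contain
# double quotes, backslashes, or newlines.
write_text_payload() {
  cat > "$3" <<EOF
{
  "name": "$1",
  "text": "$2",
  "indexing_technique": "high_quality",
  "process_rule": {
    "rules": {
      "pre_processing_rules": [
        {"id": "remove_extra_spaces", "enabled": true},
        {"id": "remove_urls_emails", "enabled": true}
      ],
      "segmentation": {"separator": "###", "max_tokens": 500}
    },
    "mode": "custom"
  }
}
EOF
}

# Usage (not executed here):
#   write_text_payload "Dify" "Dify means Do it for you..." payload.json
#   curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_text' \
#     --header 'Authorization: Bearer {api_key}' \
#     --header 'Content-Type: application/json' \
#     --data @payload.json
```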
Create Document by File
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{
"name": "Dify",
"indexing_technique": "high_quality",
"process_rule": {
"rules": {
"pre_processing_rules": [{
"id": "remove_extra_spaces",
"enabled": true
}, {
"id": "remove_urls_emails",
"enabled": true
}],
"segmentation": {
"separator": "###",
"max_tokens": 500
}
},
"mode": "custom"
}
}";type=text/plain' \
--form 'file=@"/path/to/file"'
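Before uploading, a file can be checked locally against the limits given under Error Messages (supported file types, files under 15MB). The following is a rough sketch; `check_upload_file` is a hypothetical helper, and treating 15MB as 15 * 1024 * 1024 bytes is an assumption, since the exact threshold is not stated here.

```shell
# check_upload_file FILE
# Return 0 if FILE looks uploadable; otherwise print the matching error code
# from the Error Messages list and return 1.
check_upload_file() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "not a regular file: $file" >&2
    return 1
  fi
  # Lower-case the extension (text after the last dot).
  local ext
  ext="$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    txt|markdown|md|pdf|html|htm|xlsx|docx|csv) ;;
    *) echo "unsupported_file_type"; return 1 ;;
  esac
  # Size in bytes. Assumption: "under 15MB" means fewer than 15 * 1024 * 1024 bytes.
  local size
  size="$(wc -c < "$file" | tr -d ' ')"
  if [ "$size" -ge $((15 * 1024 * 1024)) ]; then
    echo "file_too_large"
    return 1
  fi
  return 0
}

# Usage (not executed here):
#   check_upload_file /path/to/file && curl ... --form 'file=@"/path/to/file"'
```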
Get Document Embedding Status (Progress)
curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}'
Delete Document
curl --location --request DELETE 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}'
Dataset Document List
curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}'
Add Segments
curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/segment' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"chunks": [
{"content": "Dify means Do it for you",
"keywords": ["Dify", "Do"]
}
]}' \
--compressed
Error Messages
- document_indexing: Document indexing failed
- provider_not_initialize: Embedding model not configured
- not_found: Document not found
- dataset_name_duplicate: Dataset name duplicate
- provider_quota_exceeded: Model quota exceeded
- dataset_not_initialized: Dataset not initialized
- unsupported_file_type: Unsupported file type. Currently supported: txt, markdown, md, pdf, html, htm, xlsx, docx, csv
- too_many_files: Too many files, currently only single file uploads are supported
- file_too_large: File too large, supports files under 15MB
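In a synchronization script, the error codes above can be turned into readable log messages with a simple lookup. This is a minimal sketch; `describe_dataset_error` is a hypothetical helper, and extracting the code from an API response is left out because the response format is not shown on this page.

```shell
# describe_dataset_error CODE
# Print the description for a dataset API error code from the list above.
describe_dataset_error() {
  case "$1" in
    document_indexing)       echo "Document indexing failed" ;;
    provider_not_initialize) echo "Embedding model not configured" ;;
    not_found)               echo "Document not found" ;;
    dataset_name_duplicate)  echo "Dataset name duplicate" ;;
    provider_quota_exceeded) echo "Model quota exceeded" ;;
    dataset_not_initialized) echo "Dataset not initialized" ;;
    unsupported_file_type)   echo "Unsupported file type" ;;
    too_many_files)          echo "Too many files, currently only single file uploads are supported" ;;
    file_too_large)          echo "File too large, supports files under 15MB" ;;
    *)                       echo "Unknown error code: $1" ;;
  esac
}

# Usage:
#   describe_dataset_error provider_quota_exceeded
```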