Maintaining Datasets via API

The dataset API is authenticated and invoked in the same way as the application Service API. The one difference is that a single dataset API token can operate on all datasets.

Advantages of Using Dataset API

Synchronize your data system with Dify datasets to create powerful workflows.
Query the dataset list, the document list, and document details to build your own data management page.
Upload and update documents as plain text or as files, and add or modify segments in batches to streamline your synchronization process.
Reduce the time spent on manual document processing and synchronization, and improve your visibility into Dify's software and services.

How to Use

Go to the dataset page and switch to the API page from the left navigation. There you can read the dataset API documentation provided by Dify and manage the credentials for accessing the dataset API under API Keys.

API Call Examples

Create an Empty Dataset

{% hint style="warning" %} Only used to create an empty dataset {% endhint %}

curl --location --request POST 'https://api.dify.ai/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "name"}'

Dataset List

curl --location --request GET 'https://api.dify.ai/v1/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
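
The two calls above share the same pattern: a base URL, a Bearer token header, and either query parameters or a JSON body. As a minimal sketch of that pattern in Python (standard library only), the helper below builds a request without sending it. The function name `build_request` and the placeholder key `YOUR_API_KEY` are illustrative assumptions, not part of the Dify API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.dify.ai/v1"  # base URL taken from the curl examples


def build_request(api_key, method, path, params=None, body=None):
    """Build (but do not send) an authenticated dataset API request.

    Hypothetical helper for illustration; it only mirrors the curl flags.
    """
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON bodies need the Content-Type header, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Equivalent of the "Dataset List" curl call; the key is a placeholder.
req = build_request("YOUR_API_KEY", "GET", "/datasets", params={"page": 1, "limit": 20})
print(req.get_method(), req.full_url)
```

Sending the request is left to the caller, for example with `urllib.request.urlopen(req)`. Because the same dataset API token works for every dataset, one helper like this can serve all the endpoints on this page.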

Create Document by Text

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "Dify",
    "text": "Dify means Do it for you...",
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {
                "pre_processing_rules": [{
                        "id": "remove_extra_spaces",
                        "enabled": true
                }, {
                        "id": "remove_urls_emails",
                        "enabled": true
                }],
                "segmentation": {
                        "separator": "###",
                        "max_tokens": 500
                }
        },
        "mode": "custom"
    }
}'
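
The nested `process_rule` body is easy to get wrong when typed by hand. The following sketch builds the same JSON in Python so that only the document name, the text, and the segmentation settings vary. The function name `create_by_text_payload` and its defaults are assumptions for illustration; the field names and values are copied from the curl example above.

```python
import json


def create_by_text_payload(name, text, separator="###", max_tokens=500):
    """Return the JSON body used by the create_by_text curl example.

    Hypothetical helper; the keys mirror the example, nothing more.
    """
    return {
        "name": name,
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": {
            "rules": {
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": True},
                ],
                "segmentation": {"separator": separator, "max_tokens": max_tokens},
            },
            "mode": "custom",
        },
    }


payload = create_by_text_payload("Dify", "Dify means Do it for you...")
body = json.dumps(payload)  # string to send as the request body
```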

Create Document by File

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{\"name\":\"Dify\",\"indexing_technique\":\"high_quality\",\"process_rule\":{\"rules\":{\"pre_processing_rules\":[{\"id\":\"remove_extra_spaces\",\"enabled\":true},{\"id\":\"remove_urls_emails\",\"enabled\":true}],\"segmentation\":{\"separator\":\"###\",\"max_tokens\":500}},\"mode\":\"custom\"}}";type=text/plain' \
--form 'file=@"/path/to/file"'

Get Document Embedding Status (Progress)

curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}'
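
Indexing runs asynchronously, so a client usually calls the status endpoint repeatedly until the batch is finished. This section does not show the response format, so the sketch below keeps it out of the loop: the caller supplies a function that fetches the status and a predicate that decides when it is done. The name `poll_until` and the status strings in the usage lines are hypothetical stand-ins, not values returned by Dify.

```python
import time


def poll_until(fetch_status, is_done, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Call fetch_status() until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        # Stop if another wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("indexing did not finish within the timeout")
        sleep(interval)


# Stand-in for real calls to .../documents/{batch}/indexing-status.
# The strings below are made up for the demonstration.
fake_responses = iter(["indexing", "indexing", "completed"])
result = poll_until(lambda: next(fake_responses), lambda s: s == "completed", interval=0.0)
```

In real use, `fetch_status` would send the GET request shown above and parse the response, and `is_done` would inspect whatever completion field the API documentation specifies.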

Delete Document

curl --location --request DELETE 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}'

Dataset Document List

curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}'

Add Segments

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}/segment' \
  --header 'Authorization: Bearer {api_key}' \
  --header 'Content-Type: application/json' \
  --data-raw '{"chunks": [
  {"content": "Dify means Do it for you",
  "keywords": ["Dify", "Do"]
  }
  ]}' \
  --compressed

Error Messages

document_indexing: Document indexing failed
provider_not_initialize: Embedding model not configured
not_found: Document not found
dataset_name_duplicate: A dataset with this name already exists
provider_quota_exceeded: Model quota exceeded
dataset_not_initialized: Dataset not initialized
unsupported_file_type: Unsupported file type
- Currently supported: txt, markdown, md, pdf, html, htm, xlsx, docx, csv
too_many_files: Too many files; only single-file uploads are currently supported
file_too_large: File too large; files must be under 15MB
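
Three of the errors above (`too_many_files`, `unsupported_file_type`, `file_too_large`) can be caught on the client before uploading. The sketch below is one way to do that check in Python. The function name `precheck_upload` is hypothetical, and two details are assumptions rather than documented behavior: that "15MB" means 15 × 1024 × 1024 bytes, and that extensions are matched case-insensitively.

```python
import os

# Extensions and size limit as listed in the error messages above.
SUPPORTED_EXTENSIONS = {"txt", "markdown", "md", "pdf", "html", "htm", "xlsx", "docx", "csv"}
MAX_FILE_BYTES = 15 * 1024 * 1024  # assumption: "15MB" is read as 15 MiB


def precheck_upload(files):
    """Predict the error code an upload might hit, or return None.

    files is a list of (filename, size_in_bytes) tuples.
    """
    if not files:
        raise ValueError("no file given")
    if len(files) > 1:
        return "too_many_files"
    filename, size = files[0]
    # Assumption: extension matching ignores case.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return "unsupported_file_type"
    if size >= MAX_FILE_BYTES:  # the limit is "under 15MB"
        return "file_too_large"
    return None


print(precheck_upload([("notes.md", 2048)]))
print(precheck_upload([("setup.exe", 2048)]))
```

A local check like this only saves a round trip; the server remains the authority, so the client should still handle these error codes in responses.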