GITBOOK-4: Update English documentation

feat/huggingface-embedding-support
chenxiaosha 2023-09-21 02:29:09 +00:00 committed by gitbook-bot
parent d0ffda7590
commit de3647ca64
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
17 changed files with 30 additions and 29 deletions

View File

@ -4,7 +4,7 @@ Most language models use outdated training data and have length limitations for
Dify's dataset feature allows developers (and even non-technical users) to easily manage datasets and automatically integrate them into AI applications. All you need to do is prepare text content, such as:
* Long text content (TXT, Markdown, JSONL, or even PDF files)
* Long text content (TXT, Markdown, DOCX, HTML, JSONL, or even PDF files)
* Structured data (CSV, Excel, etc.)
Additionally, we are gradually supporting syncing data from various data sources to datasets, including:
@ -36,8 +36,7 @@ When multiple datasets are referenced in an application, AI uses the description
The key to writing a good dataset description is to clearly describe the content and characteristics of the dataset. **It is recommended that the dataset description begin with this: `Useful only when the question you want to answer is about the following: specific description`**. Here is an example of a real estate dataset description:
> Useful only when the question you want to answer is about the following: global real estate market data from 2010 to 2020. This data includes information such as the average housing price, property sales volume, and housing types for each city. In addition, this dataset also includes some economic indicators such as GDP and unemployment rate, as well as some social indicators such as population and education level. These indicators can help analyze the trends and influencing factors of the real estate market.
> With this data, we can understand the development trends of the global real estate market, analyze the changes in housing prices in various cities, and understand the impact of economic and social factors on the real estate market.
> Useful only when the question you want to answer is about the following: global real estate market data from 2010 to 2020. This data includes information such as the average housing price, property sales volume, and housing types for each city. In addition, this dataset also includes some economic indicators such as GDP and unemployment rate, as well as some social indicators such as population and education level. These indicators can help analyze the trends and influencing factors of the real estate market. With this data, we can understand the development trends of the global real estate market, analyze the changes in housing prices in various cities, and understand the impact of economic and social factors on the real estate market.
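The recommended opening phrase can be applied mechanically. The following is a minimal sketch, not part of Dify: the helper name `build_dataset_description` and the constant `PREFIX` are hypothetical and only illustrate prepending the suggested phrase to a dataset-specific description.

```python
# Recommended opening phrase, copied from the guidance above.
PREFIX = "Useful only when the question you want to answer is about the following: "


def build_dataset_description(specific_description: str) -> str:
    """Prepend the recommended opening phrase to a dataset-specific description.

    Hypothetical helper for illustration only; it is not a Dify API.
    """
    text = specific_description.strip()
    if not text:
        raise ValueError("specific_description must not be empty")
    # Avoid adding the phrase twice if the caller already included it.
    if text.startswith(PREFIX.strip()):
        return text
    return PREFIX + text


description = build_dataset_description(
    "global real estate market data from 2010 to 2020."
)
print(description)
```

The result can be pasted into the dataset description field as-is.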
### Create a dataset
@ -77,11 +76,16 @@ Modify Documents For technical reasons, if developers make the following changes
1. Adjust segmentation and cleaning settings
2. Re-upload the file
Dify supports customizing the segmented and cleaned text by adding, deleting, and editing paragraphs. You can dynamically adjust your segmentation to make your dataset more accurate. Click **Document --> paragraph --> Edit** in the dataset to modify paragraph content. Click **Document --> paragraph --> Add new segment** to manually add a new paragraph.
Dify supports customizing the segmented and cleaned text by adding, deleting, and editing paragraphs. You can dynamically adjust your segmentation to make your dataset more accurate. Click **Document --> paragraph --> Edit** in the dataset to modify paragraph content and custom keywords. Click **Document --> paragraph --> Add segment --> Add a segment** to manually add a new paragraph, or click **Document --> paragraph --> Add segment --> Batch add** to add new paragraphs in batches.
<figure><img src="../../.gitbook/assets/image (3).png" alt=""><figcaption><p>Edit</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/image (1).png" alt=""><figcaption><p>add</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/add-new-segment.png" alt=""><figcaption><p><strong>Add new segment</strong></p></figcaption></figure>
### Disabling and Archiving of Documents
* **Disable, cancel disable**: The dataset supports disabling documents or segments that you temporarily do not want indexed. In the dataset's document list, click the Disable button and the document will be disabled. You can also click the Disable button in the document details to disable the entire document or a segment. Disabled documents will not be indexed. To cancel the disable, click Enable on a disabled document.
* **Archive, Unarchive:** Some unused old document data can be archived if you don't want to delete it. After archiving, the data can only be viewed or deleted, not edited. In the document list of the dataset, click the Archive button to archive the document. Documents can also be archived in the document details page. Archived documents will not be indexed. Archived documents can also be unarchived by clicking the Unarchive button.
### Maintain Datasets via API
@ -112,26 +116,19 @@ Once the dataset is ready, it needs to be integrated into the application. When
A: If your PDF parsing appears garbled under certain formatted contents, you could consider converting the PDF to Markdown format, which currently offers higher accuracy, or you could reduce the use of images, tables, and other formatted content in the PDF. We are researching ways to optimize the experience of using PDFs.
**Q: How does the consumption mechanism of context work?**
A: With a dataset added, each query will consume segmented content (currently embedding two segments) + question + prompt + chat history combined. However, it will not exceed model limitations, such as 4096.
**Q: How does the consumption mechanism of context work?** A: With a dataset added, each query will consume segmented content (currently embedding two segments) + question + prompt + chat history combined. However, it will not exceed model limitations, such as 4096.
**Q: Where does the embedded dataset appear when asking questions?**
A: It will be embedded as context before the question.
**Q: Where does the embedded dataset appear when asking questions?** A: It will be embedded as context before the question.
**Q: Is there any priority between the added dataset and OpenAI's answers?**
A: The dataset serves as context and is used together with questions for LLM to understand and answer; there is no priority relationship.
**Q: Is there any priority between the added dataset and OpenAI's answers?** A: The dataset serves as context and is used together with questions for LLM to understand and answer; there is no priority relationship.
**Q: Why can I hit in test but not in application?**
A: You can troubleshoot issues by following these steps:
**Q: Why can I hit in test but not in application?** A: You can troubleshoot issues by following these steps:
1. Make sure you have added text on the prompt page and clicked on the save button in the top right corner.
2. Test whether it responds normally in the prompt debugging interface.
3. Try again in a new WebApp session window.
4. Optimize your data format and quality. For practice reference, visit [https://github.com/langgenius/dify/issues/90](https://github.com/langgenius/dify/issues/90)
If none of these steps solve your problem, please join our community for help.
4. Optimize your data format and quality. For practice reference, visit [https://github.com/langgenius/dify/issues/90](https://github.com/langgenius/dify/issues/90) If none of these steps solve your problem, please join our community for help.
**Q: Will APIs related to hit testing be opened up so that Dify can access knowledge bases and implement dialogue generation using custom models?**
A: We plan to open up Webhooks later on; however, there are no current plans for this feature. You can achieve your requirements by connecting to any vector database.
**Q: Will APIs related to hit testing be opened up so that Dify can access knowledge bases and implement dialogue generation using custom models?** A: We plan to open up Webhooks later on; however, there are no current plans for this feature. You can achieve your requirements by connecting to any vector database.
**Q: How do I add multiple datasets?**
A: Due to short-term performance considerations, we currently only support one dataset. If you have multiple sets of data, you can upload them within the same dataset for use.
**Q: How do I add multiple datasets?** A: Due to short-term performance considerations, we currently only support one dataset. If you have multiple sets of data, you can upload them within the same dataset for use.
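The FAQ answer on context consumption above can be illustrated with a small budget calculation. This is a sketch under stated assumptions, not Dify's implementation: `count_tokens` is a hypothetical whitespace tokenizer standing in for the model's real one, and dropping the oldest chat history first is an assumed trimming policy that the FAQ does not specify. The two-segment count, the 4096 limit, and the placement of segments before the question come from the answers above.

```python
from typing import List

MODEL_LIMIT = 4096      # example limit mentioned in the FAQ answer
SEGMENTS_PER_QUERY = 2  # "currently embedding two segments"


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def build_context(segments: List[str], question: str, prompt: str,
                  history: List[str], limit: int = MODEL_LIMIT) -> List[str]:
    """Assemble prompt + segments + chat history + question within the limit.

    Assumption: when over budget, the oldest chat history is dropped first.
    """
    chosen = segments[:SEGMENTS_PER_QUERY]
    fixed = [prompt, *chosen, question]
    budget = limit - sum(count_tokens(part) for part in fixed)
    if budget < 0:
        raise ValueError("prompt, segments, and question alone exceed the limit")
    kept: List[str] = []
    # Walk the history from newest to oldest, keeping turns while they fit.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()
    # Dataset segments are placed as context before the question.
    return [prompt, *chosen, *kept, question]


parts = build_context(
    segments=["seg one", "seg two", "seg three"],
    question="What changed?",
    prompt="You are helpful.",
    history=["old turn " * 5, "recent turn"],
    limit=12,
)
print(parts)
```

With the small `limit=12` used here, the third segment is never considered and the older history turn is dropped because it no longer fits the remaining budget.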

View File

@ -39,11 +39,11 @@ Create an integration in your [integration's settings](https://www.notion.so/my-
Click the "**New integration**" button. The type is Internal by default (cannot be modified). Select the associated space, enter the name, upload the logo, and click "**Submit**" to create the integration.
<figure><img src="../../.gitbook/assets/image.png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/image (4).png" alt=""><figcaption></figcaption></figure>
Once the integration is created, you can update its settings as needed under the **Capabilities** tab. Then click the "**Show**" button under **Secrets** and copy the secret.
<figure><img src="../../.gitbook/assets/image (1).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/image (1) (1).png" alt=""><figcaption></figcaption></figure>
Copy the secret, go back to the Dify source code, and configure the related environment variables in the **.env** file as follows:
@ -57,11 +57,11 @@ Copy it and back to the Dify source code , in the **.env** file configuration re
To toggle the switch to public settings, you need to **fill in additional information in the Organization Information** form below, including your company name, website, and Retargeting URL, and click the "Submit" button.
<figure><img src="../../.gitbook/assets/image (2).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/image (2) (1).png" alt=""><figcaption></figcaption></figure>
After your integration has been successfully made public in your [integrations settings page](https://www.notion.so/my-integrations), you will be able to access the integration's secrets in the Secrets tab.
<figure><img src="../../.gitbook/assets/image (3).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/image (3) (1).png" alt=""><figcaption></figcaption></figure>
Go back to the Dify source code and configure the related environment variables in the **.env** file as follows:

View File

@ -22,7 +22,7 @@ Click the "Create Application" button on the homepage to create an application.
After the application is successfully created, it will automatically redirect to the application overview page. Click on the left-hand menu: “**Prompt Eng.**” to compose the application.
<figure><img src="../../.gitbook/assets/image (2) (1).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../../.gitbook/assets/image (2) (1) (1).png" alt=""><figcaption></figcaption></figure>
**2.1 Fill in Prompts**

View File

@ -32,7 +32,7 @@ Currently we support the following plugins:
We can choose the plugins needed for this conversation before the conversation starts.
<figure><img src="../.gitbook/assets/image (4).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../.gitbook/assets/image (4) (1).png" alt=""><figcaption></figcaption></figure>
If you use the Google search plugin, you need to configure the SerpAPI key.
@ -50,11 +50,8 @@ We can select the datasets needed for this conversation before the conversation
<figure><img src="../.gitbook/assets/image (5).png" alt=""><figcaption></figcaption></figure>
### The process of thinking
The thinking process refers to the process of the model using plugins and datasets. We can see the thought process in each answer.
<figure><img src="../.gitbook/assets/image (23).png" alt=""><figcaption></figcaption></figure>

View File

@ -94,7 +94,7 @@ _I want you to act as an IT Expert in my Notion workspace, using your knowledge
It's recommended to have the AI open the conversation with a starter sentence that gives users a clue as to what they can ask. Activating the 'Speech to Text' feature additionally lets users interact with your AI assistant by voice.
<figure><img src="../.gitbook/assets/image (3) (1).png" alt=""><figcaption></figcaption></figure>
<figure><img src="../.gitbook/assets/image (3) (1) (1).png" alt=""><figcaption></figcaption></figure>
Finally, click the "Publish" button on the top right of the page. Now you can click the public URL in the "Overview" section to converse with your personalized AI assistant!

View File

@ -6,6 +6,7 @@ Conversational applications use a question-and-answer model to maintain a dialog
* Conversation remarks.
* Follow-up.
* Speech to text.
* Citations and Attributions
### Variables filled in before the dialog
@ -46,3 +47,9 @@ If the "Speech to Text" function is enabled during application programming, you
_Please make sure that the device environment you are using is authorized to use the microphone._
<figure><img src="../.gitbook/assets/image (39).png" alt=""><figcaption></figcaption></figure>
### Citations and Attributions
If the "Citations and Attributions" feature is enabled during application orchestration, responses will automatically show the dataset documents they cite as sources.
<figure><img src="../.gitbook/assets/image.png" alt=""><figcaption></figcaption></figure>