Add en doc sync-from-website

pull/145/head
Mark Sun 2024-07-08 09:17:03 +08:00
parent 8fbbe3c25d
commit 1f969d0764
6 changed files with 28 additions and 1 deletions


# Importing Data from Web Pages
Through its integration with Firecrawl, Dify Knowledge Base can scrape web pages, parse them into Markdown, and import the result into a knowledge base.
**Note:**
[Firecrawl](https://www.firecrawl.dev/) is an open-source web parsing tool that converts web pages into clean, LLM-friendly Markdown text. It also provides an easy-to-use API service.
## How to Configure
First, configure your Firecrawl credentials on the DataSource page.
<figure><img src="../../.gitbook/assets/image (6).png" alt=""><figcaption></figcaption></figure>
Register on the [Firecrawl official website](https://www.firecrawl.dev/), log in, and obtain an API key. Then enter the key on the DataSource page and save it.
<figure><img src="../../.gitbook/assets/image (7).png" alt=""><figcaption></figcaption></figure>
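Once the key is saved, Dify uses it to call Firecrawl for you. To illustrate how a Firecrawl API key is typically sent, here is a minimal Python sketch that builds (but does not send) a request to scrape a single page. It assumes Firecrawl's v0 REST endpoint `https://api.firecrawl.dev/v0/scrape` and Bearer-token authentication; check both against Firecrawl's current API reference. `build_scrape_request` is a hypothetical helper, not part of Dify or Firecrawl.

```python
import json
import urllib.request

# Assumption: Firecrawl v0 scrape endpoint; verify against the current API reference.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v0/scrape"


def build_scrape_request(api_key: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks Firecrawl to scrape one page.

    The API key is passed as a Bearer token in the Authorization header.
    """
    if not api_key:
        raise ValueError("a Firecrawl API key is required")
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key; replace it with the key from your Firecrawl account.
    req = build_scrape_request("fc-YOUR-API-KEY", "https://example.com")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

Passing the returned request to `urllib.request.urlopen` would actually send it; the sketch stops short of that so no network call or real key is needed.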
On the knowledge base creation page, select **Sync from website** and enter the URL of the web page you want to scrape.
<figure><img src="../../.gitbook/assets/image (1).png" alt=""><figcaption><p>Web page scraping configuration</p></figcaption></figure>
The available settings are: whether to scrape subpages, the maximum number of pages to scrape, the maximum scraping depth, pages to exclude, pages to restrict scraping to, and content extraction. After completing the configuration, click **Run** to preview the parsed pages.
<figure><img src="../../.gitbook/assets/image (2).png" alt=""><figcaption><p>Executing the scrape</p></figcaption></figure>
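The page limit, depth, and exclude/include settings together bound how far a crawl spreads from the start URL. As a rough mental model only, and not Firecrawl's actual implementation, the following Python sketch simulates a breadth-first crawl over an in-memory link graph. It assumes that the limit caps the total number of pages scraped, that depth counts link hops from the start URL, and that exclude/include rules are glob patterns matched against the URL path. `CrawlSettings` and `plan_crawl` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from urllib.parse import urlparse


@dataclass
class CrawlSettings:
    """Hypothetical model of the Sync from website options."""
    crawl_subpages: bool = True
    limit: int = 10      # maximum number of pages to scrape
    max_depth: int = 2   # link hops allowed from the start URL
    excludes: list[str] = field(default_factory=list)  # glob patterns on the URL path
    includes: list[str] = field(default_factory=list)  # if non-empty, only matching paths


def _allowed(url: str, s: CrawlSettings) -> bool:
    """Apply the exclude rules first, then the include rules (if any)."""
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, p) for p in s.excludes):
        return False
    return not s.includes or any(fnmatchcase(path, p) for p in s.includes)


def plan_crawl(start: str, links: dict[str, list[str]], s: CrawlSettings) -> list[str]:
    """Return, in visit order, the pages a bounded breadth-first crawl would scrape."""
    if not s.crawl_subpages:
        return [start]
    seen = {start}
    queue = deque([(start, 0)])
    visited: list[str] = []
    while queue and len(visited) < s.limit:
        url, depth = queue.popleft()
        visited.append(url)  # the start page is always scraped
        if depth == s.max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            # Skip pages already queued and pages rejected by the filters;
            # rejected pages are not expanded either.
            if nxt not in seen and _allowed(nxt, s):
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


if __name__ == "__main__":
    site = {
        "https://example.com/": [
            "https://example.com/docs/a",
            "https://example.com/blog/x",
        ],
        "https://example.com/docs/a": ["https://example.com/docs/a/deep"],
    }
    for page in plan_crawl("https://example.com/", site, CrawlSettings(excludes=["/blog/*"])):
        print(page)
```

In this sketch a page rejected by a filter is neither scraped nor used to discover further links, which keeps excluded sections out entirely; a real crawler may make a different choice, so treat the preview produced by **Run** as the authoritative result.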
Import the parsed text into the knowledge base as documents, then review the import results. To import more web pages, click **Add URL**.
<figure><img src="../../.gitbook/assets/image (5).png" alt=""><figcaption><p>Importing parsed web page text into the knowledge base</p></figcaption></figure>