diff --git a/en/.gitbook/assets/guides/knowledge-base/image (1).png b/en/.gitbook/assets/guides/knowledge-base/image (1).png new file mode 100644 index 0000000..341d71b Binary files /dev/null and b/en/.gitbook/assets/guides/knowledge-base/image (1).png differ diff --git a/en/.gitbook/assets/guides/knowledge-base/image (2).png b/en/.gitbook/assets/guides/knowledge-base/image (2).png new file mode 100644 index 0000000..8b89e51 Binary files /dev/null and b/en/.gitbook/assets/guides/knowledge-base/image (2).png differ diff --git a/en/.gitbook/assets/guides/knowledge-base/image (5).png b/en/.gitbook/assets/guides/knowledge-base/image (5).png new file mode 100644 index 0000000..77bece2 Binary files /dev/null and b/en/.gitbook/assets/guides/knowledge-base/image (5).png differ diff --git a/en/.gitbook/assets/guides/knowledge-base/image (6).png b/en/.gitbook/assets/guides/knowledge-base/image (6).png new file mode 100644 index 0000000..6fbc0e8 Binary files /dev/null and b/en/.gitbook/assets/guides/knowledge-base/image (6).png differ diff --git a/en/.gitbook/assets/guides/knowledge-base/image (7).png b/en/.gitbook/assets/guides/knowledge-base/image (7).png new file mode 100644 index 0000000..6ebf8cc Binary files /dev/null and b/en/.gitbook/assets/guides/knowledge-base/image (7).png differ diff --git a/en/guides/knowledge-base/sync-from-website.md b/en/guides/knowledge-base/sync-from-website.md index b476a61..b2735bd 100644 --- a/en/guides/knowledge-base/sync-from-website.md +++ b/en/guides/knowledge-base/sync-from-website.md @@ -1 +1,28 @@ -# Under Maintenance +# Importing Data from Web Pages + +Through its integration with Firecrawl, the Dify Knowledge Base can scrape web pages, parse them into Markdown, and import the result into a knowledge base. + +**Note:** +[Firecrawl](https://www.firecrawl.dev/) is an open-source web parsing tool that converts web pages into clean, LLM-friendly Markdown text. It also provides an easy-to-use API service. 
+ +### How to Configure + +First, configure your Firecrawl credentials on the DataSource page. + +
+ +Register and log in on the [Firecrawl official website](https://www.firecrawl.dev/), obtain an API key, then enter the key on the DataSource page and save. + +
+ +On the knowledge base creation page, select **Sync from website** and **enter the URL of the web page to be scraped**. + +

Web page scraping configuration

+ +The settings cover: whether to scrape subpages, the maximum number of pages to scrape, the scraping depth, pages to exclude, pages to include exclusively, and content extraction. After completing the configuration, click **Run** to preview the parsed pages. + +

Executing the scrape

+ +Once the parsed web page text has been imported into the knowledge base as documents, review the import results. Click **Add URL** to continue importing new web pages. + +

Importing parsed web page text into the knowledge base

\ No newline at end of file