diff --git a/en/SUMMARY.md b/en/SUMMARY.md index 2e6541f..8f8d72a 100644 --- a/en/SUMMARY.md +++ b/en/SUMMARY.md @@ -69,6 +69,7 @@ ## Tutorials * [Quick Tool Integration](tutorials/quick-tool-integration.md) +* [Advanced Tool Integration](tutorials/advanced-tool-integration.md) * [Expose API Extension on public Internet using Cloudflare Workers](tutorials/cloudflare\_workers.md) * [Connecting with Different Models](tutorials/model-configuration/README.md) * [Hugging Face](tutorials/model-configuration/hugging-face.md) diff --git a/en/tutorials/advanced-tool-integration.md b/en/tutorials/advanced-tool-integration.md new file mode 100644 index 0000000..56a3430 --- /dev/null +++ b/en/tutorials/advanced-tool-integration.md @@ -0,0 +1,273 @@ +# Advanced Tool Integration + +Before starting with this advanced guide, please make sure you have a basic understanding of the tool integration process in Dify. Check out [Quick Integration](https://docs.dify.ai/tutorials/quick-tool-integration) for a quick run through. + +### Tool Interface + +We have defined a series of helper methods in the `Tool` class to help developers quickly build more complex tools. + +#### Message Return + +Dify supports various message types such as `text`, `link`, `image`, and `file BLOB`. You can return different types of messages to the LLM and users through the following interfaces. + +Please note, some parameters in the following interfaces will be introduced in later sections. + +**Image URL** + +You only need to pass the URL of the image, and Dify will automatically download the image and return it to the user. + +```python + def create_image_message(self, image: str, save_as: str = '') -> ToolInvokeMessage: + """ + create an image message + + :param image: the url of the image + :return: the image message + """ +``` + +**Link** + +If you need to return a link, you can use the following interface. + +```python + def create_link_message(self, link: str, save_as: str = '') -> ToolInvokeMessage: + """ + create a link message + + :param link: the url of the link + :return: the link message + """ +``` + +**Text** + +If you need to return a text message, you can use the following interface. + +```python + def create_text_message(self, text: str, save_as: str = '') -> ToolInvokeMessage: + """ + create a text message + + :param text: the text of the message + :return: the text message + """ +``` + +**File BLOB** + +If you need to return the raw data of a file, such as images, audio, video, PPT, Word, Excel, etc., you can use the following interface. + +* `blob` The raw data of the file, of bytes type +* `meta` The metadata of the file, if you know the type of the file, it is best to pass a `mime_type`, otherwise Dify will use `octet/stream` as the default type + +```python + def create_blob_message(self, blob: bytes, meta: dict = None, save_as: str = '') -> ToolInvokeMessage: + """ + create a blob message + + :param blob: the blob + :return: the blob message + """ +``` + +#### Shortcut Tools + +In large model applications, we have two common needs: + +* First, summarize a long text in advance, and then pass the summary content to the LLM to prevent the original text from being too long for the LLM to handle +* The content obtained by the tool is a link, and the web page information needs to be crawled before it can be returned to the LLM + +To help developers quickly implement these two needs, we provide the following two shortcut tools. + +**Text Summary Tool** + +This tool takes in an user\_id and the text to be summarized, and returns the summarized text. Dify will use the default model of the current workspace to summarize the long text. + +```python + def summary(self, user_id: str, content: str) -> str: + """ + summary the content + + :param user_id: the user id + :param content: the content + :return: the summary + """ +``` + +**Web Page Crawling Tool** + +This tool takes in web page link to be crawled and a user\_agent (which can be empty), and returns a string containing the information of the web page. The `user_agent` is an optional parameter that can be used to identify the tool. If not passed, Dify will use the default `user_agent`. + +```python + def get_url(self, url: str, user_agent: str = None) -> str: + """ + get url + """ the crawled result +``` + +#### Variable Pool + +We have introduced a variable pool in `Tool` to store variables, files, etc. generated during the tool's operation. These variables can be used by other tools during the tool's operation. + +Next, we will use `DallE3` and `Vectorizer.AI` as examples to introduce how to use the variable pool. + +* `DallE3` is an image generation tool that can generate images based on text. Here, we will let `DallE3` generate a logo for a coffee shop +* `Vectorizer.AI` is a vector image conversion tool that can convert images into vector images, so that the images can be infinitely enlarged without distortion. Here, we will convert the PNG icon generated by `DallE3` into a vector image, so that it can be truly used by designers. + +**DallE3** + +First, we use DallE3. After creating the image, we save the image to the variable pool. The code is as follows: + +```python +from typing import Any, Dict, List, Union +from core.tools.entities.tool_entities import ToolInvokeMessage +from core.tools.tool.builtin_tool import BuiltinTool + +from base64 import b64decode + +from openai import OpenAI + +class DallE3Tool(BuiltinTool): + def _invoke(self, + user_id: str, + tool_paramters: Dict[str, Any], + ) -> Union[ToolInvokeMessage, List[ToolInvokeMessage]]: + """ + invoke tools + """ + client = OpenAI( + api_key=self.runtime.credentials['openai_api_key'], + ) + + # prompt + prompt = tool_paramters.get('prompt', '') + if not prompt: + return self.create_text_message('Please input prompt') + + # call openapi dalle3 + response = client.images.generate( + prompt=prompt, model='dall-e-3', + size='1024x1024', n=1, style='vivid', quality='standard', + response_format='b64_json' + ) + + result = [] + for image in response.data: + # Save all images to the variable pool through the save_as parameter. The variable name is self.VARIABLE_KEY.IMAGE.value. If new images are generated later, they will overwrite the previous images. + result.append(self.create_blob_message(blob=b64decode(image.b64_json), + meta={ 'mime_type': 'image/png' }, + save_as=self.VARIABLE_KEY.IMAGE.value)) + + return result +``` + +Note that we used `self.VARIABLE_KEY.IMAGE.value` as the variable name of the image. In order for developers' tools to cooperate with each other, we defined this `KEY`. You can use it freely, or you can choose not to use this `KEY`. Passing a custom KEY is also acceptable. + +**Vectorizer.AI** + +Next, we use Vectorizer.AI to convert the PNG icon generated by DallE3 into a vector image. Let's go through the functions we defined here. The code is as follows: + +```python +from core.tools.tool.builtin_tool import BuiltinTool +from core.tools.entities.tool_entities import ToolInvokeMessage, ToolParamter +from core.tools.errors import ToolProviderCredentialValidationError + +from typing import Any, Dict, List, Union +from httpx import post +from base64 import b64decode + +class VectorizerTool(BuiltinTool): + def _invoke(self, user_id: str, tool_paramters: Dict[str, Any]) \ + -> Union[ToolInvokeMessage, List[ToolInvokeMessage]]: + """ + Tool invocation, the image variable name needs to be passed in from here, so that we can get the image from the variable pool + """ + + + def get_runtime_parameters(self) -> List[ToolParamter]: + """ + Override the tool parameter list, we can dynamically generate the parameter list based on the actual situation in the current variable pool, so that the LLM can generate the form based on the parameter list + """ + + + def is_tool_avaliable(self) -> bool: + """ + Whether the current tool is available, if there is no image in the current variable pool, then we don't need to display this tool, just return False here + """ +``` + +Next, let's implement these three functions + +```python +from core.tools.tool.builtin_tool import BuiltinTool +from core.tools.entities.tool_entities import ToolInvokeMessage, ToolParamter +from core.tools.errors import ToolProviderCredentialValidationError + +from typing import Any, Dict, List, Union +from httpx import post +from base64 import b64decode + +class VectorizerTool(BuiltinTool): + def _invoke(self, user_id: str, tool_paramters: Dict[str, Any]) \ + -> Union[ToolInvokeMessage, List[ToolInvokeMessage]]: + """ + invoke tools + """ + api_key_name = self.runtime.credentials.get('api_key_name', None) + api_key_value = self.runtime.credentials.get('api_key_value', None) + + if not api_key_name or not api_key_value: + raise ToolProviderCredentialValidationError('Please input api key name and value') + + # Get image_id, the definition of image_id can be found in get_runtime_parameters + image_id = tool_paramters.get('image_id', '') + if not image_id: + return self.create_text_message('Please input image id') + + # Get the image generated by DallE from the variable pool + image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE) + if not image_binary: + return self.create_text_message('Image not found, please request user to generate image firstly.') + + # Generate vector image + response = post( + 'https://vectorizer.ai/api/v1/vectorize', + files={ 'image': image_binary }, + data={ 'mode': 'test' }, + auth=(api_key_name, api_key_value), + timeout=30 + ) + + if response.status_code != 200: + raise Exception(response.text) + + return [ + self.create_text_message('the vectorized svg is saved as an image.'), + self.create_blob_message(blob=response.content, + meta={'mime_type': 'image/svg+xml'}) + ] + + def get_runtime_parameters(self) -> List[ToolParamter]: + """ + override the runtime parameters + """ + # Here, we override the tool parameter list, define the image_id, and set its option list to all images in the current variable pool. The configuration here is consistent with the configuration in yaml. + return [ + ToolParamter.get_simple_instance( + name='image_id', + llm_description=f'the image id that you want to vectorize, \ + and the image id should be specified in \ + {[i.name for i in self.list_default_image_variables()]}', + type=ToolParamter.ToolParameterType.SELECT, + required=True, + options=[i.name for i in self.list_default_image_variables()] + ) + ] + + def is_tool_avaliable(self) -> bool: + # Only when there are images in the variable pool, the LLM needs to use this tool + return len(self.list_default_image_variables()) > 0 +``` + +It's worth noting that we didn't actually use `image_id` here. We assumed that there must be an image in the default variable pool when calling this tool, so we directly used `image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE)` to get the image. In cases where the model's capabilities are weak, we recommend developers to do the same, which can effectively improve fault tolerance and avoid the model passing incorrect parameters.