Connecting to Xinference Local Deployed Models
🚧 WIP
Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models, and it can even run on a laptop. It supports a variety of GGML-compatible models, such as chatglm, baichuan, whisper, vicuna, and orca. Dify supports connecting to a locally deployed Xinference for both large language model inference and embedding capabilities.
Deploy Xinference
There are two ways to deploy Xinference: local deployment and distributed deployment. Here we take local deployment as an example.

1. First, install Xinference via PyPI:

$ pip install "xinference[all]"

2. Start Xinference locally:

$ xinference
2023-08-20 19:21:05,265 xinference 10148 INFO     Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-08-20 19:21:05,266 xinference.core.supervisor 10148 INFO     Worker 127.0.0.1:37822 has been added successfully
2023-08-20 19:21:05,267 xinference.deploy.worker 10148 INFO     Xinference worker successfully started.

Xinference will start a worker locally by default, with the endpoint http://127.0.0.1:9997; the default port is 9997. To modify the host or port, refer to Xinference's help information: xinference --help.
3. Create and deploy the model

Visit http://127.0.0.1:9997, select the model and specification you need to deploy, and click Create to create and deploy the model, as shown below:

As different models have different compatibility on different hardware platforms, please refer to Xinference built-in models to ensure the created model supports the current hardware platform.
4. Obtain the model UID

Return to the command line interface and enter:

$ xinference list
UID                                    Type    Name         Format    Size (in billions)    Quantization
------------------------------------   ------  -----------  --------  --------------------  --------------
a9e4d530-3f4b-11ee-a9b9-e6608f0bd69a   LLM     vicuna-v1.3  ggmlv3    7                     q2_K

The first column is the UID of the model created in step 3, such as a9e4d530-3f4b-11ee-a9b9-e6608f0bd69a above.
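If you prefer to obtain the UID programmatically rather than reading the table, the running Xinference server also exposes a REST API at the same endpoint. The following is a minimal sketch, assuming an OpenAI-compatible /v1/models route that returns a JSON object with a `data` list whose entries carry the UID in their `id` field; these route and field names are assumptions, not taken from this guide, so check your Xinference version's API docs:

```python
import json
from urllib.request import urlopen


def extract_uids(models_json):
    """Pull the model UIDs out of a /v1/models-style response body."""
    return [m["id"] for m in models_json.get("data", [])]


def list_model_uids(endpoint="http://127.0.0.1:9997"):
    """Query a running Xinference server for its deployed model UIDs."""
    with urlopen(f"{endpoint}/v1/models") as resp:
        return extract_uids(json.load(resp))


# Offline example of the assumed response shape:
sample = {"data": [{"id": "a9e4d530-3f4b-11ee-a9b9-e6608f0bd69a"}]}
print(extract_uids(sample))  # ['a9e4d530-3f4b-11ee-a9b9-e6608f0bd69a']
```

`list_model_uids` requires the server from step 2 to be running; `extract_uids` can be exercised offline as shown.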
5. After the model is deployed, connect to it in Dify

In Settings > Model Providers > Xinference, enter:

- Model Name: vicuna-v1.3
- Server URL: http://127.0.0.1:9997
- Model UID: a9e4d530-3f4b-11ee-a9b9-e6608f0bd69a

Click "Save" to use the model in the Dify application.
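Before (or instead of) wiring the model into Dify, you can sanity-check it directly against the Xinference server. A minimal sketch, assuming Xinference serves an OpenAI-compatible /v1/chat/completions route (the route and JSON field names are assumptions about that compatible API, not something this guide documents):

```python
import json
from urllib.request import Request, urlopen

SERVER_URL = "http://127.0.0.1:9997"
MODEL_UID = "a9e4d530-3f4b-11ee-a9b9-e6608f0bd69a"  # from `xinference list`


def build_chat_request(server_url, model_uid, prompt):
    """Assemble an OpenAI-style chat completion request for the deployed model."""
    body = json.dumps({
        "model": model_uid,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return Request(
        f"{server_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Against a running server, send it like this:
# with urlopen(build_chat_request(SERVER_URL, MODEL_UID, "Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The Server URL and Model UID here are exactly the values entered in the Dify configuration above, so a successful response confirms those settings.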
Dify also supports using Xinference built-in models as Embedding models; simply select the Embeddings type in the configuration box.
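An embedding model deployed this way can be checked the same way as a chat model. A brief sketch, assuming an OpenAI-style /v1/embeddings route with `model` and `input` fields (assumed names; verify against your Xinference version):

```python
import json
from urllib.request import Request


def build_embedding_request(server_url, model_uid, texts):
    """Assemble an OpenAI-style embeddings request for a deployed model."""
    body = json.dumps({"model": model_uid, "input": texts}).encode("utf-8")
    return Request(
        f"{server_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Send with urllib.request.urlopen(...) against a running server; the
# response is expected to carry one embedding vector per input string.
```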
For more information about Xinference, please refer to: Xorbits Inference