## What is DolphinScheduler?

DolphinScheduler is a distributed, easily extensible visual DAG workflow scheduling system. It is dedicated to resolving complex dependencies in data processing, so that the scheduling system works out of the box.

GitHub URL: https://github.com/apache/incubator-dolphinscheduler

Official Website: https://dolphinscheduler.apache.org

[English](README.md)
[Chinese](README_zh_CN.md)

## Prerequisites

- [Docker](https://docs.docker.com/engine/) 1.13.1+
- [Docker Compose](https://docs.docker.com/compose/) 1.11.0+

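
To check an installed version against these minimums, a small comparison helper may be handy. The sketch below is illustrative only: `version_ge` is a hypothetical helper name, the installed version is hard-coded rather than read from Docker, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD `sort` do).

```sh
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort); this is an assumption about the local toolchain.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare an installed Docker version (hard-coded here) with the minimum.
installed="20.10.7"
if version_ge "${installed}" "1.13.1"; then
    echo "Docker ${installed} is new enough"
else
    echo "Docker ${installed} is too old"
fi
```

On a real host, the hard-coded value could be replaced with the output of `docker version --format '{{.Server.Version}}'`.
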
## How to use this docker image

#### Start dolphinscheduler with docker-compose (recommended)

```
$ docker-compose -f ./docker/docker-swarm/docker-compose.yml up -d
```

The `docker-compose.yml` file creates the default **postgres** user `root`, the postgres password `root`, and the database `dolphinscheduler`.

The `docker-compose.yml` file also creates the default **zookeeper** service.

Access the Web UI: http://192.168.xx.xx:12345/dolphinscheduler

The default username is `admin` and the default password is `dolphinscheduler123`.

> **Tip**: For a quick start in Docker, you can create a tenant named `ds` and associate the user `admin` with the tenant `ds`.

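
The containers may need some time before the Web UI answers. A minimal sketch of a polling helper follows; `retry` is a hypothetical function name, and the retry count and delay in the commented example are arbitrary assumptions, not values taken from the project.

```sh
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    while ! "$@"; do
        if [ "${attempt}" -ge "${max}" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
    return 0
}

# Example (replace the host with the address of your Docker host):
# retry 30 5 curl -fsS -o /dev/null http://192.168.xx.xx:12345/dolphinscheduler
```

Passing the probe as a command keeps the helper independent of `curl`, so any other check (for example `docker-compose ps`) can be substituted.
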
#### Or via environment variables **`DATABASE_HOST`**, **`DATABASE_PORT`**, **`DATABASE_DATABASE`**, **`ZOOKEEPER_QUORUM`**

You can point the container at an **existing postgres and zookeeper service**. For example:

```
$ docker run -d --name dolphinscheduler \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-p 12345:12345 \
apache/dolphinscheduler:latest all
```

Access the Web UI: http://192.168.xx.xx:12345/dolphinscheduler

#### Or start a standalone dolphinscheduler server

You can also start each dolphinscheduler server as its own container.

* Create a **local volume** for resource storage, for example:

```
docker volume create dolphinscheduler-resource-local
```

* Start a **master server**, for example:

```
$ docker run -d --name dolphinscheduler-master \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
apache/dolphinscheduler:latest master-server
```

* Start a **worker server** (including the **logger server**), for example:

```
$ docker run -d --name dolphinscheduler-worker \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ALERT_LISTEN_HOST="dolphinscheduler-alert" \
-v dolphinscheduler-resource-local:/dolphinscheduler \
apache/dolphinscheduler:latest worker-server
```

* Start an **api server**, for example:

```
$ docker run -d --name dolphinscheduler-api \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-v dolphinscheduler-resource-local:/dolphinscheduler \
-p 12345:12345 \
apache/dolphinscheduler:latest api-server
```

* Start an **alert server**, for example:

```
$ docker run -d --name dolphinscheduler-alert \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
apache/dolphinscheduler:latest alert-server
```

**Note**: You must specify `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_PASSWORD`, and `ZOOKEEPER_QUORUM` when starting a standalone dolphinscheduler server.

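
Because the same settings recur in every standalone command, they could be collected once in an env file. The sketch below is one possible approach rather than part of the project: the file name `dolphinscheduler.env` is an assumption, and the values are the placeholders used in the examples above.

```sh
# Shared settings for every standalone server (placeholder values from the examples above).
# Docker reads KEY=VALUE lines literally, so the values are left unquoted.
env_file="./dolphinscheduler.env"
cat > "${env_file}" << 'EOF'
ZOOKEEPER_QUORUM=192.168.x.x:2181
DATABASE_HOST=192.168.x.x
DATABASE_PORT=5432
DATABASE_DATABASE=dolphinscheduler
DATABASE_USERNAME=test
DATABASE_PASSWORD=test
EOF

# Fail early if any required variable is missing or empty in the file.
missing=0
for name in DATABASE_HOST DATABASE_PORT DATABASE_DATABASE \
            DATABASE_USERNAME DATABASE_PASSWORD ZOOKEEPER_QUORUM; do
    if ! grep -q "^${name}=." "${env_file}"; then
        echo "missing ${name}" >&2
        missing=1
    fi
done
echo "missing=${missing}"
```

Each `docker run` command could then take `--env-file dolphinscheduler.env` in place of the repeated `-e` flags.
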
## How to build a docker image

You can build the docker image on a Unix-like operating system or on Windows.

On a Unix-like system, for example:

```bash
$ cd path/incubator-dolphinscheduler
$ sh ./docker/build/hooks/build
```

On Windows, for example:

```bat
C:\incubator-dolphinscheduler>.\docker\build\hooks\build.bat
```

If anything is unclear, please read the `./docker/build/hooks/build` and `./docker/build/hooks/build.bat` script files.

## Environment Variables

The DolphinScheduler Docker container is configured through environment variables, and the default value will be used if an environment variable is not set.

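
The fallback behaviour can be pictured with ordinary shell parameter expansion. This is a sketch of the idea only: whether the image's startup script uses exactly this idiom is an assumption, and the `unset` line exists purely to make the demonstration repeatable.

```sh
# Clear the variables so the fallback is visible (for this demonstration only).
unset DATABASE_HOST DATABASE_PORT

# ${VAR:-default} keeps VAR when it is set and non-empty, otherwise uses the default.
DATABASE_HOST="${DATABASE_HOST:-127.0.0.1}"
DATABASE_PORT="${DATABASE_PORT:-5432}"
echo "${DATABASE_HOST}:${DATABASE_PORT}"

# An explicitly provided value wins over the default.
DATABASE_PORT=3306
DATABASE_PORT="${DATABASE_PORT:-5432}"
echo "${DATABASE_PORT}"
```
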
**`DATABASE_TYPE`**

This environment variable sets the database type. The default value is `postgresql`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_DRIVER`**

This environment variable sets the database driver. The default value is `org.postgresql.Driver`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_HOST`**

This environment variable sets the database host. The default value is `127.0.0.1`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_PORT`**

This environment variable sets the database port. The default value is `5432`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_USERNAME`**

This environment variable sets the database username. The default value is `root`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_PASSWORD`**

This environment variable sets the database password. The default value is `root`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_DATABASE`**

This environment variable sets the database name. The default value is `dolphinscheduler`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`DATABASE_PARAMS`**

This environment variable sets the database connection parameters. The default value is `characterEncoding=utf8`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server`, `worker-server`, `api-server`, or `alert-server`.

**`HADOOP_HOME`**

This environment variable sets `HADOOP_HOME`. The default value is `/opt/soft/hadoop`.

**`HADOOP_CONF_DIR`**

This environment variable sets `HADOOP_CONF_DIR`. The default value is `/opt/soft/hadoop/etc/hadoop`.

**`SPARK_HOME1`**

This environment variable sets `SPARK_HOME1`. The default value is `/opt/soft/spark1`.

**`SPARK_HOME2`**

This environment variable sets `SPARK_HOME2`. The default value is `/opt/soft/spark2`.

**`PYTHON_HOME`**

This environment variable sets `PYTHON_HOME`. The default value is `/usr`.

**`JAVA_HOME`**

This environment variable sets `JAVA_HOME`. The default value is `/usr/lib/jvm/java-1.8-openjdk`.

**`HIVE_HOME`**

This environment variable sets `HIVE_HOME`. The default value is `/opt/soft/hive`.

**`FLINK_HOME`**

This environment variable sets `FLINK_HOME`. The default value is `/opt/soft/flink`.

**`DATAX_HOME`**

This environment variable sets `DATAX_HOME`. The default value is `/opt/soft/datax`.

**`DOLPHINSCHEDULER_DATA_BASEDIR_PATH`**

This environment variable sets the user data directory path. Make sure the directory exists and is readable and writable. The default value is `/tmp/dolphinscheduler`.

**`DOLPHINSCHEDULER_OPTS`**

This environment variable sets Java options. The default value is empty.

**`RESOURCE_STORAGE_TYPE`**

This environment variable sets the resource storage type for dolphinscheduler, such as `HDFS`, `S3`, or `NONE`. The default value is `HDFS`.

**`RESOURCE_UPLOAD_PATH`**

This environment variable sets the resource store path on HDFS/S3 for resource storage. The default value is `/dolphinscheduler`.

**`FS_DEFAULT_FS`**

This environment variable sets `fs.defaultFS` for resource storage, such as `file:///`, `hdfs://mycluster:8020`, or `s3a://dolphinscheduler`. The default value is `file:///`.

**`FS_S3A_ENDPOINT`**

This environment variable sets the S3 endpoint for resource storage. The default value is `s3.xxx.amazonaws.com`.

**`FS_S3A_ACCESS_KEY`**

This environment variable sets the S3 access key for resource storage. The default value is `xxxxxxx`.

**`FS_S3A_SECRET_KEY`**

This environment variable sets the S3 secret key for resource storage. The default value is `xxxxxxx`.

**`ZOOKEEPER_QUORUM`**

This environment variable sets the zookeeper quorum for `master-server` and `worker-server`. The default value is `127.0.0.1:2181`.

**Note**: You must specify it when starting a standalone dolphinscheduler server, such as `master-server` or `worker-server`.

**`ZOOKEEPER_ROOT`**

This environment variable sets the zookeeper root directory for dolphinscheduler. The default value is `/dolphinscheduler`.

**`MASTER_EXEC_THREADS`**

This environment variable sets the number of execution threads for `master-server`. The default value is `100`.

**`MASTER_EXEC_TASK_NUM`**

This environment variable sets the number of execution tasks for `master-server`. The default value is `20`.

**`MASTER_HEARTBEAT_INTERVAL`**

This environment variable sets the heartbeat interval for `master-server`. The default value is `10`.

**`MASTER_TASK_COMMIT_RETRYTIMES`**

This environment variable sets the number of task commit retries for `master-server`. The default value is `5`.

**`MASTER_TASK_COMMIT_INTERVAL`**

This environment variable sets the task commit interval for `master-server`. The default value is `1000`.

**`MASTER_MAX_CPULOAD_AVG`**

This environment variable sets the maximum CPU load average for `master-server`. The default value is `100`.

**`MASTER_RESERVED_MEMORY`**

This environment variable sets the reserved memory for `master-server`. The default value is `0.1`.

**`MASTER_LISTEN_PORT`**

This environment variable sets the port for `master-server`. The default value is `5678`.

**`WORKER_EXEC_THREADS`**

This environment variable sets the number of execution threads for `worker-server`. The default value is `100`.

**`WORKER_HEARTBEAT_INTERVAL`**

This environment variable sets the heartbeat interval for `worker-server`. The default value is `10`.

**`WORKER_MAX_CPULOAD_AVG`**

This environment variable sets the maximum CPU load average for `worker-server`. The default value is `100`.

**`WORKER_RESERVED_MEMORY`**

This environment variable sets the reserved memory for `worker-server`. The default value is `0.1`.

**`WORKER_LISTEN_PORT`**

This environment variable sets the port for `worker-server`. The default value is `1234`.

**`WORKER_GROUPS`**

This environment variable sets the groups for `worker-server`. The default value is `default`.

**`WORKER_HOST_WEIGHT`**

This environment variable sets the weight for `worker-server`. The default value is `100`.

**`ALERT_LISTEN_HOST`**

This environment variable sets the host of `alert-server` for `worker-server`. The default value is `127.0.0.1`.

**`ALERT_PLUGIN_DIR`**

This environment variable sets the alert plugin directory for `alert-server`. The default value is `lib/plugin/alert`.

## Initialization scripts

If you would like to do additional initialization in an image derived from this one, add one or more environment variables to `/root/start-init-conf.sh` and modify the template files in `/opt/dolphinscheduler/conf/*.tpl`.

For example, to add an environment variable `API_SERVER_PORT` in `/root/start-init-conf.sh`:

```
export API_SERVER_PORT=5555
```

Then modify the `/opt/dolphinscheduler/conf/application-api.properties.tpl` template file to add the server port:
```
server.port=${API_SERVER_PORT}
```

`/root/start-init-conf.sh` will dynamically generate the config files:

```sh
echo "generate dolphinscheduler config"
ls ${DOLPHINSCHEDULER_HOME}/conf/ | grep ".tpl" | while read line; do
eval "cat << EOF
$(cat ${DOLPHINSCHEDULER_HOME}/conf/${line})
EOF
" > ${DOLPHINSCHEDULER_HOME}/conf/${line%.*}
done
```

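
To see what this loop does without touching a real container, the same `eval` and heredoc technique can be tried in a scratch directory. The sketch below is a self-contained illustration: the temporary directory stands in for `${DOLPHINSCHEDULER_HOME}/conf`, and the single template line mirrors the `API_SERVER_PORT` example above.

```sh
# Hypothetical scratch directory standing in for ${DOLPHINSCHEDULER_HOME}/conf.
conf_dir="$(mktemp -d)"
export API_SERVER_PORT=5555

# A one-line template; single quotes keep the placeholder literal in the file.
printf '%s\n' 'server.port=${API_SERVER_PORT}' > "${conf_dir}/application-api.properties.tpl"

for tpl in "${conf_dir}"/*.tpl; do
    # eval re-reads the template as a heredoc body, which expands ${...};
    # the output file name is the template name without the .tpl suffix.
    eval "cat << EOF
$(cat "${tpl}")
EOF
" > "${tpl%.tpl}"
done

rendered="$(cat "${conf_dir}/application-api.properties")"
echo "${rendered}"
```

Because the template is evaluated by the shell, any `$(...)` or backtick sequence inside a `.tpl` file would also be executed, so templates should only come from trusted sources.
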
## FAQ

### How to stop dolphinscheduler by docker-compose?

Stop containers:

```
docker-compose stop
```

Stop containers and remove containers, networks and volumes:

```
docker-compose down -v
```

### How to deploy dolphinscheduler on Docker Swarm?

The following assumes that a Docker Swarm cluster has already been created. If you do not have one, please refer to [create-swarm](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/).

Start a stack named dolphinscheduler:

```
docker stack deploy -c docker-stack.yml dolphinscheduler
```

Stop and remove the stack named dolphinscheduler:

```
docker stack rm dolphinscheduler
```

### How to use MySQL as the DolphinScheduler's database instead of PostgreSQL?

> Because of the commercial license, we cannot include the MySQL driver and client directly.
>
> If you want to use MySQL, you can build a new image based on the `apache/dolphinscheduler` image as follows.

1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (requires `>=5.1.47`)

2. Create a new `Dockerfile` to add the MySQL driver and client:

```
FROM apache/dolphinscheduler:latest
COPY mysql-connector-java-5.1.49.jar /opt/dolphinscheduler/lib
RUN apk add --update --no-cache mysql-client
```

3. Build a new docker image that includes the MySQL driver and client:

```
docker build -t apache/dolphinscheduler:mysql .
```

4. Change all `image` fields to `apache/dolphinscheduler:mysql` in `docker-compose.yml`

> If you want to deploy dolphinscheduler on Docker Swarm, you need to modify `docker-stack.yml`

5. Comment out the `dolphinscheduler-postgresql` block in `docker-compose.yml`

6. Add a `dolphinscheduler-mysql` service in `docker-compose.yml` (**Optional**, you can use an external MySQL database directly)

7. Modify all DATABASE environment variables in `docker-compose.yml`

```
DATABASE_TYPE: mysql
DATABASE_DRIVER: com.mysql.jdbc.Driver
DATABASE_HOST: dolphinscheduler-mysql
DATABASE_PORT: 3306
DATABASE_USERNAME: root
DATABASE_PASSWORD: root
DATABASE_DATABASE: dolphinscheduler
DATABASE_PARAMS: useUnicode=true&characterEncoding=UTF-8
```

> If you have added the `dolphinscheduler-mysql` service in `docker-compose.yml`, just set `DATABASE_HOST` to `dolphinscheduler-mysql`

8. Run dolphinscheduler (see **How to use this docker image**)

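
Changing every `image` field (step 4) can be scripted rather than edited by hand. The sketch below works on a hypothetical two-service excerpt in a scratch directory, and it assumes the `image` lines read exactly `apache/dolphinscheduler:latest`; check your own `docker-compose.yml` before relying on the pattern.

```sh
# Hypothetical two-service excerpt standing in for the real docker-compose.yml.
work_dir="$(mktemp -d)"
cat > "${work_dir}/docker-compose.yml" << 'EOF'
services:
  dolphinscheduler-api:
    image: apache/dolphinscheduler:latest
  dolphinscheduler-master:
    image: apache/dolphinscheduler:latest
EOF

# Rewrite every matching image line into a new file, leaving the original untouched.
sed 's|image: apache/dolphinscheduler:latest|image: apache/dolphinscheduler:mysql|' \
    "${work_dir}/docker-compose.yml" > "${work_dir}/docker-compose.mysql.yml"

# Count the rewritten lines as a quick sanity check.
replaced="$(grep -c 'image: apache/dolphinscheduler:mysql$' "${work_dir}/docker-compose.mysql.yml")"
echo "${replaced}"
```

Writing to a separate file avoids the differences between GNU and BSD `sed -i`, and the result can be passed to `docker-compose -f`.
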
### How to support MySQL datasource in `Datasource manage`?

> Because of the commercial license, we cannot include the MySQL driver directly.
>
> If you want to add a MySQL datasource, you can build a new image based on the `apache/dolphinscheduler` image as follows.

1. Download the MySQL driver [mysql-connector-java-5.1.49.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar) (requires `>=5.1.47`)

2. Create a new `Dockerfile` to add the MySQL driver:

```
FROM apache/dolphinscheduler:latest
COPY mysql-connector-java-5.1.49.jar /opt/dolphinscheduler/lib
```

3. Build a new docker image that includes the MySQL driver:

```
docker build -t apache/dolphinscheduler:mysql-driver .
```

4. Change all `image` fields to `apache/dolphinscheduler:mysql-driver` in `docker-compose.yml`

> If you want to deploy dolphinscheduler on Docker Swarm, you need to modify `docker-stack.yml`

5. Run dolphinscheduler (see **How to use this docker image**)

6. Add a MySQL datasource in `Datasource manage`

### How to support Oracle datasource in `Datasource manage`?

> Because of the commercial license, we cannot include the Oracle driver directly.
>
> If you want to add an Oracle datasource, you can build a new image based on the `apache/dolphinscheduler` image as follows.

1. Download the Oracle driver [ojdbc8.jar](https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/) (such as `ojdbc8-19.9.0.0.jar`)

2. Create a new `Dockerfile` to add the Oracle driver:

```
FROM apache/dolphinscheduler:latest
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```

3. Build a new docker image that includes the Oracle driver:

```
docker build -t apache/dolphinscheduler:oracle-driver .
```

4. Change all `image` fields to `apache/dolphinscheduler:oracle-driver` in `docker-compose.yml`

> If you want to deploy dolphinscheduler on Docker Swarm, you need to modify `docker-stack.yml`

5. Run dolphinscheduler (see **How to use this docker image**)

6. Add an Oracle datasource in `Datasource manage`

For more information, please refer to the [incubator-dolphinscheduler](https://github.com/apache/incubator-dolphinscheduler.git) documentation.