Compare commits

...

149 Commits

Author SHA1 Message Date
Jiajie Zhong de50f43de6
[common] Make dolphinscheduler_env.sh work when start server (#9726)
* [common] Make dolphinscheduler_env.sh work

* Change dist tarball `dolphinscheduler_env.sh` location
  from `bin/` to `conf/`, so users can finish all their
  configuration changes in one single directory,
  and we only need to add `$DOLPHINSCHEDULER_HOME/conf`
  when we start our server instead of adding both
  `$DOLPHINSCHEDULER_HOME/conf` and `$DOLPHINSCHEDULER_HOME/bin`
* Change the `start.sh`'s path of `dolphinscheduler_env.sh`
* Change the setting order of `dolphinscheduler_env.sh`
* `bin/env/dolphinscheduler_env.sh` will overwrite the `<server>/conf/dolphinscheduler_env.sh`
when starting the server using `bin/dolphinscheduler-daemon.sh` or `bin/install.sh`
* Change the related docs
2022-04-25 15:35:43 +08:00
WangJPLeo 7bcec7115a
[Fix-9717] The failure policy of the task flow takes effect (#9718)
* Failure policy takes effect.

* Coverage on New Code

* correct description logic

* Compatible with all scenarios

* clearer logic

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-25 15:29:18 +08:00
Tq e6dade71bb
fix process instance global params not included in task instance when master is executing (#9730) 2022-04-25 14:05:40 +08:00
songjianet 9abcbbac2e
[Fix][UI Next][V1.0.0-Beta] Fix the bug that the tenant is 0 when editing a user. (#9739) 2022-04-25 14:05:27 +08:00
caishunfeng 0176f4bf61
[Bug-9737][Api] fix task plugin load in api (#9744)
* fix task plugin load in api

* task plugin loading by event
2022-04-25 13:26:34 +08:00
Amy0104 1f9660b80d
[Fix][UI][V1.0.0-Beta] Set the default value of host to '-' and Disable log button without host data. (#9742) 2022-04-25 12:27:08 +08:00
songjianet 15eb1618b4
[Fix][UI Next][V1.0.0-Beta] Fix citation errors. (#9741) 2022-04-25 11:37:21 +08:00
Paul Zhang cc40816f87
[Bug][Script] Fix the type of variable workersGroupMap is not supported in bash 3.x (#9614) 2022-04-25 11:32:09 +08:00
labbomb ca98a4a144
[Bug]Fixed the problem of no request from right click viewing log (#9728)
* The utils configuration files are centrally managed under common

* [Bug]Fixed the problem of no request from right click viewing log
2022-04-25 10:26:52 +08:00
陈家名 8a8b63cd96
[Improve][python] Support create table syntax and custom sql type param (#9673) 2022-04-25 10:17:20 +08:00
Devosend cd82f45d5e
[Bug][UI][V1.0.0-Beta] Fix action buttons not displayed on one line bug (#9700) 2022-04-25 10:04:04 +08:00
Devosend 3faa65ef0c
[Bug] [UI][V1.0.0-Beta] Fix task group name can't clear bug (#9708) 2022-04-25 10:03:11 +08:00
caishunfeng 5657cb9aec
[Bug-9719][Master] fix failover fail because task plugins have not been loaded (#9720) 2022-04-24 20:34:21 +08:00
gaojun2048 ebc4253d50
[fix][Service] BusinessTime should format with schedule timezone (#9714)
* BusinessTime should format with schedule timezone

* fix test error

* fix test error

* fix test error
2022-04-24 19:21:21 +08:00
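The fix above makes BusinessTime format with the schedule's timezone instead of the server default. A minimal Python sketch of why that matters (`format_business_time` is a hypothetical helper for illustration, not the actual Java implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def format_business_time(schedule_time: datetime, schedule_tz: str, fmt: str = "%Y%m%d") -> str:
    """Format the business time in the schedule's timezone, not the server default."""
    return schedule_time.astimezone(ZoneInfo(schedule_tz)).strftime(fmt)

# 18:00 UTC on 2022-04-24 is already the next day in Shanghai (UTC+8),
# so formatting in the wrong zone shifts the business date by one day.
t = datetime(2022, 4, 24, 18, 0, tzinfo=timezone.utc)
print(format_business_time(t, "Asia/Shanghai"))  # 20220425
print(format_business_time(t, "UTC"))            # 20220424
```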
Amy0104 257380467e
[Fix][UI Next][V1.0.0-Beta] Set the timeout label to not show. (#9710) 2022-04-24 17:56:10 +08:00
labbomb 7382284b7d
[Feature]The utils configuration files are centrally managed under common (#9706) 2022-04-24 17:46:47 +08:00
Devosend 48d526f275
[Fix][UI Next][V1.0.0-Beta] Fix bug where route is error in file manage root (#9697) 2022-04-24 15:29:50 +08:00
Devosend 0643fe44a4
[Fix][UI Next][V1.0.0-Beta] Fix success logo is not display bug (#9694) 2022-04-24 15:28:26 +08:00
labbomb b276c372d4
[Feature]Unified exposure method class (#9698) 2022-04-24 15:27:11 +08:00
Amy0104 3e851940e8
[Fix][UI Next][V1.0.0-Beta] Fix the startup parameter display error. (#9692)
* [Fix][UI Next][V1.0.0-Beta] Fix the startup parameter display error.

* [Fix][UI Next][V1.0.0-Beta] Change the key of the startup parameter item.
2022-04-24 14:15:53 +08:00
songjianet 86bdb826dc
[Fix][UI Next][V1.0.0-Beta] Change update user to edit user. (#9683) 2022-04-24 14:14:12 +08:00
Tq a51b710b1c
fix alert msg and change primitive to String to avoid wrong format (#9689) 2022-04-24 13:29:27 +08:00
Devosend 99678c097c
[Fix][UI Next][V1.0.0-Beta] Fix bug where name copy is invalid (#9684) 2022-04-24 11:56:58 +08:00
Mr.An 29a0ea32c6
[Fix] Support more generic tenant code when create tenant (#9634) 2022-04-23 18:41:03 +08:00
mans2singh 4799b27e33
[hotfix][docker] Removed extra equals in mail user (#9677) 2022-04-23 12:32:31 +08:00
yimaixinchen fb11525e49
[chore] Correct the java doc in function DagHelperTest generateDag2 (#9602)
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
2022-04-23 10:31:48 +08:00
Eric Gao 072ba731a2
[Bug][Doc]Update database init instruction docs (#9659) 2022-04-22 23:56:00 +08:00
caishunfeng 88d2803fe1
fix task dispatch error overload resource pool of task group (#9667) 2022-04-22 18:39:40 +08:00
WangJPLeo 387ebe5bb0
Project management batch deletion should give a specific description if it fails. (#9669)
Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-22 18:17:22 +08:00
Amy0104 b564e58cf3
[Feature][UI][V1.0.0-Beta] Add dependent task status in dependent task. (#9663)
* [Feature][UI][V1.0.0-Beta] Add dependent task status in dependent task.

* [Fix][UI][V1.0.0-Beta] Format back end data.
2022-04-22 16:30:17 +08:00
Devosend 1de7f154a3
[Bug][UI][V1.0.0-Beta] Fix display resource not exist error message bug (#9657) 2022-04-22 14:52:34 +08:00
Devosend 30a8372505
[Fix][UI][V1.0.0-Beta] Fix the parameter variables and startup parameters window cannot auto close bug (#9653) 2022-04-22 14:51:30 +08:00
Jiajie Zhong dde6d1f448
[doc] Add data quality to sidebar and correct docker resource path (#9662) 2022-04-22 14:45:07 +08:00
exmy 267b307632
[improve][api] Support to upload file without file type suffix (#9553) 2022-04-22 14:42:41 +08:00
exmy 36f01155b5
[Improvement][server] varPool support syntax #{setValue(key=value)} (#9586) 2022-04-22 14:10:21 +08:00
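The varPool `#{setValue(key=value)}` syntax above lets a task publish variables from its output. A minimal Python sketch of how such markers could be extracted (the real parsing lives in the Java server; the names here are hypothetical):

```python
import re

# Hypothetical sketch: collect variables a task writes as #{setValue(key=value)}
SET_VALUE = re.compile(r"#\{setValue\(([^=)]+)=([^)]*)\)\}")

def extract_var_pool(task_output: str) -> dict:
    """Return the key/value pairs the task publishes to the variable pool."""
    return {key.strip(): value for key, value in SET_VALUE.findall(task_output)}

log = "step done\n#{setValue(batch_date=20220422)}\n#{setValue(rows=120)}"
print(extract_var_pool(log))  # {'batch_date': '20220422', 'rows': '120'}
```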
WangJPLeo 996790ce9e
[Improvement-9609][Worker]The resource download method is selected according to the configurati… (#9636)
* The resource download method is selected according to the configuration and the service startup verification is added.

* common check CI fix

* Startup check changed to running check

* code smell

* Coordinate resources to increase test coverage.

* Split resource download method.

* Unit Test Coverage

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-22 11:45:49 +08:00
Amy0104 7b1c316d9e
[Fix][UI][V1.0.0-Beta] Adjust the UI presentation of the dependent task. (#9649) 2022-04-21 17:35:56 +08:00
Devosend 58c7e5aa08
[Feature][UI][V1.0.0-Beta] Modify timeout from 10s to 15s of axios (#9644) 2022-04-21 16:03:08 +08:00
Amy0104 6966a70acc
[Fix][UI][V1.0.0-Beta] Replace the back-end interface for dependent task. (#9645)
* queryProjectCreatedAndAuthorizedByUser to queryAllProjectList
* queryAllByProjectCode to queryProcessDefinitionList
* getTasksByDefinitionCode to getTasksByDefinitionList
2022-04-21 16:02:40 +08:00
Devosend 303ee1bf15
[Fix][UI][V1.0.0-Beta]Fix data not update bug for workflow version switch (#9642) 2022-04-21 15:28:48 +08:00
zixi0825 337696e258
[Docs][DataQuality]: Add DataQuality Docs (#9512)
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
2022-04-21 14:54:38 +08:00
worry ecf13a8c90
[Improvement][style]add comment and clear warn (#9247) 2022-04-21 11:42:08 +08:00
naziD 10f8c9d983
[Bug-9608] Serialize the task definition failed (#9622)
* BugFix: serialize the task definition failed

* Remove a comment

Co-authored-by: lipandong <pandong.lpd@alibaba-inc.com>
2022-04-21 11:38:41 +08:00
Eric Gao a863c6f8f1
[Bug][DAO] Update db init script and soft_version (#9628) (#9637) 2022-04-21 11:01:19 +08:00
caishunfeng 239be31ab7
[Bug] cancel application when kill task (#9624)
* cancel application when kill task

* add warn log

* add cancel application
2022-04-20 22:46:15 +08:00
WangJPLeo ae84900329
[Fix-9617]New task group project name drop-down data is displayed according to user type. (#9625)
* New task group project name drop-down data by user type

* remove unused import

* avoid import *

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-20 18:24:45 +08:00
qianli2022 165d7aa51f
[Feature][Task] K8s namespace auth manager (#9303)
* k8s auth

* remove log

* fix test

* use constants

* use constants K8S_LOCAL_TEST_CLUSTER

* simple auth get

* change test

* add namespace authorize in user page

* prettier code

* change test data

Co-authored-by: qianl4 <qianl4@cicso.com>
Co-authored-by: William Tong <weitong@cisco.com>
2022-04-20 18:23:23 +08:00
Devosend 86b1870856
[Fix][UI][V1.0.0-Beta] Fix tooltip not display bug in task instance (#9630) 2022-04-20 17:44:19 +08:00
PJ Fanning f45fe85703
upgrade commons-compress to 1.21 (#9540) 2022-04-20 17:42:14 +08:00
PJ Fanning 006bcca532
update jackson and jackson.databind version to fixes many security issues (#9572) 2022-04-20 17:34:48 +08:00
PJ Fanning 7fecb92fc2
use secure version of postgresql (#9573)
* use secure version of postgresql

* Update known-dependencies.txt
2022-04-20 17:14:47 +08:00
Amy0104 a378844820
[Fix][UI][V1.0.0-Beta] Fix the task name cleared after switching the task type. (#9623) 2022-04-20 16:31:06 +08:00
Amy0104 93ee2e45c8
[Fix][UI][V1.0.0-Beta] Change node name to task name in the task modal on the task definition page. (#9620) 2022-04-20 16:30:53 +08:00
Amy0104 69bfebfec0
[Fix][UI][V1.0.0-Beta] Remove the sql comment in the procedure task and add the instructions link to the procedure task. (#9619) 2022-04-20 16:30:38 +08:00
Amy0104 a1df0ee99c
[Fix][UI] Fix the udf function echoed error in the sql task. (#9616) 2022-04-20 16:11:45 +08:00
JinYong Li 2aa191014d
fix 9584 (#9585) 2022-04-20 16:10:04 +08:00
WangJPLeo e2ec489042
[Fix-9610] Sub-workflow status check is limited to SUB_PROCESS components (#9611)
* Sub-workflow status check is limited to SUB_PROCESS components

* Sub-workflow status check is limited to SUB_PROCESS components

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-20 15:04:27 +08:00
xiangzihao a3bf10c88d
[Feature][API] Refactor get dependent node information api (#9591)
* feature_8922

* fix comment error

* remove unused import

* fix unused import
2022-04-20 14:53:40 +08:00
Kerwin 9d11be447a
[Python] Make detached signature during release (#9607) 2022-04-20 13:07:27 +08:00
wangyang 6155dc3dab
[Docs] enhance alert doc (#9534)
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
2022-04-20 13:02:28 +08:00
labbomb 5b2a96b830
[Feature][UI Next]Added the method of downloading files (#9605) 2022-04-20 12:15:39 +08:00
Paul Zhang 2be7183563
[Bug][Standalone Server] Deduplicate the classpath jars in start.sh of the standalone server (#9583) 2022-04-20 11:46:42 +08:00
WangJPLeo 9964c4c1e1
[Fix-9593] Storage Management StorageOperate No Instance (#9594)
* Storage Management StorageOperate No Instance

* Add StorageOperateManager unit test

* Add license header

* Fix issues in SonarCloud code analysis

Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-20 09:58:37 +08:00
mans2singh 930d12031a
[task-flink][docs] Corrected name (#9600) 2022-04-20 09:26:42 +08:00
jiachuan.zhu 24e455304c
Add ingress annotations (#9492) 2022-04-19 20:01:57 +08:00
songjianet dd3dbf927c
[Fix][UI Next][V1.0.0-Beta] The version information creation time in the workflow definition is changed to the operateTime field. (#9590) 2022-04-19 18:57:16 +08:00
songjianet 3ea942dbf8
[Docs] Add license file for screenfull. (#9581) 2022-04-19 16:10:34 +08:00
caishunfeng b4017d0afd
fix task kill (#9578) 2022-04-19 15:26:12 +08:00
Tq c5b7e5adff
[Bug] [API-9558]fix homepage task instance count method to use submit time to recount (#9559)
* fix homepage task instance count method to use submit time to recount

* fix homepage task instance count method to use submit time to recount

* fix homepage task instance count method to use submit time to recount

* fix homepage task instance count method JUNIT

* fix homepage task instance count method JUNIT

* fix homepage task instance count method JUNIT
2022-04-19 15:23:57 +08:00
labbomb efe04863a0
[Refactor][UI Next]Reconstructing the Log Component (#9574)
* Reconstructing the Log Component

* Delete pnpm-lock.yaml

* Delete pnpm-lock.yaml

* add pnpm-lock

* Modify comments
2022-04-19 11:48:53 +08:00
caishunfeng 63638601b0
fix process pause and rerun (#9568) 2022-04-19 10:23:56 +08:00
wqxs 32f76e487f
[fix-9428] [DataSource] fix bug where deleting the password and testing the data source again still succeeds (#9428) (#9531)
* fix bug: after a data source test succeeds, deleting the password and re-testing still reports success

* Modify the method of generating DatasourceUniqueId
2022-04-18 20:46:37 +08:00
sparklezzz 508ed9769a
[Fix][Master Server] handle warn+failed timeout strategy in workflow execute thread of master server (#8077) (#9485)
Co-authored-by: xudong.zhang <xudong.zhang@nio.com>
2022-04-18 20:34:22 +08:00
WangJPLeo b1d57dbce4
Check the status of the child process when the parent process is running (#9567)
Co-authored-by: WangJPLeo <wangjipeng@whaleops.com>
2022-04-18 20:27:11 +08:00
Jiajie Zhong 000e96baec
[doc] Fix dead link after 3.0.0 release (#9562) 2022-04-18 18:30:34 +08:00
Devosend 01f3dcd03b
fix workflow instance can not be save bug (#9554) 2022-04-18 18:20:29 +08:00
songjianet e534ca1298
[Perf][UI Next][V1.0.0-Beta] Optimize judgment logic. (#9561) 2022-04-18 18:18:07 +08:00
Jiajie Zhong 5529a23e5c
[doc] Add new release for 3.0.0-alpha (#9519) 2022-04-18 16:07:43 +08:00
songjianet e190ef9d0d
[Fix][UI Next][V1.0.0-Beta] Fix the color matching problem of the relationship diagram. (#9546) 2022-04-18 14:33:45 +08:00
Amy0104 4d6a732ef3
[Fix][UI] Fix the problem of displaying task execution icons. (#9549) 2022-04-18 14:13:08 +08:00
Amy0104 491e9f84ee
[Fix][UI] Fix condition task connection failure. (#9548) 2022-04-18 14:09:32 +08:00
Eric Gao a5bbf7852d
[Feature][Task-Plugin]Add zeppelin task-plugin to support Apache Zeppelin (#9327) 2022-04-17 22:10:10 +08:00
Tq 1192fb8e12
fix Fault tolerance warning mapper add alert type to insert (#9533) 2022-04-17 10:17:00 +08:00
xiangzihao ec939fcc68
[Bug] [Dev] Fix start/stop/status/init script error (#9514)
* change /bin/sh to /bin/bash

* change /bin/sh to /bin/bash

* remove declare -A to adapt to mac
2022-04-16 21:34:53 +08:00
calvin adcc43fd7e
[Improve][API] Allowed the non-root user to create the task group. (#9523)
* create a new branch from dev

* fix this issue

* merge from dev
2022-04-16 21:25:56 +08:00
xiangzihao 7f41a96fc1
[Fix-9525] [Worker] Environment did not work as expected (#9527)
* fix #9525

* change to ${PYTHON_HOME}

* remove import

* fix ut error
2022-04-16 18:57:33 +08:00
songjianet 26726bb887
[Fix][UI Next][V1.0.0-Beta] Fix the resource Center dark mode display exception when viewing the file details. (#9526)
* [Fix][UI Next][V1.0.0-Beta] Fix the resource Center dark mode display exception when viewing the file details.

* [Fix][UI Next][V1.0.0-Beta] Fix the resource Center dark mode display exception when viewing the file details.
2022-04-16 16:09:15 +08:00
songjianet 0972610204
[Feature][UI Next][V1.0.0-Beta] Monaco Editor supports theme switching function. (#9521) 2022-04-15 21:50:15 +08:00
labbomb 7777a6acfd
[Fix][UI Next][V1.0.0-Beta] Added access to the child node function (#9518)
* Added access to the child node function

* Added access to the child node function
2022-04-15 18:50:52 +08:00
WangJPLeo 95ccc0c855
[Fix-9516] [Statistics] Statistics Manager should show data belonging to the current user (#9515) 2022-04-15 18:25:25 +08:00
Jiajie Zhong 1f48601c75
[python] Add task decorator for python function (#9496)
* [python] Add task decorator for python function

* Add decorator `@task`
* Add a tutorial about it
* Change tutorial doc and combine into traditional docs
  * Add sphinx-inline-tab for better view

* revert not need change

* Correct python function indent

* Correct integration test
2022-04-15 15:50:52 +08:00
Jiajie Zhong 59a026d897
[python] Support read config in env variable (#9517)
Add a new method to get config from environment variables
and for now, we have three ways to get config and the
priority is `env-var > custom-config-file > built-in-config-file`.

Environment config settings do not work in the CLI, because it would
confuse users when the config value they get is `var-env` but the value
in the configuration file is `var-in-file`; they may not find
out how to change it

* Add documentation
* Add it to UPDATING.md

close: #8344
2022-04-15 15:46:44 +08:00
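The priority `env-var > custom-config-file > built-in-config-file` described above can be sketched as the following lookup (a hypothetical Python illustration; the real pydolphinscheduler config keys and env-var names may differ):

```python
import os

# Hypothetical key and env-var prefix for illustration only.
BUILT_IN_CONFIG = {"java_gateway.address": "127.0.0.1"}

def get_config(key: str, custom_config: dict) -> str:
    env_key = "PYDS_" + key.replace(".", "_").upper()
    if env_key in os.environ:          # 1. environment variable wins
        return os.environ[env_key]
    if key in custom_config:           # 2. then the custom config file
        return custom_config[key]
    return BUILT_IN_CONFIG[key]        # 3. finally the built-in default

os.environ["PYDS_JAVA_GATEWAY_ADDRESS"] = "10.0.0.1"
print(get_config("java_gateway.address", {"java_gateway.address": "192.168.0.1"}))  # 10.0.0.1
```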
Kerwin 4221ee2433
Add python module dependency in the dist module (#9506) 2022-04-15 13:53:37 +08:00
Paul Zhang 3815a86a3b
[Improvement][Master] Fix typo for MasterTaskExecThreadTest (#9513) 2022-04-15 11:32:22 +08:00
Tq 51d47de6de
[Bug] [ALERT-9487] change email attachment filename to add random string to avoid issues (#9498)
* change email attachment filename to add random string to avoid always use same file

* change email attachment filename to add random string to avoid always use same file
2022-04-15 10:13:13 +08:00
songjianet 46c0c18ec7
[Feature][UI Next][V1.0.0-Beta] Change the css style usage of the blue number flopper in the alarm center to css variables. (#9511) 2022-04-14 22:12:14 +08:00
caishunfeng 66d148872d
[Bug-9501][Worker] fix kill task error before running (#9509) 2022-04-14 21:34:02 +08:00
caishunfeng 7b907b854d
[Improvement][doc] update time zone doc (#9503)
* update time zone doc

* update time zone doc

* update time zone doc
2022-04-14 21:32:00 +08:00
Amy0104 8a2fbd205e
[Fix][Next-UI] Fix the jumping problem of close button on dag page. (#9494) 2022-04-14 14:09:40 +08:00
czeming 706cdb6a8e
[Fix-9174] [Alert] Fix deduplication of alarm information (#9371)
* feat(issue #9174):

Fix-9174
2022-04-14 10:30:25 +08:00
Tq df791a374e
[FIX][WEBSITE-9224]fix wechat doc and wechat form display (#9439)
Co-authored-by: Jiajie Zhong <zhongjiajie955@hotmail.com>
2022-04-13 23:35:05 +08:00
Amy0104 e134c63e68
[Bug][Next UI] Fix the branch flow options of the switch task is not available. (#9481) 2022-04-13 20:51:43 +08:00
Amy0104 e5c66ecc31
[Fix][UI Next] Fix the problem of disabling owned user input during project editing. (#9476) 2022-04-13 18:05:02 +08:00
mazhong 8821b91829
[FIX-9471][Script] fix run install.sh error (#9472) 2022-04-13 17:53:05 +08:00
Eric Gao eb21b6cc52
[Feature][Doc] Switch shell task instruction screenshots to latest (#9434) 2022-04-13 14:13:42 +08:00
Amy0104 099bde0f78
[Fix][Next-UI] Add LIST type to the custom parameter types of task node. (#9468) 2022-04-13 13:45:39 +08:00
Tq 04d4e4e0c4
fix the OUT direct param could pass to the current script (#9463) 2022-04-13 10:43:12 +08:00
labbomb 8440baa5e8
[Bug][UI Next]Fix global variable validation condition for save button in workflow definition. (#9454) 2022-04-12 17:41:46 +08:00
Jiajie Zhong 51c1d8f2aa
[doc] Migrate develop docs from website (#9457)
* [doc] Migrate develop docs from website

developer docs should also go into the main repo, because
the development setup should follow the latest
code

* Add missing change from website
2022-04-12 17:23:56 +08:00
Amy0104 ac2e1df2bf
[Fix][UI][2.0.5] Add LIST type to the custom parameter types of task node. (#9455) 2022-04-12 17:04:58 +08:00
xiangzihao d18cf184c6
[CI] Improve CI (#9436)
* test

* test

* test

* test

* test

* test

* test

* test

* test

* improve ci

* improve ci

* improve ci
2022-04-12 14:43:15 +08:00
Tq fb0139e959
[Feature] [ALERT-9406]enrich alert info when insert new row into DB (#9445)
* enrich alert info when insert new row into DB

* fix comments
2022-04-12 14:31:38 +08:00
worry 69923546a1
[DS-9263][Improvement][master]optimize failover (#9281)
- add FailoverService.java
- move failover method  from MasterRegistryClient to FailoverService
- move failover code from FailoverExecuteThread to FailoverService

This closes #9263
2022-04-12 11:53:18 +08:00
Lyle Shaw 9d7223b038
[doc] fix url wrong in zh doc (#9421)
Co-authored-by: xiaoliangyu <xiaoliangyu@bytedance.com>
2022-04-12 09:45:25 +08:00
Jiajie Zhong 3d78859fe5
[python] Recover python release properties (#9444)
This patch recovers the properties `python.sign.skip=false`
when the combined profile `release,python` is used.

also close: #9433
2022-04-12 09:42:59 +08:00
Jiajie Zhong aaf2042ec4
[python] Add missing doc about config and connect remote server (#9443)
which includes `configuration`, `run example`,
and `how to connect remote server`

close: #9286, #9284, #8917
2022-04-12 09:42:28 +08:00
Jiajie Zhong 62284ae689
[doc] Add some dev missing doc (#9427)
* [doc] Add some dev missing doc

Including general-setting, task-definition, audit-log
and their related images

* Change base on suggestion

Co-authored-by: Tq <tianqitobethefirst@gmail.com>
2022-04-11 21:46:40 +08:00
caishunfeng b285ccf930
[Future-9396]Support output parameter transfer from parent workflow to child workflow (#9410)
* [Future-9396]Support output parameter transfer from parent workflow to child workflow

* fix note
2022-04-11 20:03:16 +08:00
kezhenxu94 14d71d1462
[UI] Migrate NPM to PNPM in CI builds (#9431) 2022-04-11 14:16:23 +08:00
Kerwin 923f3f38e3
[Fix-9316] [Task] Configure DB2 data source SQL script execution report ResultSet has been closed exception in SQL task (#9317)
* fix db2 error in the sql task

* update limit in sql task
2022-04-11 13:19:48 +08:00
nobolity 037692517a
[Fix-9251] [WORKER] resolve the SQL task failure when adding the UDF resource (#9319)
* feat(resource manager): extend s3 to the storage of ds

1. fix some spelling issues
2. extend the storage types
3. add S3Utils to manage resources
4. automatically inject the storage according to your config

* fix(resource  manager): update the dependency

* fix(resource  manager): extend s3 to the storage of ds

fix the constant of hadooputils

* fix(resource  manager): extend s3 to the storage of ds

1. fix some spelling issues
2. delete the import *

* fix(resource  manager):

merge  the unitTest:
1.TenantServiceImpl
2.ResourceServiceImpl
3.UserServiceImpl

* fix(resource  manager): extend s3 to the storage of ds

merge the resourceServiceTest

* fix(resource  manager): test  cancel the test method

createTenant verifyTenant

* fix(resource  manager): merge the code  follow the check-result of sonar

* fix(resource  manager): extend s3 to the storage of ds

fit the spell question

* fix(resource  manager): extend s3 to the storage of ds

revert the common.properties

* fix(resource  manager): extend s3 to the storage of ds

update the storageConfig with None

* fix(resource  manager): extend s3 to the storage of ds

fix the judge of resourceType

* fix(resource  manager): extend s3 to the storage of ds

undo the compile-mysql

* fix(resource  manager): extend s3 to the storage of ds

delete hadoop aws

* fix(resource  manager): extend s3 to the storage of ds

update the known-dependencies to delete aws 1.7.4
update the e2e
file-manager common.properties

* fix(resource  manager): extend s3 to the storage of ds

update the aws-region

* fix(resource  manager): extend s3 to the storage of ds

fix the storageconfig init

* fix(resource  manager): update e2e docker-compose

update e2e docker-compose

* fix(resource  manager): extend s3 to the storage of ds

revert the e2e common.properties

print the resource type in propertyUtil

* fix(resource  manager): extend s3 to the storage of ds
1.println the properties

* fix(resource  manager): println the s3 info

* fix(resource  manager): extend s3 to the storage of ds

delete the info  and upgrade the s3 info to e2e

* fix(resource  manager): extend s3 to the storage of ds

add the bucket init

* fix(resource  manager): extend s3 to the storage of ds

1. fix some spelling issues
2. delete the import *

* fix(resource  manager): extend s3 to the storage of ds

upgrade the s3 endpoint

* fix(resource  manager): withPathStyleAccessEnabled(true)

* fix(resource  manager): extend s3 to the storage of ds

1. fix some spelling issues
2. delete the import *

* fix(resource  manager): upgrade the  s3client builder

* fix(resource  manager): correct  the s3 point to s3client

* fix(resource  manager): update the constant BUCKET_NAME

* fix(resource  manager): e2e  s3 endpoint -> s3:9000

* fix(resource  manager): extend s3 to the storage of ds

1. fix some spelling issues
2. delete the import *

* style(resource  manager): add info to createBucket

* style(resource  manager): debug the log

* ci(resource  manager): test

test s3

* ci(ci): add INSERT INTO dolphinscheduler.t_ds_tenant (id, tenant_code, description, queue_id, create_time, update_time) VALUES(1, 'root', NULL, 1, NULL, NULL); to h2.sql

* fix(resource  manager): update the h2 sql

* fix(resource  manager): solve to delete the tenant

* style(resource  manager): merge the style end delete the unuse s3 config

* fix(resource  manager): extend s3 to the storage of ds

UPDATE the rename resources when s3

* fix(resource  manager): extend s3 to the storage of ds

1.fix the code style of QuartzImpl

* fix(resource  manager): extend s3 to the storage of ds

1. import restore_type to CommonUtils

* fix(resource  manager): update the work thread

* fix(resource  manager): update  the baseTaskProcessor

* fix(resource  manager): upgrade dolphinscheduler-standalone-server.xml

* fix(resource  manager): add  user Info to dolphinscheduler_h2.sql

* fix(resource  manager): merge  the resourceType to NONE

* style(upgrade the log level to info):

* fix(resource  manager): sync the h2.sql

* fix(resource  manager): update the merge the user tenant

* fix(resource  manager): merge the resourcesServiceImpl

* fix(resource  manager):

when the storage is s3, the directory can't be renamed

* fix(resource  manager): in s3 ,the directory cannot be renamed

* fix(resource  manager): delete the deleteRenameDirectory in E2E

* fix(resource  manager): check the style and  recoverd the test

* fix(resource  manager): delete the log.print(LoginUser)

* fix(server): fix the  udf serialize

* fix(master  task): update the udfTest to update the json string

* fix(test): update the udfFuncTest

* fix(common): syn the common.properties

* fix(udfTest): upgrade the udfTest

* fix(common): revert the common.properties
2022-04-11 10:49:46 +08:00
BaoLiang fea9ce391b
Update inappropriate characters (#9413) 2022-04-10 21:09:48 +08:00
Tq e2759a8f42
[Feature] [ALERT-9406]add new properties to alert class (#9408)
* add new properties to Alert.java and do minor changes to comments

* fix Integer to int

* fix Integer to int

* fix sql files

* fix not null properties to default null
2022-04-08 23:54:34 +08:00
worry dce3c132ca
[DS-9387][refactor]Remove the lock in the start method of the MasterRegistryClient class (#9389) 2022-04-08 21:06:28 +08:00
Jiajie Zhong 2c49c248f3
[doc] Remove observability (#9402)
SkyWalking v9 is coming soon and it no longer has
DolphinScheduler menus, so we should remove
the SW agent to avoid confusion.

close: #9242
2022-04-08 15:52:57 +08:00
labbomb 2db4cd4f14
[Bug][UI Next]Modify the display state logic of save buttons under workflow definition (#9403)
* Modifying site Configurations

* Modify the display state logic of save buttons under workflow definition
2022-04-08 14:55:49 +08:00
Eric Gao d23616fcbc
[CI] Enable CI to remove unexpected files in /docs/img dir (#9393) 2022-04-08 13:46:35 +08:00
Jiajie Zhong a8f6bf3831
Add new code owner of docs module (#9388) 2022-04-08 13:04:50 +08:00
Jiajie Zhong c828809b46
[doc] Change get help from dev mail list to slack (#9377)
* Change all get help from dev mailing list to slack, because
  we found the mailing list had many users asking to subscribe
  and they may have subscribed by accident.
* remove join dev mailing list in faq.md because we already
  have it in https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html
2022-04-08 10:56:35 +08:00
songjianet 72bdf6f03c
[Fix][UI Next][V1.0.0-Alpha] Fix the task instance forced success button multi-language support error. (#9392) 2022-04-08 10:02:48 +08:00
guoshupei ca95d2f928
[Fix-9221] [alert-server] optimization and gracefully close (#9246)
* [Fix-9221] [alert-server] optimization and gracefully close

This closes #9221

* [Fix-9221] [alert-server] remove unused mock data

This closes #9221

* [Fix-9221] [alert-server] remove unused mock data

This closes #9221

* [Fix-9221] [alert-server] remove unnecessary Mockito stubbings

* [Fix-9221] [alert-server] init AlertPluginManager in AlertServer

* [Fix-9221] [alert-server] AlertServerTest add AlertPluginManager installPlugin

* [Fix-9221] [alert-server] replace @Eventlistener with @PostConstruct

* [Fix-9221] [alert-server] sonar check solution

* [Improvement-9221] [alert] update constructor injection and replace IStoppable with Closeable

Co-authored-by: guoshupei <guoshupei@lixiang.com>
2022-04-08 10:02:10 +08:00
caishunfeng ab0357027d
[Improvement] change method access (#9390)
* change method to protected

* change method access
2022-04-07 21:51:21 +08:00
Tq 2a4fa9cdb1
[BUG][WORKER-9349]fix param priority (#9379)
* fix param priority

* fix params priority code logic
2022-04-07 20:09:53 +08:00
Tq f186b0d391
[Bug][API-9364]fix ProcessInstance wrong alert group id (#9383)
* fix ProcessInstance wrong alert group id

* change  createComplementCommandList method to protected
2022-04-07 19:19:38 +08:00
Eric Gao fd6b43bc81
[Dev] Switch version in pom.xml to dev-SNAPSHOT (#9223) (#9299) 2022-04-07 18:15:19 +08:00
Amy0104 e2c36324b3
[Fix][UI Next][V1.0.0-Alpha] Add light color theme to echarts. (#9381) 2022-04-07 17:35:17 +08:00
Jiajie Zhong 3457cee960
[python] Migrate pythonGatewayServer into api server (#9372)
Currently the size of our distribution package is up to
800MB; this patch migrates the python gateway server into
the api server

The distribution package size before and after this patch is:

```sh
# before
796M   apache-dolphinscheduler-2.0.4-SNAPSHOT-bin.tar.gz

# after
647M   apache-dolphinscheduler-2.0.4-SNAPSHOT-bin.tar.gz
```
2022-04-07 14:41:15 +08:00
mans2singh 950f32e1d6
[task-spark][docs] Corrected notice section (#9375) 2022-04-07 14:03:21 +08:00
gaojun2048 5ef3f9d668
[optimization] [Service] Optimization ProcessService and add ProcessService interface (#9370) 2022-04-07 12:21:34 +08:00
xiangzihao 80ea8049e0
[CI] try to fix ci (#9366)
* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci

* try to fix ci
2022-04-06 21:57:52 +08:00
xiangzihao 1679f15a50
[FIX-9355] Fix scheduleTime of start-process-instance api in api-doc (#9359)
* fix #9355

* fix #9355

* fix ut error

* fix ut error
2022-04-06 21:34:22 +08:00
Tq c294979e2f
[Bug-9235][Alert]Fix wechat markdown message and change wechat form structure (#9367)
* fix wechat issues:
1. change table msg type to markdown.
2. change userId to not required and enrich hints
3. change 'app id' to 'app id and chat id'

* fix wechat issues:
1. revert table showtype and add markdown showtype.
2. enrich hints.
3. delete 'chatid', rename agentid to weChatAgentIdChatId.
4. modify code to send markdown message.

* fix wechat issues: Change the language pack of agentId to agentId/chatId.

* fix format

* fix param name

Co-authored-by: Amy <amywang0104@163.com>
2022-04-06 20:29:30 +08:00
worry 2bab12f2c8
[Feature-9204][alert] Implement alert send status (#9208)
* [DS-9204][feat][alert,dao] Implement alert send status
- implement alert send status
- add alert send status entity、mapper
- modify alert dao
- modify alert sender
- add test
- add sql

This closes #9204

* [DS-9204][feat][alert,dao] Implement alert send status
- add license header

This closes #9204
2022-04-06 18:08:00 +08:00
Devosend ce7740b9fc
[Fix][UI Next][V1.0.0-Alpha]Add zh for dag execution policy (#9363) 2022-04-06 16:31:20 +08:00
Devosend aba257084b
[Fix][UI-Next] Rename process to workflow (#9350) 2022-04-06 10:23:30 +08:00
Devosend c21f8c650b
[Fix] [UI-Next][V1.0.0-Alpha] Rename node to task in the task creation modal (#9351) 2022-04-06 10:17:20 +08:00
Tq 4873ec8a45
fix task definition page info total count (#9354) 2022-04-06 09:57:20 +08:00
xiangzihao 40b73f7962
[Improve][CI] improve ci checking (#9325) 2022-04-06 09:33:41 +08:00
zchong 4a29c6a6c8
[Improvement-9338][API] show more create datasource exception message (#9336)
* Update DataSourceServiceImpl.java

fix  error message miss

* Update DataSourceServiceImpl.java

import optional jar
2022-04-06 09:17:29 +08:00
593 changed files with 17226 additions and 7156 deletions

1
.github/CODEOWNERS vendored
View File

@ -20,3 +20,4 @@ dolphinscheduler/dolphinscheduler-e2e @kezhenxu94
dolphinscheduler/dolphinscheduler-registry @kezhenxu94
dolphinscheduler/dolphinscheduler-standalone-server @kezhenxu94
dolphinscheduler/dolphinscheduler-python @zhongjiajie
dolphinscheduler/docs @zhongjiajie @Tianqi-Dotes

View File

@ -22,11 +22,12 @@ labels: [ "bug", "Waiting for reply" ]
body:
- type: markdown
attributes:
value: |
value: >
Please make sure what you are reporting is indeed a bug with reproducible steps, if you want to ask questions
or share ideas, you can head to our
[Discussions](https://github.com/apache/dolphinscheduler/discussions) tab, you can also [subscribe to our mailing list](mailto:dev-subscribe@dolphinscheduler.apache.org) and send
emails to [our mailing list](mailto:dev@dolphinscheduler.apache.org)
[Discussions](https://github.com/apache/dolphinscheduler/discussions) tab, you can also
[join our slack](https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw)
and send your question to channel `#troubleshooting`
For better global communication, Please write in English.

View File

@ -38,8 +38,23 @@ concurrency:
cancel-in-progress: true
jobs:
paths-filter:
name: Backend-Path-Filter
runs-on: ubuntu-latest
outputs:
not-ignore: ${{ steps.filter.outputs.not-ignore }}
steps:
- uses: actions/checkout@v2
- uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
id: filter
with:
filters: |
not-ignore:
- '!(docs/**)'
build:
name: Build
name: Backend-Build
needs: paths-filter
if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
@ -63,3 +78,20 @@ jobs:
-Dmaven.wagon.httpconnectionManager.ttlSeconds=120
- name: Check dependency license
run: tools/dependencies/check-LICENSE.sh
result:
name: Build
runs-on: ubuntu-latest
timeout-minutes: 30
needs: [ build, paths-filter ]
if: always()
steps:
- name: Status
run: |
if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
echo "Skip Build!"
exit 0
fi
if [[ ${{ needs.build.result }} != 'success' ]]; then
echo "Build Failed!"
exit -1
fi

View File

@ -18,6 +18,8 @@ name: Docs
on:
pull_request:
paths:
- 'docs/**'
concurrency:
group: doc-${{ github.event.pull_request.number || github.ref }}
@ -47,6 +49,6 @@ jobs:
- uses: actions/checkout@v2
- run: sudo npm install -g markdown-link-check@3.10.0
- run: |
for file in $(find . -name "*.md"); do
for file in $(find ./docs -name "*.md"); do
markdown-link-check -c .dlc.json -q "$file"
done

View File

@ -29,8 +29,23 @@ concurrency:
jobs:
paths-filter:
name: E2E-Path-Filter
runs-on: ubuntu-latest
outputs:
not-ignore: ${{ steps.filter.outputs.not-ignore }}
steps:
- uses: actions/checkout@v2
- uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
id: filter
with:
filters: |
not-ignore:
- '!(docs/**)'
build:
name: E2E-Build
needs: paths-filter
if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
@ -137,18 +152,20 @@ jobs:
name: recording-${{ matrix.case.name }}
path: ${{ env.RECORDING_PATH }}
retention-days: 1
result:
name: E2E
runs-on: ubuntu-latest
timeout-minutes: 30
needs: [ e2e ]
needs: [ e2e, paths-filter ]
if: always()
steps:
- name: Status
run: |
if [[ ${{ needs.e2e.result }} == 'success' ]]; then
echo "Passed!"
else
if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
echo "Skip E2E!"
exit 0
fi
if [[ ${{ needs.e2e.result }} != 'success' ]]; then
echo "E2E Failed!"
exit -1
fi

View File

@ -58,6 +58,7 @@ jobs:
node-version: 16
- name: Compile and Build
run: |
npm install
npm run lint
npm run build:prod
npm install pnpm -g
pnpm install
pnpm run lint
pnpm run build:prod

View File

@ -40,5 +40,8 @@ jobs:
- name: "Comment in issue"
uses: ./.github/actions/comment-on-issue
with:
message: "Hi:\n* Thank you for your feedback, we have received your issue, Please wait patiently for a reply.\n* In order for us to understand your request as soon as possible, please provide detailed information、version or pictures.\n* If you haven't received a reply for a long time, you can subscribe to the developer's emailMail subscription steps reference https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html ,Then write the issue URL in the email content and send question to dev@dolphinscheduler.apache.org."
message: |
Thank you for your feedback, we have received your issue, Please wait patiently for a reply.
* In order for us to understand your request as soon as possible, please provide detailed information、version or pictures.
* If you haven't received a reply for a long time, you can [join our slack](https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw) and send your question to channel `#troubleshooting`
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View File

@ -36,8 +36,23 @@ concurrency:
cancel-in-progress: true
jobs:
paths-filter:
name: Unit-Test-Path-Filter
runs-on: ubuntu-latest
outputs:
not-ignore: ${{ steps.filter.outputs.not-ignore }}
steps:
- uses: actions/checkout@v2
- uses: dorny/paths-filter@b2feaf19c27470162a626bd6fa8438ae5b263721
id: filter
with:
filters: |
not-ignore:
- '!(docs/**)'
unit-test:
name: Unit Test
name: Unit-Test
needs: paths-filter
if: ${{ (needs.paths-filter.outputs.not-ignore == 'true') || (github.event_name == 'push') }}
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
@ -96,3 +111,20 @@ jobs:
with:
name: unit-test-logs
path: ${LOG_DIR}
result:
name: Unit Test
runs-on: ubuntu-latest
timeout-minutes: 30
needs: [ unit-test, paths-filter ]
if: always()
steps:
- name: Status
run: |
if [[ ${{ needs.paths-filter.outputs.not-ignore }} == 'false' && ${{ github.event_name }} == 'pull_request' ]]; then
echo "Skip Unit Test!"
exit 0
fi
if [[ ${{ needs.unit-test.result }} != 'success' ]]; then
echo "Unit Test Failed!"
exit -1
fi

View File

@ -86,7 +86,7 @@ We would like to express our deep gratitude to all the open-source projects used
## Get Help
1. Submit an [issue](https://github.com/apache/dolphinscheduler/issues/new/choose)
1. Subscribe to this mailing list: https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html, then email dev@dolphinscheduler.apache.org
2. [Join our slack](https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw) and send your question to channel `#troubleshooting`
## Community

View File

@ -87,8 +87,8 @@ Dolphin Scheduler使用了很多优秀的开源项目比如google的guava、g
## 获得帮助
1. 提交issue
2. 先订阅邮件开发列表:[订阅邮件列表](https://dolphinscheduler.apache.org/zh-cn/community/development/subscribe.html), 订阅成功后发送邮件到dev@dolphinscheduler.apache.org.
1. 提交 [issue](https://github.com/apache/dolphinscheduler/issues/new/choose)
2. [加入slack群](https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw) 并在频道 `#troubleshooting` 中提问
## 社区

View File

@ -140,29 +140,6 @@ services:
networks:
- dolphinscheduler
dolphinscheduler-python-gateway:
image: ${HUB}/dolphinscheduler-python:${TAG}
ports:
- "54321:54321"
- "25333:25333"
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:54321/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
depends_on:
dolphinscheduler-schema-initializer:
condition: service_completed_successfully
dolphinscheduler-zookeeper:
condition: service_healthy
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
networks:
dolphinscheduler:
driver: bridge

View File

@ -118,27 +118,6 @@ services:
mode: replicated
replicas: 1
dolphinscheduler-python-gateway:
image: apache/dolphinscheduler-python-gateway
ports:
- 54321:54321
- 25333:25333
env_file: .env
healthcheck:
test: [ "CMD", "curl", "http://localhost:54321/actuator/health" ]
interval: 30s
timeout: 5s
retries: 3
volumes:
- dolphinscheduler-logs:/opt/dolphinscheduler/logs
- dolphinscheduler-shared-local:/opt/soft
- dolphinscheduler-resource-local:/dolphinscheduler
networks:
- dolphinscheduler
deploy:
mode: replicated
replicas: 1
networks:
dolphinscheduler:
driver: overlay

View File

@ -39,7 +39,7 @@ version: 2.0.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application.
appVersion: 2.0.4-SNAPSHOT
appVersion: dev-SNAPSHOT
dependencies:
- name: postgresql

View File

@ -44,9 +44,6 @@ Create default docker images' fullname.
{{- define "dolphinscheduler.image.fullname.tools" -}}
{{- .Values.image.registry }}/dolphinscheduler-tools:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{- define "dolphinscheduler.image.fullname.python-gateway" -}}
{{- .Values.image.registry }}/dolphinscheduler-python-gateway:{{ .Values.image.tag | default .Chart.AppVersion -}}
{{- end -}}
{{/*
Create a default common labels.

View File

@ -28,6 +28,10 @@ metadata:
labels:
app.kubernetes.io/name: {{ include "dolphinscheduler.fullname" . }}
{{- include "dolphinscheduler.common.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
rules:
- host: {{ .Values.ingress.host }}

View File

@ -23,7 +23,7 @@ timezone: "Asia/Shanghai"
image:
registry: "dolphinscheduler.docker.scarf.sh/apache"
tag: "2.0.4-SNAPSHOT"
tag: "dev-SNAPSHOT"
pullPolicy: "IfNotPresent"
pullSecret: ""
@ -403,6 +403,7 @@ ingress:
enabled: false
host: "dolphinscheduler.org"
path: "/dolphinscheduler"
annotations: {}
tls:
enabled: false
secretName: "dolphinscheduler-tls"

View File

@ -97,6 +97,10 @@ export default {
title: 'Task Instance',
link: '/en-us/docs/dev/user_doc/guide/project/task-instance.html',
},
{
title: 'Task Definition',
link: '/zh-cn/docs/dev/user_doc/guide/project/task-definition.html',
},
]
},
{
@ -239,6 +243,10 @@ export default {
},
],
},
{
title: 'Data Quality',
link: '/en-us/docs/dev/user_doc/guide/data-quality.html',
},
{
title: 'Resource',
link: '/en-us/docs/dev/user_doc/guide/resource.html',
@ -251,6 +259,15 @@ export default {
title: 'Security',
link: '/en-us/docs/dev/user_doc/guide/security.html',
},
{
title: 'How-To',
children: [
{
title: 'General Setting',
link: '/en-us/docs/dev/user_doc/guide/howto/general-setting.html',
}
],
},
{
title: 'Open API',
link: '/en-us/docs/dev/user_doc/guide/open-api.html',
@ -298,16 +315,6 @@ export default {
},
],
},
{
title: 'Observability',
children: [
{
title: 'SkyWalking-Agent',
link: '/en-us/docs/dev/user_doc/guide/installation/skywalking-agent.html',
},
],
},
{
title: 'FAQ',
children: [
@ -408,6 +415,10 @@ export default {
title: '',
link: '/zh-cn/docs/dev/user_doc/guide/project/task-instance.html',
},
{
title: '',
link: '/zh-cn/docs/dev/user_doc/guide/project/task-definition.html',
},
]
},
{
@ -550,6 +561,10 @@ export default {
},
],
},
{
title: '',
link: '/zh-cn/docs/dev/user_doc/guide/data-quality.html',
},
{
title: '',
link: '/zh-cn/docs/dev/user_doc/guide/resource.html',
@ -562,6 +577,15 @@ export default {
title: '',
link: '/zh-cn/docs/dev/user_doc/guide/security.html',
},
{
title: '',
children: [
{
title: '',
link: '/zh-cn/docs/dev/user_doc/guide/howto/general-setting.html',
}
],
},
{
title: 'API',
link: '/zh-cn/docs/dev/user_doc/guide/open-api.html',
@ -609,15 +633,6 @@ export default {
},
],
},
{
title: '',
children: [
{
title: 'SkyWalking-Agent',
link: '/zh-cn/docs/dev/user_doc/guide/installation/skywalking-agent.html',
},
],
},
{
title: 'FAQ',
children: [

View File

@ -42,6 +42,7 @@ import docs201Config from '../../../site_config/docs2-0-1';
import docs202Config from '../../../site_config/docs2-0-2';
import docs203Config from '../../../site_config/docs2-0-3';
import docs205Config from '../../../site_config/docs2-0-5';
import docs300Config from '../../../site_config/docs3-0-0';
import docsDevConfig from '../../../site_config/docsdev';
const docsSource = {
@ -60,6 +61,7 @@ const docsSource = {
'2.0.2': docs202Config,
'2.0.3': docs203Config,
'2.0.5': docs205Config,
'3.0.0': docs300Config,
dev: docsDevConfig,
};

View File

@ -24,7 +24,7 @@ export default {
port: 8080,
domain: 'dolphinscheduler.apache.org',
copyToDist: ['asset', 'img', 'file', '.asf.yaml', 'sitemap.xml', '.nojekyll', '.htaccess', 'googled0df7b96f277a143.html'],
docsLatest: '2.0.5',
docsLatest: '3.0.0',
defaultSearch: 'google', // default search engine
defaultLanguage: 'en-us',
'en-us': {
@ -41,12 +41,12 @@ export default {
{
key: 'docs',
text: 'DOCS',
link: '/en-us/docs/latest/user_doc/guide/quick-start.html',
link: '/en-us/docs/latest/user_doc/about/introduction.html',
children: [
{
key: 'docs0',
text: 'latest(2.0.5)',
link: '/en-us/docs/latest/user_doc/guide/quick-start.html',
text: 'latest(3.0.0-alpha)',
link: '/en-us/docs/latest/user_doc/about/introduction.html',
},
{
key: 'docs1',
@ -227,12 +227,12 @@ export default {
{
key: 'docs',
text: '',
link: '/zh-cn/docs/latest/user_doc/guide/quick-start.html',
link: '/zh-cn/docs/latest/user_doc/about/introduction.html',
children: [
{
key: 'docs0',
text: 'latest(2.0.5)',
link: '/zh-cn/docs/latest/user_doc/guide/quick-start.html',
text: 'latest(3.0.0-alpha)',
link: '/zh-cn/docs/latest/user_doc/about/introduction.html',
},
{
key: 'docs1',

View File

@ -397,21 +397,41 @@ apiServers="ds1"
### dolphinscheduler_env.sh [load environment variables configs]
When using shell to commit tasks, DS will load environment variables inside dolphinscheduler_env.sh into the host.
Types of tasks involved are: Shell, Python, Spark, Flink, DataX, etc.
When using shell to commit tasks, DolphinScheduler exports the environment variables in `bin/env/dolphinscheduler_env.sh`. The
main configuration includes `JAVA_HOME`, the metadata database, the registry center, and task configuration.
```bash
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/opt/soft/java}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_DRIVER_CLASS_NAME
export SPRING_DATASOURCE_URL
export SPRING_DATASOURCE_USERNAME
export SPRING_DATASOURCE_PASSWORD
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
```
### Services logback configs

View File

@ -0,0 +1,100 @@
# API design standard
A standardized and unified API is the cornerstone of project design. The DolphinScheduler API follows the RESTful standard. RESTful is currently the most popular style of Internet software architecture: it has a clear structure, conforms to standards, and is easy to understand and extend.
This article uses the DolphinScheduler API as an example to explain how to construct a RESTful API.
## 1. URI design
REST stands for "Representational State Transfer". The design of a RESTful URI is based on resources. A resource corresponds to an entity on the network, for example: a piece of text, a picture, or a service. Each resource corresponds to a URI.
+ One kind of resource: expressed in the plural, such as `task-instances`, `groups`;
+ A single resource: expressed in the singular, or represented by its ID, such as `group`, `groups/{groupId}`;
+ Sub-resources: resources under a certain resource, such as `/instances/{instanceId}/tasks`;
+ A single sub-resource: `/instances/{instanceId}/tasks/{taskId}`.
## 2. Method design
We locate a resource by its URI, then use the HTTP method, or declare an action in the path suffix, to express the operation on the resource.
### ① Query - GET
Use URI to locate the resource, and use GET to indicate query.
+ When the URI is a type of resource, it means querying that type of resource. For example, the following example is a paged query of `alert-groups`.
```
Method: GET
/api/dolphinscheduler/alert-groups
```
+ When the URI is a single resource, it means querying this resource. For example, the following example queries the specified `alert-group`.
```
Method: GET
/api/dolphinscheduler/alert-groups/{id}
```
+ In addition, we can also query sub-resources via the URI, as follows:
```
Method: GET
/api/dolphinscheduler/projects/{projectId}/tasks
```
**The above examples all represent paged queries. If we need to query all data, we append `/list` to the URI to distinguish the two. Do not use the same API for both paged and unpaged queries.**
```
Method: GET
/api/dolphinscheduler/alert-groups/list
```
### ② Create - POST
Use the URI to locate the resource, use POST to indicate creation, and return the created ID to the requester.
+ create an `alert-group`
```
Method: POST
/api/dolphinscheduler/alert-groups
```
+ creating sub-resources works the same way:
```
Method: POST
/api/dolphinscheduler/alert-groups/{alertGroupId}/tasks
```
### ③ Modify - PUT
Use URI to locate the resource, use PUT to indicate modify.
+ modify an `alert-group`
```
Method: PUT
/api/dolphinscheduler/alert-groups/{alertGroupId}
```
### ④ Delete - DELETE
Use URI to locate the resource, use DELETE to indicate delete.
+ delete an `alert-group`
```
Method: DELETE
/api/dolphinscheduler/alert-groups/{alertGroupId}
```
+ batch deletion: to batch delete an array of IDs, use POST. **Do not use the DELETE method, because the body of a DELETE request has no semantic meaning, and some gateways, proxies, and firewalls may strip the request body after receiving a DELETE request.**
```
Method: POST
/api/dolphinscheduler/alert-groups/batch-delete
```
### ⑤ Others
In addition to creating, deleting, modifying and querying, we can also locate a resource through its URI and then append an operation after the path, such as:
```
/api/dolphinscheduler/alert-groups/verify-name
/api/dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt
```
## 3. Parameter design
There are two types of parameters: request parameters and path parameters. Parameters must use lower camelCase.
For paging, if the page number entered by the user is less than 1, the front end should automatically reset it to 1, requesting the first page; if the back end finds that the page number is greater than the total number of pages, it should return the last page directly.
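A minimal sketch of this clamping rule (illustrative only; the function name and signature are not from the DolphinScheduler codebase):

```python
def clamp_page_no(page_no: int, total_pages: int) -> int:
    """Clamp a user-supplied page number to the valid range [1, total_pages].

    Mirrors the convention above: values below 1 request the first page,
    values beyond the last page return the last page.
    """
    if total_pages < 1:
        return 1  # no data at all: fall back to page 1
    if page_no < 1:
        return 1
    if page_no > total_pages:
        return total_pages
    return page_no
```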
## 4. Other design
### Base URL
The project's URIs use `/api/<project_name>` as the base path, which identifies these APIs as belonging to the project.
```
/api/dolphinscheduler
```

View File

@ -0,0 +1,315 @@
## Architecture Design
Before explaining the architecture of the scheduling system, let us first understand its common terms.
### 1. Noun Interpretation
**DAG**: Full name Directed Acyclic Graph, abbreviated as DAG. Tasks in a workflow are assembled as a directed acyclic graph, which is traversed topologically from the nodes with zero in-degree until no successor nodes remain. For example, the following picture:
<p align="center">
<img src="/img/architecture-design/dag_examples.png" alt="dag example" width="80%" />
<p align="center">
<em>dag example</em>
</p>
</p>
**Process definition**: a visual **DAG** formed by dragging task nodes and establishing associations between them
**Process instance**: an instantiation of a process definition, generated by manual start or by scheduling. Each run of a process definition generates a new process instance
**Task instance**: the instantiation of a specific task node when a process instance runs; it indicates the specific task execution status
**Task type**: currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT (dependency), with dynamic plug-in extension planned. Note: **SUB_PROCESS** is itself a separate process definition that can be launched on its own
**Schedule mode**: the system supports timed scheduling based on cron expressions as well as manual scheduling. Supported command types: start workflow, start execution from the current node, recover a fault-tolerant workflow, resume a paused process, start execution from a failed node, complement data, schedule, rerun, pause, stop, and recover a waiting thread. The two command types **recover fault-tolerant workflow** and **recover waiting thread** are used internally by the scheduler and cannot be called externally
**Timed schedule**: the system uses the **quartz** distributed scheduler and supports visual generation of cron expressions
**Dependency**: the system supports not only simple predecessor/successor dependencies within a **DAG**, but also provides **task dependency** nodes, supporting **custom task dependencies between processes**
**Priority**: supports priorities for process instances and task instances. If no priority is set, the default is first in, first out
**Mail alert**: supports sending **SQL task** query results by email, as well as email alerts for process instance results and fault-tolerance alert notifications
**Failure policy**: for tasks running in parallel, two handling options are provided when a task fails. **Continue** means the remaining parallel tasks keep running until the process ends. **End** means that once a failed task is found, the running parallel tasks are killed and the process ends
**Complement**: backfills historical data, supporting two modes: **interval parallel** and **serial**
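The topological traversal described for **DAG** above, starting from zero-in-degree nodes and stopping when no successors remain, can be sketched as follows (an illustrative snippet of the classic algorithm, not DolphinScheduler code):

```python
from collections import deque

def topological_order(tasks, edges):
    """Traverse a workflow DAG in topological order (Kahn's algorithm).

    tasks: iterable of task names; edges: list of (upstream, downstream) pairs.
    Raises ValueError if the graph contains a cycle (i.e. is not a DAG).
    """
    indegree = {t: 0 for t in tasks}
    downstream = {t: [] for t in tasks}
    for up, down in edges:
        downstream[up].append(down)
        indegree[down] += 1

    # Start from the nodes with zero in-degree, as the definition above states.
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in downstream[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("cycle detected: not a DAG")
    return order
```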
### 2. System architecture
#### 2.1 System Architecture Diagram
<p align="center">
<img src="/img/architecture.jpg" alt="System Architecture Diagram" />
<p align="center">
<em>System Architecture Diagram</em>
</p>
</p>
#### 2.2 Architectural description
* **MasterServer**
MasterServer adopts a distributed, non-central design. It is mainly responsible for splitting the DAG into tasks, submitting and monitoring tasks, and monitoring the health status of other MasterServers and WorkerServers.
When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
##### The service mainly contains:
- **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
- **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types**
- **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types
- **MasterTaskExecThread** is mainly responsible for task persistence
* **WorkerServer**
- WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
##### This service contains:
- **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
- **ZooKeeper**
The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation.
- **Task Queue**
Provides task queue operations. Currently the queue is also implemented on ZooKeeper. Since there is little information stored in the queue, there is no need to worry about too much data in it; in fact, we have stress-tested the queue with millions of records, with no impact on system stability or performance.
- **Alert**
Provides alert-related interfaces, mainly covering the storage, query, and notification of the two types of alert data. Notification has two forms: **mail notification** and **SNMP (not yet implemented)**.
- **API**
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful API to serve external requests.
Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more.
- **UI**
The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
#### 2.3 Architectural Design Ideas
##### I. Decentralization vs. centralization
###### Centralized thinking
The centralized design concept is relatively simple: nodes in a distributed cluster are divided into two roles:
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave role" width="50%" />
</p>
- The Master role is mainly responsible for task distribution and for supervising the health of Slaves. It can dynamically balance tasks across Slaves so that no Slave node is overly "busy" or "idle".
- The Worker role is mainly responsible for executing tasks and maintaining a heartbeat with the Master so that the Master can assign it tasks.
Problems with the centralized design:
- Once the Master has a problem, the cluster is leaderless and will crash. To solve this, most Master/Slave architectures adopt an active/standby Master design (hot or cold standby, with automatic or manual switchover), and more and more new systems can automatically elect and switch the Master to improve availability.
- Another problem is where the Scheduler lives. If it is on the Master, a DAG's tasks can run on different machines, but this overloads the Master. If it is on a Slave, all tasks of a DAG can only be submitted on one machine, and with many parallel tasks the pressure on that Slave may be large.
###### Decentralization
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="decentralized" width="50%" />
</p>
- In a decentralized design there is usually no Master/Slave concept: all roles are the same and have equal status. The global Internet is a typical decentralized distributed system, where any node going down affects only a small range of functionality.
- The core of decentralized design is that there is no "manager" distinct from the other nodes in the distributed system, so there is no single point of failure. However, since there is no "manager" node, each node must communicate with the others to obtain the machine information it needs, and the unreliability of distributed communication greatly increases the difficulty of implementing the functions above.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized systems keep emerging: the cluster's manager is dynamically elected rather than preset, and when it fails the remaining nodes spontaneously hold a "meeting" to elect a new "manager" to preside over the work. The most typical cases are ZooKeeper and the Go-implemented Etcd.
- DolphinScheduler's decentralization registers Masters/Workers with ZooKeeper. The Master cluster and Worker cluster have no center, and a ZooKeeper distributed lock elects one Master or Worker as the "manager" to perform tasks.
##### II. Distributed lock practice
DolphinScheduler uses ZooKeeper distributed locks to ensure that only one Master executes the Scheduler at a time, and that only one Worker performs task submission at a time.
1. The core flow for acquiring a distributed lock is as follows:
<p align="center">
<img src="/img/architecture-design/distributed_lock.png" alt="Get Distributed Lock Process" width="70%" />
</p>
2. Flow chart of the Scheduler thread's distributed lock implementation in DolphinScheduler:
<p align="center">
<img src="/img/architecture-design/distributed_lock_procss.png" alt="Get Distributed Lock Process" />
</p>
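The core of the ZooKeeper lock recipe behind this flow is that each contender creates an ephemeral sequential znode, and whoever holds the lowest sequence number holds the lock. The following is a rough in-memory simulation of that recipe for illustration only; DolphinScheduler's actual implementation talks to a real ZooKeeper ensemble, and all names here are hypothetical:

```python
import itertools

class FakeLockDir:
    """In-memory stand-in for a ZooKeeper lock directory of ephemeral
    sequential znodes, e.g. a hypothetical /dolphinscheduler/lock/masters."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # znode name -> owner

    def create_sequential(self, owner):
        # ZooKeeper appends a monotonically increasing, zero-padded
        # sequence number to each node name it creates.
        name = f"lock-{next(self._seq):010d}"
        self._nodes[name] = owner
        return name

    def holds_lock(self, name):
        # The contender with the smallest sequence number owns the lock;
        # zero-padding makes lexicographic order match numeric order.
        return name == min(self._nodes)

    def delete(self, name):
        # Releasing the lock (or session expiry on crash) removes the
        # znode, implicitly promoting the next-lowest contender.
        del self._nodes[name]

lock_dir = FakeLockDir()
m1 = lock_dir.create_sequential("master-1")
m2 = lock_dir.create_sequential("master-2")
# Only master-1 may run the Scheduler now; after master-1 releases
# (or crashes), master-2 is promoted automatically.
```

In the real recipe each contender watches only its immediate predecessor's znode, which avoids a "herd effect" when the lock is released.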
##### III. The insufficient-thread loop-wait problem
- If a DAG contains no sub-processes, and the number of Commands exceeds the threshold set for the thread pool, the process waits or fails directly.
- If many sub-processes are nested in a large DAG, the situation in the following figure results in a "dead" state:
<p align="center">
<img src="/img/architecture-design/lack_thread.png" alt="Thread is not enough to wait for loop" width="70%" />
</p>
In the above figure, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for a new thread in the thread pool, then the entire DAG process cannot end, and thus the thread cannot be released. This forms the state of the child parent process loop waiting. At this point, the scheduling cluster will no longer be available unless a new Master is started to add threads to break such a "stuck."
It seems a bit unsatisfactory to start a new Master to break the deadlock, so we proposed the following three options to reduce this risk:
1. Calculate the sum of the threads of all Masters, then calculate the number of threads required by each DAG, that is, pre-calculate before the DAG process is executed. Because the thread pools are spread across multiple Masters, however, the total number of threads is unlikely to be obtained in real time.
2. Judge the single-Master thread pool: if the pool is full, let the thread fail directly.
3. Add a Command type for insufficient resources: if the thread pool is insufficient, suspend the main process. Once the thread pool has a free thread, the process suspended for insufficient resources can be woken up again.
Note: The Master Scheduler thread fetches Commands in FIFO order.
So we chose the third way to solve the problem of insufficient threads.
##### IV. Fault Tolerant Design
Fault tolerance is divided into service fault tolerance and task retry. Service fault tolerance is divided into two types: Master Fault Tolerance and Worker Fault Tolerance.
###### 1. Downtime fault tolerance
Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The implementation principle is as follows:
<p align="center">
<img src="/img/architecture-design/fault-tolerant.png" alt="DolphinScheduler Fault Tolerant Design" width="70%" />
</p>
The Master monitors the directories of other Masters and Workers. If a remove event is detected, fault tolerance is performed for the process instance or the task instance according to the specific business logic.
- Master fault tolerance flow chart:
<p align="center">
<img src="/img/architecture-design/fault-tolerant_master.png" alt="Master Fault Tolerance Flowchart" width="70%" />
</p>
After Master fault tolerance completes, the process instance is rescheduled by the Scheduler thread in DolphinScheduler. It traverses the DAG to find the "Running" and "Submitted Successfully" tasks. For a "Running" task it monitors the status of its task instance; for a "Submitted Successfully" task it needs to determine whether the entry already exists in the task queue: if it exists, it likewise monitors the status of the task instance, and if it does not exist, it resubmits the task instance.
- Worker fault tolerance flow chart:
<p align="center">
<img src="/img/architecture-design/fault-tolerant_worker.png" alt="Worker Fault Tolerance Flowchart" width="70%" />
</p>
Once the Master Scheduler thread finds a task instance marked as "needs fault tolerance", it takes over the task and resubmits it.
Note: Because "network jitter" may cause a node to briefly lose its ZooKeeper heartbeat and trigger a remove event for the node, we use the simplest approach: once a node's connection to ZooKeeper times out, the Master or Worker service on it stops directly.
###### 2. Task failure retry
Here we must first distinguish between the concepts of task failure retry, process failure recovery, and process failure rerun:
- Task failure retry is at the task level and is performed automatically by the scheduling system. For example, if a shell task is configured with 3 retries, the shell task will be run up to 3 more times after it fails.
- Process failure recovery is at the process level and is performed manually; recovery can only start **from the failed node** or **from the current node**.
- Process failure rerun is also at the process level and is performed manually; a rerun starts from the start node.
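As an illustration, the task-level retry above can be sketched as a simple loop. This is a minimal sketch, not DolphinScheduler's actual implementation; `runWithRetry` and the callable are hypothetical names:

```java
import java.util.concurrent.Callable;

public class TaskRetry {
    // Run a task up to maxRetries + 1 times; return true on the first success.
    public static boolean runWithRetry(Callable<Boolean> task, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                if (task.call()) {
                    return true; // task succeeded, stop retrying
                }
            } catch (Exception e) {
                // treat an exception as a failure and fall through to the next attempt
            }
        }
        return false; // retries exhausted; the workflow marks the task failed
    }

    public static void main(String[] args) {
        final int[] attempts = {0};
        // fails twice, succeeds on the third attempt (retries = 3, as in the shell example)
        boolean ok = runWithRetry(() -> ++attempts[0] >= 3, 3);
        System.out.println(ok + " after " + attempts[0] + " attempts"); // true after 3 attempts
    }
}
```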
Back to the main topic: we divide the task nodes in the workflow into two types.
- One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on.
- There is also a logical node, which does not do the actual script or statement processing, but the logical processing of the entire process flow, such as sub-flow sections.
Each **business node** can be configured with a number of failed retries. When the task node fails, it is automatically retried until it succeeds or exceeds the configured number of retries. A **logical node** does not support failed retry, but the tasks inside a logical node do support retry.
If a task in the workflow fails and reaches its maximum number of retries, the workflow fails and stops; the failed workflow can then be manually rerun or resumed.
##### V. Task priority design
In the early scheduling design, without a priority design and with fair scheduling, a task submitted first might finish at the same time as a task submitted later, and there was no way to set the priority of a process or task. We have redesigned this as follows:
- Tasks are processed from high priority to low, ordered first by **process instance priority**, then by **task priority within the same process instance**, and finally by **submission order within the same process**.
- The specific implementation is to parse the priority from the task instance's JSON and save the **process instance priority_process instance id_task priority_task id** information in the ZooKeeper task queue. When fetching from the task queue, a string comparison yields the task that needs to be executed first.
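As a rough sketch of this queue-key idea (the exact key layout and comparison details in DolphinScheduler may differ; the fixed-width number formatting here is an illustrative assumption so that plain string comparison behaves numerically):

```java
public class TaskPriorityKey {
    // Build a sortable key: processInstancePriority_processInstanceId_taskPriority_taskId.
    // Lower keys sort first, so smaller priority values mean higher priority.
    // Fixed-width ids keep plain string comparison consistent (illustrative detail).
    public static String key(int procPriority, long procInstanceId, int taskPriority, long taskId) {
        return String.format("%d_%010d_%d_%010d", procPriority, procInstanceId, taskPriority, taskId);
    }

    public static void main(String[] args) {
        // e.g. HIGHEST=0 ... LOWEST=4: a HIGH (1) process instance beats a MEDIUM (2) one
        String a = key(1, 101, 2, 7);
        String b = key(2, 100, 0, 3);
        System.out.println(a.compareTo(b) < 0); // true: process instance priority wins first
    }
}
```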
- The priority of the process definition means that some processes need to be handled before others. It can be configured when the process is started or when a scheduled start is set up. There are 5 levels, from high to low: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
<p align="center">
<img src="/img/architecture-design/process_priority.png" alt="Process Priority Configuration" width="40%" />
</p>
- The priority of the task is also divided into 5 levels, followed by HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
<p align="center">
<img src="/img/architecture-design/task_priority.png" alt="task priority configuration" width="35%" />
</p>
##### VI. Logback and gRPC implement log access
- Since the Web (UI) and Worker are not necessarily on the same machine, viewing a log is not like querying a local file. There are two options:
- Put the logs on the ES search engine
- Obtain remote log information through gRPC communication
- To keep DolphinScheduler as lightweight as possible, gRPC was chosen to access remote log information.
<p align="center">
<img src="/img/architecture-design/grpc.png" alt="grpc remote access" width="50%" />
</p>
- We use a custom Logback FileAppender and Filter to generate a separate log file for each task instance.
- The main implementation of FileAppender is as follows:
```java
/**
* task log appender
*/
public class TaskLogAppender extends FileAppender<ILoggingEvent> {
    ...
    @Override
    protected void append(ILoggingEvent event) {
        if (currentlyActiveFile == null) {
            currentlyActiveFile = getFile();
        }
        String activeFile = currentlyActiveFile;
        // thread name: taskThreadName-processDefineId_processInstanceId_taskInstanceId
        String threadName = event.getThreadName();
        String[] threadNameArr = threadName.split("-");
        // logId = processDefineId_processInstanceId_taskInstanceId
        String logId = threadNameArr[1];
        ...
        super.subAppend(event);
    }
}
```
Logs are generated in the form `/process definition id/process instance id/task instance id.log`
- Filter matches the thread name starting with TaskLogInfo:
- TaskLogFilter is implemented as follows:
```java
/**
* task log filter
*/
public class TaskLogFilter extends Filter<ILoggingEvent> {
    @Override
    public FilterReply decide(ILoggingEvent event) {
        if (event.getThreadName().startsWith("TaskLogInfo-")) {
            return FilterReply.ACCEPT;
        }
        return FilterReply.DENY;
    }
}
```
### Summary
Starting from scheduling, this article has introduced the architecture and implementation ideas of DolphinScheduler, a distributed workflow scheduling system for big data. To be continued

# Global Parameter development document
After the user defines the parameter with the direction OUT, it is saved in the localParam of the task.
## Usage of parameters
Get the direct predecessor nodes `preTasks` of the current `taskInstance` to be created from the DAG, collect the `varPool` of each of the `preTasks`, and merge these varPools (Lists) into one `varPool`. During the merge, if parameters with the same name are found, they are handled by the following rules:
* If all the values are null, the merged value is null
* If one and only one value is non-null, the merged value is that non-null value
* If all the values are non-null, the value taken into the varPool is the one from the task instance with the earliest end time
The direction of all the merged properties is updated to IN during the merge process.
The result of the merge is saved in taskInstance.varPool.
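The merge rules above can be sketched roughly as follows. `Property` here is a minimal stand-in for DolphinScheduler's real property class, and the field names are illustrative; the "earliest end time" tie-break follows the description above:

```java
import java.util.*;

public class VarPoolMerge {
    // Minimal stand-in for DolphinScheduler's Property (illustrative fields only)
    static class Property {
        final String prop;   // parameter name
        final String value;  // may be null
        final long endTime;  // end time of the taskInstance that produced it
        Property(String prop, String value, long endTime) {
            this.prop = prop; this.value = value; this.endTime = endTime;
        }
    }

    // Merge predecessor varPools; on duplicate names keep the non-null value,
    // and when several are non-null keep the one with the earliest end time.
    static Map<String, Property> merge(List<Property> upstream) {
        Map<String, Property> merged = new HashMap<>();
        for (Property p : upstream) {
            Property old = merged.get(p.prop);
            if (old == null) { merged.put(p.prop, p); continue; }
            if (old.value == null) { merged.put(p.prop, p); continue; }
            if (p.value != null && p.endTime < old.endTime) { merged.put(p.prop, p); }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Property> pool = Arrays.asList(
            new Property("k", null, 10),
            new Property("k", "late", 30),
            new Property("k", "early", 20));
        System.out.println(merge(pool).get("k").value); // early
    }
}
```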
The worker receives and parses the varPool into the format of `Map<String,Property>`, where the key of the map is property.prop, which is the parameter name.
When the processor processes the parameters, it will merge the varPool and localParam and globalParam parameters, and if there are parameters with duplicate names during the merging process, they will be replaced according to the following priorities, with the higher priority being retained and the lower priority being replaced:
* globalParam: high
* varPool: middle
* localParam: low
Before the node content is executed, the parameters are replaced with their corresponding values by matching `${parameter name}` with regular expressions.
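The priority merge and the `${...}` substitution described above can be sketched as follows; the method names are illustrative, not DolphinScheduler's actual API:

```java
import java.util.*;
import java.util.regex.*;

public class ParamSubstitution {
    // Merge with duplicates resolved by priority: globalParam > varPool > localParam
    static Map<String, String> mergeParams(Map<String, String> globalParam,
                                           Map<String, String> varPool,
                                           Map<String, String> localParam) {
        Map<String, String> merged = new HashMap<>(localParam); // lowest priority first
        merged.putAll(varPool);                                 // overrides localParam
        merged.putAll(globalParam);                             // overrides both
        return merged;
    }

    // Replace every ${name} placeholder in the node content with its merged value
    static String substitute(String content, Map<String, String> params) {
        Matcher m = Pattern.compile("\\$\\{([^}]+)}").matcher(content);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = params.getOrDefault(m.group(1), m.group(0)); // keep unknown names as-is
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> merged = mergeParams(
            Map.of("dt", "2022-04-25"), Map.of("dt", "ignored", "id", "42"), Map.of("id", "0"));
        System.out.println(substitute("run.sh --date ${dt} --id ${id}", merged));
        // run.sh --date 2022-04-25 --id 42
    }
}
```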
## Parameter setting
Currently, only SQL and SHELL nodes support getting parameters.
Get the parameters with direction OUT from localParam, and process them in the following way according to the node type.
### SQL node
The structure returned by the parameter is List<Map<String,String>>, where the elements of List are each row of data, the key of Map is the column name, and the value is the value corresponding to the column.
* If the SQL statement returns one row of data, the column names are matched against the OUT parameter names the user defined in the task; parameters that do not match are discarded.
* If the SQL statement returns multiple rows of data, the column names are matched against user-defined OUT parameter names of type LIST. All rows of the matching column are converted to `List<String>` as the value of that parameter; parameters that do not match are discarded.
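A minimal sketch of this matching logic, under the assumption that OUT parameters are given as a name-to-type map (the real data structures differ):

```java
import java.util.*;
import java.util.stream.*;

public class SqlOutParams {
    // rows: each element is one row, column name -> value (as the SQL node returns them)
    // outParams: OUT parameter name -> declared type ("VARCHAR", "LIST", ...), user-defined
    static Map<String, Object> extract(List<Map<String, String>> rows, Map<String, String> outParams) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, String> out : outParams.entrySet()) {
            String col = out.getKey();
            if (rows.size() == 1 && rows.get(0).containsKey(col)) {
                result.put(col, rows.get(0).get(col));          // single row: plain value
            } else if (rows.size() > 1 && "LIST".equals(out.getValue())) {
                List<String> values = rows.stream()
                    .map(r -> r.get(col)).filter(Objects::nonNull)
                    .collect(Collectors.toList());
                if (!values.isEmpty()) result.put(col, values); // multi-row: List<String>
            } // otherwise the OUT parameter is discarded
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(Map.of("cnt", "3"), Map.of("cnt", "5"));
        System.out.println(extract(rows, Map.of("cnt", "LIST"))); // {cnt=[3, 5]}
    }
}
```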
### SHELL node
The result of the processor execution is returned as `Map<String,String>`.
The user needs to define `${setValue(key=value)}` in the output when defining the shell script.
Remove `${setValue()}` when processing parameters, split by "=", with the 0th being the key and the 1st being the value.
Similarly match the OUT parameter name and key defined by the user when defining the task, and use value as the value of that parameter.
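The `${setValue(key=value)}` parsing described above can be sketched like this (a simplified line-based sketch; the actual parsing in the worker may differ):

```java
import java.util.*;

public class SetValueParser {
    // Parse lines like ${setValue(key=value)} from shell output into key/value pairs
    static Map<String, String> parse(List<String> outputLines) {
        Map<String, String> params = new HashMap<>();
        for (String line : outputLines) {
            line = line.trim();
            if (!line.startsWith("${setValue(") || !line.endsWith(")}")) continue;
            // strip the ${setValue( prefix and the )} suffix
            String body = line.substring("${setValue(".length(), line.length() - 2);
            String[] kv = body.split("=", 2);   // 0th is the key, 1st is the value
            if (kv.length == 2) params.put(kv[0], kv[1]);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse(List.of("some log line", "${setValue(dt=2022-04-25)}"));
        System.out.println(p.get("dt")); // 2022-04-25
    }
}
```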
### Return parameter processing
* The result of acquired Processor is String.
* Determine whether the processor is empty or not, and exit if it is empty.
* Determine whether the localParam is empty or not, and exit if it is empty.
* Get the parameter of localParam which is OUT, and exit if it is empty.
* Format the String as per the formats above (`List<Map<String,String>>` for SQL, `Map<String,String>` for SHELL).
* Assign the parameters with matching values to the varPool (a List, which also contains the original IN parameters).
* Format the varPool as json and pass it to master.
* The parameters that are OUT would be written into the localParam after the master has received the varPool.

# Overview
<!-- TODO Since the side menu does not support multiple levels, add new page to keep all sub page here -->
* [Global Parameter](global-parameter.md)
* [Switch Task type](task/switch.md)

# SWITCH Task development
The switch task workflow works as follows:
* User-defined expressions and branch information are stored in `taskParams` in `taskdefinition`. When the switch is executed, it will be formatted as `SwitchParameters`
* `SwitchTaskExecThread` processes the expressions defined in the `switch` from top to bottom, obtains the values of variables from `varPool`, and evaluates each expression with `javascript`. If an expression returns true, it stops checking and records the position of that expression, referred to here as `resultConditionLocation`. The work of `SwitchTaskExecThread` is then done.
* After the `switch` task runs, if there is no error (errors are most commonly a user-defined expression that is out of specification or a problem with a parameter name), `MasterExecThread.submitPostNode` obtains the downstream nodes of the `DAG` to continue execution.
* If `DagHelper.parsePostNodes` finds that the current node (the node that has just completed its work) is a `switch` node, `resultConditionLocation` is obtained, and all branches in the SwitchParameters except `resultConditionLocation` are skipped. In this way, only the branches that need to be executed are left.
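The top-to-bottom branch selection can be sketched as follows. In DolphinScheduler the expression strings are evaluated with a javascript engine; here the conditions are plain predicates over the variable pool to keep the sketch self-contained:

```java
import java.util.*;
import java.util.function.Predicate;

public class SwitchBranchSelect {
    // Evaluate conditions top to bottom over the variable pool; the index of the
    // first condition returning true is resultConditionLocation.
    static int resultConditionLocation(List<Predicate<Map<String, String>>> conditions,
                                       Map<String, String> varPool) {
        for (int i = 0; i < conditions.size(); i++) {
            if (conditions.get(i).test(varPool)) {
                return i; // stop at the first true expression
            }
        }
        return -1; // no branch matched; the default branch would be taken
    }

    public static void main(String[] args) {
        List<Predicate<Map<String, String>>> conditions = List.of(
            vars -> "prod".equals(vars.get("env")),
            vars -> "test".equals(vars.get("env")));
        System.out.println(resultConditionLocation(conditions, Map.of("env", "test"))); // 1
    }
}
```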

### DolphinScheduler Alert SPI main design
#### DolphinScheduler SPI Design
DolphinScheduler is undergoing a microkernel + plug-in architecture change. All core capabilities, such as tasks, resource storage, and registration centers, will be designed as extension points. We hope to use SPI to improve DolphinScheduler's own flexibility and extensibility.
For alert-related code, please refer to the `dolphinscheduler-alert-api` module. This module defines the extension interface of the alert plug-in along with some basic code. When you need to make a related function pluggable, we recommend reading this module's code first, along with this document, which will save a lot of time. Note, however, that the documentation lags behind the code to a certain degree; where documentation is missing, take the source code as the standard (and if you are interested, we also welcome you to submit related documentation). In addition, we will hardly ever change the extension interfaces (excluding new additions) unless a major structural adjustment forces an incompatible upgrade, so the existing documentation should generally suffice.
We use native Java SPI, so to extend, you only need to pay attention to implementing the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface; the underlying logic, such as plug-in loading, is already implemented, which makes our development more focused and simple.
By the way, we have adopted an excellent front-end component library, form-create, which supports generating front-end UI components based on JSON. If plug-in development involves the front end, we use JSON to generate the related front-end UI components. The plug-in parameters are encapsulated in `org.apache.dolphinscheduler.spi.params`, which converts all relevant parameters into the corresponding JSON. This means you can complete the drawing of front-end components in Java code (mainly forms; we only care about the data exchanged between the front and back ends).
This article mainly focuses on the design and development of Alert.
#### Main Modules
If you don't care about its internal design, but simply want to know how to develop your own alarm plug-in, you can skip this content.
* dolphinscheduler-alert-api
This module is the core module of ALERT SPI. This module defines the interface of the alarm plug-in extension and some basic codes. The extension plug-in must implement the interface defined by this module: `org.apache.dolphinscheduler.alert.api.AlertChannelFactory`
* dolphinscheduler-alert-plugins
This module contains the plug-ins we currently provide; dozens are supported so far, such as Email, DingTalk, Script, etc.
#### Alert SPI Main class information.
AlertChannelFactory
Alarm plug-in factory interface. All alarm plug-ins need to implement this interface. This interface is used to define the name of the alarm plug-in and the required parameters. The create method is used to create a specific alarm plug-in instance.
AlertChannel
The interface of the alert plug-in. The alert plug-in needs to implement this interface. There is only one method process in this interface. The upper-level alert system will call this method and obtain the return information of the alert through the AlertResult returned by this method.
AlertData
Alarm content information, including id, title, content, log.
AlertInfo
For alarm-related information, when the upper-level system calls an instance of the alarm plug-in, the instance of this class is passed to the specific alarm plug-in through the process method. It contains the alert content AlertData and the parameter information filled in by the front end of the called alert plug-in instance.
AlertResult
The alarm plug-in sends alarm return information.
org.apache.dolphinscheduler.spi.params
This package holds the plug-in parameter definitions. Our front end uses the form-create library (http://www.form-create.com), which can dynamically generate the front-end UI based on the parameter-list JSON returned by the plug-in definition, so we do not need to care about the front end when doing SPI plug-in development.
Under this package, we currently only encapsulate RadioParam, TextParam, and PasswordParam, which are used to define text type parameters, radio parameters and password type parameters, respectively.
AbsPluginParams is the base class of all parameters; classes such as RadioParam inherit it. Each DS alert plug-in returns a list of AbsPluginParams in its implementation of AlertChannelFactory.
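To make the factory/channel relationship concrete, here is a heavily simplified sketch. The two interfaces below are hypothetical stand-ins defined locally for illustration; the real interfaces live in `org.apache.dolphinscheduler.alert.api` and carry richer types such as `AlertResult`, `AlertInfo`, and the parameter list:

```java
// Simplified stand-ins for the dolphinscheduler-alert-api types (illustrative only)
interface AlertChannel { String process(String title, String content); }
interface AlertChannelFactory {
    String name();             // plug-in name shown in the UI
    AlertChannel create();     // build a concrete alert channel instance
}

public class ConsoleAlertPlugin implements AlertChannelFactory {
    @Override public String name() { return "Console"; }

    @Override public AlertChannel create() {
        return (title, content) -> {
            // a real plug-in would call an external service here and return an AlertResult
            System.out.println("[ALERT] " + title + ": " + content);
            return "success";
        };
    }

    public static void main(String[] args) {
        AlertChannelFactory factory = new ConsoleAlertPlugin();
        AlertChannel channel = factory.create();
        channel.process("disk", "usage over 90%");
    }
}
```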
The specific design of alert_spi can be seen in the issue: [Alert Plugin Design](https://github.com/apache/incubator-dolphinscheduler/issues/3049)
#### Alert SPI built-in implementation
* Email
Email alert notification
* DingTalk
Alert for DingTalk group chat bots
Related parameter configuration can refer to the DingTalk robot document.
* EnterpriseWeChat
EnterpriseWeChat alert notifications
Related parameter configuration can refer to the EnterpriseWeChat robot document.
* Script
We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications.
* SMS
SMS alerts
* FeiShu
FeiShu alert notification
* Slack
Slack alert notification
* PagerDuty
PagerDuty alert notification
* WebexTeams
WebexTeams alert notification
Related parameter configuration can refer to the WebexTeams document.
* Telegram
Telegram alert notification
Related parameter configuration can refer to the Telegram document.
* Http
We have implemented an HTTP alert. Most alert plug-in calls end up as HTTP requests, so if we do not support your alert plug-in yet, you can use HTTP to implement your alert logic. You are also welcome to contribute your common plug-ins to the community :)

## DolphinScheduler Datasource SPI main design
#### How do I use data sources?
The data source center supports POSTGRESQL, HIVE/IMPALA, SPARK, CLICKHOUSE, SQLSERVER data sources by default.
If you are using a MySQL or ORACLE data source, you need to place the corresponding driver package in the `lib` directory
#### How to do Datasource plugin development?
org.apache.dolphinscheduler.spi.datasource.DataSourceChannel
org.apache.dolphinscheduler.spi.datasource.DataSourceChannelFactory
org.apache.dolphinscheduler.plugin.datasource.api.client.CommonDataSourceClient
1. First, implement the above interfaces in your data source plug-in and inherit the common client. For details, refer to the implementation of data source plug-ins such as sqlserver and mysql; all RDBMS plug-ins are added in the same way.
2. Add the driver configuration in the data source plug-in pom.xml
We provide APIs for external access to all data sources in the DolphinScheduler datasource API module
#### **Future plan**
Support data sources such as kafka, http, files, sparkSQL, FlinkSQL, etc.

### DolphinScheduler Registry SPI Extension
#### How to use?
Make the following configuration (take zookeeper as an example)
* Registry plug-in configuration, take Zookeeper as an example (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
```
For specific configuration information, please refer to the parameter information provided by the specific plug-in, for example zk: `org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperConfiguration.java`
All configuration keys need the `registry.` prefix. For example, `base.sleep.time.ms` should be configured in the registry as follows: registry.base.sleep.time.ms=100
#### How to expand
`dolphinscheduler-registry-api` defines the standard for implementing plugins. When you need to extend plugins, you only need to implement `org.apache.dolphinscheduler.registry.api.RegistryFactory`.
Under the `dolphinscheduler-registry-plugin` module is the registry plugin we currently provide.
#### FAQ
1. Registry connection timeout: you can increase the relevant timeout parameters.

## DolphinScheduler Task SPI extension
#### How to develop task plugins?
org.apache.dolphinscheduler.spi.task.TaskChannel
The plug-in implements the above interface, which mainly covers creating tasks (task initialization, task running, etc.) and task cancellation. If it is a YARN task, you need to implement org.apache.dolphinscheduler.plugin.task.api.AbstractYarnTask.
We provide APIs for external access to all tasks in the dolphinscheduler-task-api module, while the dolphinscheduler-spi module is the general SPI code library, which defines all the plug-in modules, such as the alarm module and the registry module; you can read it in detail.
*NOTICE*
Since the task plug-in involves the front-end page, the front-end SPI has not yet been implemented, so you need to implement the front-end page corresponding to the plug-in separately.
If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem.

# DolphinScheduler development
## Software Requirements
Before setting up the DolphinScheduler development environment, please make sure you have installed the software as below:
* [Git](https://git-scm.com/downloads): DolphinScheduler version control system
* [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html): DolphinScheduler backend language
* [Maven](http://maven.apache.org/download.cgi): Java Package Management System
* [Node](https://nodejs.org/en/download): DolphinScheduler frontend language
### Clone Git Repository
Clone the git repository with your git management tool; here we use the git command line as an example
```shell
mkdir dolphinscheduler
cd dolphinscheduler
git clone git@github.com:apache/dolphinscheduler.git
```
### Compile source code
i. If you use a MySQL database, modify `pom.xml` in the root project and change the scope of the `mysql-connector-java` dependency to `compile`.
ii. Run `mvn clean install -Prelease -Dmaven.test.skip=true`
## Notice
There are two ways to configure the DolphinScheduler development environment, standalone mode and normal mode
* [Standalone mode](#dolphinscheduler-standalone-quick-start): **Recommended**; the more convenient way to build a development environment, and it covers most scenarios.
* [Normal mode](#dolphinscheduler-normal-mode): separate master, worker, and api servers; this covers more test scenarios than standalone and is closer to a real production environment.
## DolphinScheduler Standalone Quick Start
> **_Note:_** The standalone server is for development and debugging only, because it uses an H2 database and a ZooKeeper testing server, which may not be stable in production
> Standalone is only supported in DolphinScheduler 1.3.9 and later versions
### Git Branch Choose
Use different Git branches to develop different code
* If you want to develop based on a binary package, switch the git branch to the specific release branch; for example, to develop based on 1.3.9, choose the branch `1.3.9-release`.
* If you want to develop the latest code, choose the branch `dev`.
### Start backend server
Find the class `org.apache.dolphinscheduler.server.StandaloneServer` in IntelliJ IDEA and click run on the main function to start it up.
### Start frontend server
Install frontend dependencies and run it
```shell
cd dolphinscheduler-ui
npm install
npm run start
```
Visit http://localhost:12345/dolphinscheduler in a browser to log in to the DolphinScheduler UI. The default username and password are **admin/dolphinscheduler123**
## DolphinScheduler Normal Mode
### Prepare
#### zookeeper
Download [ZooKeeper](https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.6.3), and extract it.
* Create directory `zkData` and `zkLog`
* Go to the ZooKeeper installation directory, copy the configuration file `conf/zoo_sample.cfg` to `conf/zoo.cfg`, and change the values of `dataDir` and `dataLogDir` in `conf/zoo.cfg` to the directories you created
```shell
# We use path /data/zookeeper/data and /data/zookeeper/datalog here as example
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/datalog
```
* Start ZooKeeper in a terminal with the command `./bin/zkServer.sh start`.
#### Database
DolphinScheduler's metadata is stored in a relational database; currently MySQL and PostgreSQL are supported. Here we use MySQL as an example: start the database and create a new database named dolphinscheduler as the DolphinScheduler metabase
After creating the new database, run the SQL file under `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_mysql.sql` directly in MySQL to complete the database initialization
#### Start Backend Server
The following steps describe how to start the DolphinScheduler backend services
##### Backend Start Prepare
* Open project: use an IDE to open the project; here we use IntelliJ IDEA as an example. After opening, it will take a while for IntelliJ IDEA to finish downloading the dependencies
* Plugin installation(**Only required for 2.0 or later**)
* Registry plug-in configuration, take Zookeeper as an example (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
```
* File change
* If you use MySQL as your metadata database, you need to modify `dolphinscheduler/pom.xml` and change the `scope` of the `mysql-connector-java` dependency to `compile`. This step is not necessary if you use PostgreSQL
* Modify database configuration, modify the database configuration in the `dolphinscheduler-dao/src/main/resources/application-mysql.yaml`
Here we use MySQL with a database named dolphinscheduler as an example; adjust the username and password to your own setup
```application-mysql.yaml
spring:
datasource:
driver-class-name: com.mysql.jdbc.Driver
url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
username: ds_user
password: dolphinscheduler
```
* Log level: add the line `<appender-ref ref="STDOUT"/>` to the following configuration files to display logs on the command line
`dolphinscheduler-server/src/main/resources/logback-worker.xml`
`dolphinscheduler-server/src/main/resources/logback-master.xml`
`dolphinscheduler-api/src/main/resources/logback-api.xml`
The result after the modification is shown below:
```diff
<root level="INFO">
+ <appender-ref ref="STDOUT"/>
<appender-ref ref="APILOGFILE"/>
<appender-ref ref="SKYWALKING-LOG"/>
</root>
```
> **_Note:_** Only DolphinScheduler 2.0 and later versions need to install plugins before starting the server; this is not needed before version 2.0.
##### Server start
There are three services that need to be started: MasterServer, WorkerServer, and ApiApplicationServer.
* MasterServer: execute the `main` function in the class `org.apache.dolphinscheduler.server.master.MasterServer` in IntelliJ IDEA, with the *VM Options* `-Dlogging.config=classpath:logback-master.xml -Ddruid.mysql.usePingMethod=false -Dspring.profiles.active=mysql`
* WorkerServer: execute the `main` function in the class `org.apache.dolphinscheduler.server.worker.WorkerServer` in IntelliJ IDEA, with the *VM Options* `-Dlogging.config=classpath:logback-worker.xml -Ddruid.mysql.usePingMethod=false -Dspring.profiles.active=mysql`
* ApiApplicationServer: execute the `main` function in the class `org.apache.dolphinscheduler.api.ApiApplicationServer` in IntelliJ IDEA, with the *VM Options* `-Dlogging.config=classpath:logback-api.xml -Dspring.profiles.active=api,mysql`. After it has started, you can find the Open API documentation at http://localhost:12345/dolphinscheduler/doc.html
> The `mysql` in the VM Options `-Dspring.profiles.active=mysql` specifies which configuration profile to use
### Start Frontend Server
Install frontend dependencies and run it
```shell
cd dolphinscheduler-ui
npm install
npm run start
```
Visit http://localhost:12345/dolphinscheduler in a browser to log in to the DolphinScheduler UI. The default username and password are **admin/dolphinscheduler123**

# DolphinScheduler E2E Automation Test
## I. Preparatory knowledge
### 1. The difference between E2E Test and Unit Test
E2E, which stands for "End to End", can be translated as "end-to-end" testing. It imitates the user, starting from a certain entry point and progressively performing actions until a certain job is completed. Unit tests are different: they usually test parameters, types, parameter values, the number of arguments, return values, thrown errors, and so on, with the purpose of ensuring that a specific function works stably and reliably in all cases. Unit testing assumes that if all functions work correctly, the whole product will work.
In contrast, E2E tests do not emphasize covering all usage scenarios; they focus on whether a complete chain of operations can be completed. For the web front end, they are also concerned with the layout of the interface and whether the content information meets expectations.
For example, the E2E test of the login page checks whether the user is able to enter credentials and log in normally, and whether the error message is displayed correctly if the login fails. Whether illegal input is handled is not a major concern.
### 2. Selenium test framework
[Selenium](https://www.selenium.dev) is an open source testing tool for executing automated tests on a web browser. The framework uses WebDriver to transform Web Service commands into browser native calls through the browser's native components to complete operations. In simple words, it simulates the browser and makes selection operations on the elements of the page.
A WebDriver is an API and protocol which defines a language-neutral interface for controlling the behavior of a web browser. Every browser has a specific WebDriver implementation, called a driver. The driver is the component responsible for delegating to the browser and handling the communication with Selenium and the browser.
The Selenium framework links all these components together through a user-facing interface that allows transparent work with different browser backends, enabling cross-browser and cross-platform automation.
## II. E2E Test
### 1. E2E-Pages
DolphinScheduler's E2E tests are deployed using docker-compose. The current tests run in standalone mode and are mainly used to check basic functions such as create, read, update, and delete. For further cluster validation, such as collaboration between services or communication mechanisms between them, refer to `deploy/docker/docker-compose.yml` for configuration.
For E2E test (the front-end part), the [page model](https://www.selenium.dev/documentation/guidelines/page_object_models/) form is used, mainly to create a corresponding model for each page. The following is an example of a login page.
```java
package org.apache.dolphinscheduler.e2e.pages;
import org.apache.dolphinscheduler.e2e.pages.common.NavBarPage;
import org.apache.dolphinscheduler.e2e.pages.security.TenantPage;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import lombok.Getter;
import lombok.SneakyThrows;
@Getter
public final class LoginPage extends NavBarPage {
@FindBy(id = "inputUsername")
private WebElement inputUsername;
@FindBy(id = "inputPassword")
private WebElement inputPassword;
@FindBy(id = "btnLogin")
private WebElement buttonLogin;
public LoginPage(RemoteWebDriver driver) {
super(driver);
}
@SneakyThrows
public TenantPage login(String username, String password) {
inputUsername().sendKeys(username);
inputPassword().sendKeys(password);
buttonLogin().click();
new WebDriverWait(driver, 10)
.until(ExpectedConditions.urlContains("/#/security"));
return new TenantPage(driver);
}
}
```
During the test process, we only test the elements we need to focus on, not every element on the page, so the login page declares only the username, password and login button elements. The `FindBy` annotation provided by the Selenium framework locates elements by the corresponding id or class in the Vue page.
In addition, elements are not manipulated directly during testing. Instead, the corresponding interactions are packaged into methods for reuse. For example, to log in, the `public TenantPage login()` method takes the username and password and operates on the elements, achieving the login effect: once logged in, the user jumps to the Security Centre (which opens the Tenant Management page by default).
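The same page-object idea is language-agnostic. As a minimal sketch of the pattern in plain JavaScript (a stubbed driver stands in for Selenium's RemoteWebDriver; the element ids mirror the example above):

```javascript
// A minimal page-object sketch. The driver here is a stub standing in for
// Selenium's RemoteWebDriver; only the pattern matters: locate the elements
// once, and expose whole interactions as methods instead of raw elements.
class LoginPage {
  constructor (driver) {
    this.driver = driver
    this.inputUsername = driver.findElement('inputUsername')
    this.inputPassword = driver.findElement('inputPassword')
    this.buttonLogin = driver.findElement('btnLogin')
  }
  // package the whole login interaction so tests reuse one method
  login (username, password) {
    this.inputUsername.sendKeys(username)
    this.inputPassword.sendKeys(password)
    this.buttonLogin.click()
    return this.driver.currentUrl
  }
}

// stub driver: records typed values and fakes the post-login redirect
const makeStubDriver = () => {
  const driver = { currentUrl: '/#/login', elements: {} }
  driver.findElement = id => (driver.elements[id] = {
    value: '',
    sendKeys (v) { this.value += v },
    click () { driver.currentUrl = '/#/security' }
  })
  return driver
}

const driver = makeStubDriver()
const url = new LoginPage(driver).login('admin', 'dolphinscheduler123')
console.log(url) // → /#/security
```

The test code never touches an element directly; it only calls `login`, exactly as the Java page model does.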
The `goToTab` method is provided in `SecurityPage` to test the corresponding sidebar jumps, covering `TenantPage`, `UserPage`, `WorkerGroupPage` and `QueuePage`. These pages are implemented in the same way, testing that the form's input, add and delete buttons return the corresponding pages.
```java
public <T extends SecurityPage.Tab> T goToTab(Class<T> tab) {
if (tab == TenantPage.class) {
WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
.until(ExpectedConditions.elementToBeClickable(menuTenantManage));
((JavascriptExecutor)driver).executeScript("arguments[0].click();", menuTenantManageElement);
return tab.cast(new TenantPage(driver));
}
if (tab == UserPage.class) {
WebElement menUserManageElement = new WebDriverWait(driver, 60)
.until(ExpectedConditions.elementToBeClickable(menUserManage));
((JavascriptExecutor)driver).executeScript("arguments[0].click();", menUserManageElement);
return tab.cast(new UserPage(driver));
}
if (tab == WorkerGroupPage.class) {
WebElement menWorkerGroupManageElement = new WebDriverWait(driver, 60)
.until(ExpectedConditions.elementToBeClickable(menWorkerGroupManage));
((JavascriptExecutor)driver).executeScript("arguments[0].click();", menWorkerGroupManageElement);
return tab.cast(new WorkerGroupPage(driver));
}
if (tab == QueuePage.class) {
menuQueueManage().click();
return tab.cast(new QueuePage(driver));
}
throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
}
```
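The chain of `if` blocks above is essentially a class-to-page dispatch. The same shape can be sketched as a lookup table (plain JavaScript; `TenantPage` and `UserPage` here are stand-ins, not the real E2E classes):

```javascript
// Hypothetical sketch: replace the if-chain with a lookup table mapping
// a tab class to a page factory.
class TenantPage { constructor (driver) { this.driver = driver } }
class UserPage { constructor (driver) { this.driver = driver } }

const tabFactories = new Map([
  [TenantPage, driver => new TenantPage(driver)],
  [UserPage, driver => new UserPage(driver)]
])

function goToTab (driver, tab) {
  const factory = tabFactories.get(tab)
  if (!factory) throw new Error('Unknown tab: ' + tab.name)
  return factory(driver)
}

console.log(goToTab({}, TenantPage) instanceof TenantPage) // → true
```

Adding a new tab then only means adding one entry to the table rather than another `if` branch.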
![SecurityPage](/img/e2e-test/SecurityPage.png)
For navigation bar options jumping, the goToNav method is provided in `org/apache/dolphinscheduler/e2e/pages/common/NavBarPage.java`. The currently supported pages are: ProjectPage, SecurityPage and ResourcePage.
```java
public <T extends NavBarItem> T goToNav(Class<T> nav) {
if (nav == ProjectPage.class) {
WebElement projectTabElement = new WebDriverWait(driver, 60)
.until(ExpectedConditions.elementToBeClickable(projectTab));
((JavascriptExecutor)driver).executeScript("arguments[0].click();", projectTabElement);
return nav.cast(new ProjectPage(driver));
}
if (nav == SecurityPage.class) {
WebElement securityTabElement = new WebDriverWait(driver, 60)
.until(ExpectedConditions.elementToBeClickable(securityTab));
((JavascriptExecutor)driver).executeScript("arguments[0].click();", securityTabElement);
return nav.cast(new SecurityPage(driver));
}
if (nav == ResourcePage.class) {
WebElement resourceTabElement = new WebDriverWait(driver, 60)
.until(ExpectedConditions.elementToBeClickable(resourceTab));
((JavascriptExecutor)driver).executeScript("arguments[0].click();", resourceTabElement);
return nav.cast(new ResourcePage(driver));
}
throw new UnsupportedOperationException("Unknown nav bar");
}
```
### 2. E2E-Cases
Current E2E test cases supported include: File Management, Project Management, Queue Management, Tenant Management, User Management, Worker Group Management and Workflow Test.
![E2E_Cases](/img/e2e-test/E2E_Cases.png)
The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case, we need to import the corresponding file in the form of an annotation.
The browser is loaded using the RemoteWebDriver provided by Selenium. Before each test case starts, some preparation is needed, such as logging in as a user and jumping to the relevant page (depending on the specific test case).
```java
@BeforeAll
public static void setup() {
new LoginPage(browser)
.login("admin", "dolphinscheduler123")
.goToNav(SecurityPage.class)
.goToTab(TenantPage.class)
;
}
```
When the preparation is complete, the actual test cases are written. We use the `@Order()` annotation for modularity and to fix the order of the tests. After the tests run, assertions determine whether they succeeded; if the assertion returns true, the tenant was created successfully. The following code can be used as a reference:
```java
@Test
@Order(10)
void testCreateTenant() {
final TenantPage page = new TenantPage(browser);
page.create(tenant);
await().untilAsserted(() -> assertThat(page.tenantList())
.as("Tenant list should contain newly-created tenant")
.extracting(WebElement::getText)
.anyMatch(it -> it.contains(tenant)));
}
```
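The `await().untilAsserted(...)` call retries the assertion until it passes. A toy synchronous sketch of that idea (the real Awaitility retries on a timer with a timeout, not a fixed retry count):

```javascript
// retry an assertion until it stops throwing, like Awaitility's untilAsserted
function untilAsserted (assertion, retries = 50) {
  let lastError
  for (let i = 0; i < retries; i++) {
    try { assertion(); return } catch (e) { lastError = e }
  }
  throw lastError
}

// simulated tenant list that only contains the new tenant after a few polls
let polls = 0
const tenantList = () => (++polls >= 3 ? ['tenant_e2e_test'] : [])

untilAsserted(() => {
  if (!tenantList().some(t => t.includes('tenant_e2e'))) {
    throw new Error('tenant not visible yet')
  }
})
console.log(polls) // → 3
```

Polling like this is what makes the assertion robust against the UI updating asynchronously after the create action.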
The rest are similar cases and can be understood by referring to the specific source code.
https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-e2e/dolphinscheduler-e2e-case/src/test/java/org/apache/dolphinscheduler/e2e/cases
## III. Supplements
When running E2E tests locally, you first need to start the local service; you can refer to this page:
[development-environment-setup](https://dolphinscheduler.apache.org/en-us/development/development-environment-setup.html)
When running E2E tests locally, the `-Dlocal=true` parameter can be set to connect to the local front end, which makes it easier to verify UI changes.
When running E2E tests on an `M1` chip, use the `-Dm1_chip=true` parameter to select container images that support `ARM64`.
![Dlocal](/img/e2e-test/Dlocal.png)
If a connection timeout occurs during a local run, increase the wait time; 30 seconds or more is recommended.
![timeout](/img/e2e-test/timeout.png)
A recording of each test run is saved as an MP4 file.
![MP4](/img/e2e-test/MP4.png)

# Front-end development documentation
### Technical selection
```
Vue mvvm framework
Es6 ECMAScript 6.0
Ans-ui Analysys-ui
D3 Visual Library Chart Library
Jsplumb connection plugin library
Lodash high performance JavaScript utility library
```
### Development environment
- #### Node installation
Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
- #### Front-end project construction
From the command line, `cd` into the `dolphinscheduler-ui` project directory and execute `npm install` to pull the project's dependency packages.
> If `npm install` is very slow, you can set the taobao mirror
```
npm config set registry http://registry.npm.taobao.org/
```
- Modify `API_BASE` in the file `dolphinscheduler-ui/.env` to interact with the backend:
```
# back end interface address
API_BASE = http://127.0.0.1:12345
```
> ##### ! ! ! Special attention: if the project reports a `node-sass` error while pulling the dependency packages, run the following command and then execute `npm install` again.
```bash
npm install node-sass --unsafe-perm #Install node-sass dependency separately
```
- #### Development environment operation
- `npm start` runs the development environment (after startup, visit http://localhost:8888)
#### Front-end project release
- `npm run build` packages the project (after packaging, a `dist` folder is created in the root directory for publishing to Nginx)
Run the `npm run build` command to generate the package folder (`dist`),
copy it to the corresponding directory on the server (the directory where the front-end service stores static pages),
then visit `http://localhost:8888`.
#### Start with node and daemon under Linux
Install pm2 `npm install -g pm2`
Execute `pm2 start npm -- run dev` in the `dolphinscheduler-ui` project root directory to start the project
#### command
- Start `pm2 start npm -- run dev`
- Stop `pm2 stop npm`
- delete `pm2 delete npm`
- Status `pm2 list`
```
[root@localhost dolphinscheduler-ui]# pm2 start npm -- run dev
[PM2] Applying action restartProcessId on app [npm](ids: 0)
[PM2] [npm](0) ✓
[PM2] Process successfully started
┌──────────┬────┬─────────┬──────┬──────┬────────┬─────────┬────────┬─────┬──────────┬──────┬──────────┐
│ App name │ id │ version │ mode │ pid │ status │ restart │ uptime │ cpu │ mem │ user │ watching │
├──────────┼────┼─────────┼──────┼──────┼────────┼─────────┼────────┼─────┼──────────┼──────┼──────────┤
│ npm │ 0 │ N/A │ fork │ 6168 │ online │ 31 │ 0s │ 0% │ 5.6 MB │ root │ disabled │
└──────────┴────┴─────────┴──────┴──────┴────────┴─────────┴────────┴─────┴──────────┴──────┴──────────┘
Use `pm2 show <id|name>` to get more details about an app
```
### Project directory structure
`build` some webpack configurations for packaging and development environment projects
`node_modules` development environment node dependency package
`src` project required documents
`src => combo` project third-party resource localization `npm run combo` specific view `build/combo.js`
`src => font` font icon library; icons can be added from https://www.iconfont.cn Note: the font library is a customized copy, re-imported via `src/sass/common/_font.scss`
`src => images` public image storage
`src => js` js/vue
`src => lib` internal components of the company (company component library can be deleted after open source)
`src => sass` sass file One page corresponds to a sass file
`src => view` page file One page corresponds to an html file
```
> Projects are developed using vue single page application (SPA)
- All page entry files are in the `src/js/conf/${ corresponding page filename => home} index.js` entry file
- The corresponding sass file is in `src/sass/conf/${corresponding page filename => home}/index.scss`
- The corresponding html file is in `src/view/${corresponding page filename => home}/index.html`
```
Public modules and utils: `src/js/module`
`components` => internal project common components
`download` => download component
`echarts` => chart component
`filter` => filter and vue pipeline
`i18n` => internationalization
`io` => io request encapsulation based on axios
`mixin` => vue mixin public part for disabled operation
`permissions` => permission operation
`util` => tool
### System function module
Home => `http://localhost:8888/#/home`
Project Management => `http://localhost:8888/#/projects/list`
```
| Project Home
| Workflow
- Workflow definition
- Workflow instance
- Task instance
```
Resource Management => `http://localhost:8888/#/resource/file`
```
| File Management
| udf Management
- Resource Management
- Function management
```
Data Source Management => `http://localhost:8888/#/datasource/list`
Security Center => `http://localhost:8888/#/security/tenant`
```
| Tenant Management
| User Management
| Alarm Group Management
- master
- worker
```
User Center => `http://localhost:8888/#/user/account`
## Routing and state management
The project `src/js/conf/home` is divided into
`pages` => route to page directory
```
The page file corresponding to the routing address
```
`router` => route management
```
vue router; each page's entry file index.js registers its routes. Details: https://router.vuejs.org/zh/
```
`store` => status management
```
The page corresponding to each route has a state management file divided into:
actions => mapActions => details: https://vuex.vuejs.org/zh/guide/actions.html
getters => mapGetters => details: https://vuex.vuejs.org/zh/guide/getters.html
index => entry
mutations => mapMutations => details: https://vuex.vuejs.org/zh/guide/mutations.html
state => mapState => details: https://vuex.vuejs.org/zh/guide/state.html
More: https://vuex.vuejs.org/zh/
```
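The state/actions/mutations split can be sketched with a toy store (this illustrates the flow only; it is not the real Vuex API):

```javascript
// toy store: actions dispatch, mutations are the only writers of state
const store = {
  state: { tenants: [] },
  mutations: {
    ADD_TENANT (state, name) { state.tenants.push(name) }
  },
  actions: {
    // actions receive a context and commit a mutation
    addTenant ({ commit }, name) { commit('ADD_TENANT', name) }
  },
  commit (type, payload) { this.mutations[type](this.state, payload) },
  dispatch (type, payload) {
    this.actions[type]({ commit: (t, p) => this.commit(t, p) }, payload)
  }
}

store.dispatch('addTenant', 'tenant_a')
console.log(store.state.tenants) // → [ 'tenant_a' ]
```

Keeping all writes behind mutations is what makes the per-route store files predictable to debug.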
## Vue specification
##### 1.Component name
Component names consist of multiple words connected with a hyphen (-), to avoid conflicts with HTML tags and keep the structure clearer.
```
// positive example
export default {
name: 'page-article-item'
}
```
##### 2.Component files
Internal common components of the project live in `src/js/module/components`; the folder name matches the file name. Subcomponents and util tools split out of a common component are placed in the component's internal `_source` folder.
```
└── components
├── header
├── header.vue
└── _source
└── nav.vue
└── util.js
├── conditions
├── conditions.vue
└── _source
└── search.vue
└── util.js
```
##### 3.Prop
When you define a Prop, always name it in camelCase, and use kebab-case (-) when binding it from the parent component.
This follows the conventions of each language: HTML attributes are case-insensitive, so hyphenated names are friendlier there, while camelCase is more natural in JavaScript.
```
// Vue
props: {
articleStatus: Boolean
}
// HTML
<article-item :article-status="true"></article-item>
```
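The camelCase-to-kebab-case mapping can be expressed directly (the helper names here are illustrative, not part of Vue):

```javascript
// camelCase prop name -> kebab-case attribute, and back
const toKebab = s => s.replace(/[A-Z]/g, c => '-' + c.toLowerCase())
const toCamel = s => s.replace(/-([a-z])/g, (_, c) => c.toUpperCase())

console.log(toKebab('articleStatus'))  // → article-status
console.log(toCamel('article-status')) // → articleStatus
```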
The definition of Prop should specify its type, defaults, and validation as much as possible.
Example
```
props: {
attrM: Number,
attrA: {
type: String,
required: true
},
attrZ: {
type: Object,
// The default value of the array/object should be returned by a factory function
default: function () {
return {
msg: 'achieve you and me'
}
}
},
attrE: {
type: String,
validator: function (v) {
return !(['success', 'fail'].indexOf(v) === -1)
}
}
}
```
##### 4.v-for
When using v-for, always provide a key value so rendering is more efficient when the DOM updates.
```
<ul>
<li v-for="item in list" :key="item.id">
{{ item.title }}
</li>
</ul>
```
Avoid using v-for on the same element as v-if (for example on an `<li>`), because v-for has a higher priority than v-if. To avoid invalid computation and rendering, move the v-if onto the container's parent element.
```
<ul v-if="showList">
<li v-for="item in list" :key="item.id">
{{ item.title }}
</li>
</ul>
```
##### 5.v-if / v-else-if / v-else
If the elements in the same group of v-if branches are structurally identical, Vue reuses the same element for more efficient switching, which can carry over state such as an input's `value`. To avoid unintended reuse, add a distinct key to each element for identification.
```
<div v-if="hasData" key="mazey-data">
<span>{{ mazeyData }}</span>
</div>
<div v-else key="mazey-none">
<span>no data</span>
</div>
```
##### 6.Instruction abbreviation
For a unified specification, always use the directive shorthands. Writing `v-bind` and `v-on` in full is not wrong; this is only for consistency.
```
<input :value="mazeyUser" @click="verifyUser">
```
##### 7.Top-level element order of single file components
Styles are bundled into a single file, so a class defined in one vue file will also affect same-named classes in other files. Always add a unique top-level class name when creating a component.
Note: the sass plugin has been added to the project, and sass syntax can be written directly in a single vue file.
For uniformity and ease of reading, the blocks should appear in the order `<template>`, `<script>`, `<style>`.
```
<template>
<div class="test-model">
test
</div>
</template>
<script>
export default {
name: "test",
data() {
return {}
},
props: {},
methods: {},
watch: {},
beforeCreate() {
},
created() {
},
beforeMount() {
},
mounted() {
},
beforeUpdate() {
},
updated() {
},
beforeDestroy() {
},
destroyed() {
},
computed: {},
components: {},
}
</script>
<style lang="scss" rel="stylesheet/scss">
.test-model {
}
</style>
```
## JavaScript specification
##### 1.var / let / const
It is recommended to no longer use var; use let / const instead, preferring const. Every variable must be declared before use, except functions defined with function declarations, which are hoisted and can be placed anywhere.
##### 2.quotes
```
const foo = 'front-end'
const bar = `${foo} engineer`
```
##### 3.function
Use arrow functions for anonymous functions. With multiple parameters or return values, prefer object destructuring.
```
function getPersonInfo ({name, gender}) {
  // ...
  return {name, gender}
}
```
Function names are camelCase. A name starting with a capital letter is a constructor; a name starting with a lowercase letter is an ordinary function, and the new operator must not be used on ordinary functions.
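A small illustration of the naming rule (the names are hypothetical):

```javascript
// PascalCase: a constructor, invoked with new
function TaskNode (name) {
  this.name = name
}

// camelCase: an ordinary function, never invoked with new
const getTaskName = node => node.name

const node = new TaskNode('shell')
console.log(getTaskName(node)) // → shell
```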
##### 4.object
```
// deep copy
const foo = {a: 0, b: 1}
const bar = JSON.parse(JSON.stringify(foo))
// shallow copy with spread
const baz = {...foo, c: 2}
// merge
Object.assign(foo, {b: 4})
// Map traversal
const myMap = new Map([])
for (let [key, value] of myMap.entries()) {
  // ...
}
```
##### 5.module
Unified management of project modules using import / export.
```
// lib.js
export default {}
// app.js
import app from './lib'
```
Import is placed at the top of the file.
If the module has only one output value, use `export default`; otherwise, use named exports.
## HTML / CSS
##### 1.Label
Do not write the type attribute when referencing external CSS or JavaScript files. HTML5 defaults to text/css and text/javascript, so there is no need to specify them.
```
<link rel="stylesheet" href="//www.test.com/css/test.css">
<script src="//www.test.com/js/test.js"></script>
```
##### 2.Naming
Class and ID names should be semantic, so their purpose is clear from the name alone; connect multiple words with hyphens.
```
// positive example
.test-header{
font-size: 20px;
}
```
##### 3.Attribute abbreviation
CSS attributes use abbreviations as much as possible to improve the efficiency and ease of understanding of the code.
```
// counter example
border-width: 1px;
border-style: solid;
border-color: #ccc;
// positive example
border: 1px solid #ccc;
```
##### 4.Document type
The HTML5 standard should always be used.
```
<!DOCTYPE html>
```
##### 5.Notes
Each module file should begin with a block comment.
```
/**
* @module mazey/api
* @author Mazey <mazey@mazey.net>
* @description test.
* */
```
## interface
##### All interfaces return a Promise
Note: a non-zero `code` indicates an error and should be handled in the catch path.
```
const test = () => {
  return new Promise((resolve, reject) => {
    resolve({
      a: 1
    })
  })
}
// invocation
test().then(res => {
  console.log(res)
  // {a: 1}
})
```
Normal return
```
{
code:0,
data:{}
msg:'success'
}
```
Error return
```
{
code:10000,
data:{}
msg:'failed'
}
```
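A hypothetical helper showing how the `code` field drives success versus failure handling:

```javascript
// unwrap the response envelope: code 0 is success, anything non-zero is an error
const unwrap = response => {
  if (response.code === 0) return response.data
  throw new Error(response.msg)
}

console.log(unwrap({ code: 0, data: { a: 1 }, msg: 'success' })) // → { a: 1 }
try {
  unwrap({ code: 10000, data: {}, msg: 'failed' })
} catch (e) {
  console.log(e.message) // → failed
}
```

In a Promise chain the same rule applies: resolve with `data` on `code === 0`, reject with `msg` otherwise.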
If the interface is a POST request, the Content-Type defaults to application/x-www-form-urlencoded; if the Content-Type is changed to application/json,
the interface parameters need to be passed as follows:
```
io.post('url', payload, null, null, { emulateJSON: false }, res => {
resolve(res)
}).catch(e => {
reject(e)
})
```
##### Related interface path
dag related interface `src/js/conf/home/store/dag/actions.js`
Data Source Center Related Interfaces `src/js/conf/home/store/datasource/actions.js`
Project Management Related Interfaces `src/js/conf/home/store/projects/actions.js`
Resource Center Related Interfaces `src/js/conf/home/store/resource/actions.js`
Security Center Related Interfaces `src/js/conf/home/store/security/actions.js`
User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
## Extended development
##### 1.Add node
(1) First place the node's icon in the `src/js/conf/home/pages/dag/img` folder, named `toolbar_${node type English name as defined in the backend, e.g. SHELL}.png`
(2) Find the `tasksType` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
'DEPENDENT': { // The background definition node type English name is used as the key value
desc: 'DEPENDENT', // tooltip desc
color: '#2FBFD8' // The color represented is mainly used for tree and gantt
}
```
(3) Add a `${node type (lowercase)}.vue` file in `src/js/conf/home/pages/dag/_source/formModel/tasks`. The contents of the components related to the current node are written here. Every node component must have a `_verification()` function; after verification succeeds, it emits the current component's data to the parent component.
```
/**
* Verification
*/
_verification () {
// datasource subcomponent verification
if (!this.$refs.refDs._verifDatasource()) {
return false
}
// verification function
if (!this.method) {
this.$message.warning(`${i18n.$t('Please enter method')}`)
return false
}
// localParams subcomponent validation
if (!this.$refs.refLocalParams._verifProp()) {
return false
}
// store
this.$emit('on-params', {
type: this.type,
datasource: this.datasource,
method: this.method,
localParams: this.localParams
})
return true
}
```
(4) Common components used inside node components live under `_source`, and `commcon.js` is used to configure public data.
##### 2.Increase the status type
(1) Find the `tasksState` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
'WAITTING_DEPEND': { // backend-defined state type, used by the frontend as the key
id: 11, // front-end definition id is used as a sort
desc: `${i18n.$t('waiting for dependency')}`, // tooltip desc
color: '#5101be', // The color represented is mainly used for tree and gantt
icoUnicode: '&#xe68c;', // font icon
isSpin: false // whether to rotate (requires code judgment)
}
```
##### 3.Add the action bar tool
(1) Find the `toolOper` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
{
code: 'pointer', // tool identifier
icon: '&#xe781;', // tool icon
disable: disable, // disable
desc: `${i18n.$t('Drag node and selected item')}` // tooltip desc
}
```
(2) Tool classes are implemented as constructors under `src/js/conf/home/pages/dag/_source/plugIn`
`downChart.js` => dag image download processing
`dragZoom.js` => mouse zoom effect processing
`jsPlumbHandle.js` => drag and drop line processing
`util.js` => belongs to the `plugIn` tool class
The operation is handled in the `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` event.
##### 4.Add a routing page
(1) First add a routing address in route management: `src/js/conf/home/router/index.js`
```
{
path: '/test', // routing address
name: 'test', // alias
component: resolve => require(['../pages/test/index'], resolve), // route corresponding component entry file
meta: {
title: `${i18n.$t('test')} - EasyScheduler` // title display
}
},
```
(2) Create a `test` folder in `src/js/conf/home/pages` and create an `index.vue` entry file in the folder.
This gives you direct access to `http://localhost:8888/#/test`
##### 5.Increase the preset mailbox
The preset mailbox list in `src/lib/localData/email.js` is used by the startup and scheduled email address inputs to auto-complete matches in a pull-down.
```
export default ["test@analysys.com.cn","test1@analysys.com.cn","test3@analysys.com.cn"]
```
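The preset list feeds a simple prefix match in the email inputs; a sketch (the helper name is illustrative):

```javascript
// prefix match over the preset mailbox list, as used for pull-down suggestions
const presetEmails = ['test@analysys.com.cn', 'test1@analysys.com.cn', 'test3@analysys.com.cn']
const matchEmails = prefix => presetEmails.filter(e => e.startsWith(prefix))

console.log(matchEmails('test1')) // → [ 'test1@analysys.com.cn' ]
```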
##### 6.Authority management and disabled state processing
Permissions use the `userType` field (`"ADMIN_USER/GENERAL_USER"`) returned by the backend `getUserInfo` interface to control whether page operation buttons are `disabled`.
Specific operation: `src/js/module/permissions/index.js`
Disabled handling: `src/js/module/mixin/disabledState.js`
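The check reduces to a predicate on `userType`; a sketch of the idea (not the actual mixin):

```javascript
// sketch: admin users get the button enabled, general users see it disabled
const isDisabled = userType => userType !== 'ADMIN_USER'

console.log(isDisabled('ADMIN_USER'))   // → false
console.log(isDisabled('GENERAL_USER')) // → true
```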

# Have Questions?
## StackOverflow
For usage questions, it is recommended you use the StackOverflow tag [apache-dolphinscheduler](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler), as it is an active forum for DolphinScheduler users' questions and answers.
Some quick tips when using StackOverflow:
- Prior to submitting questions, please:
  - Search StackOverflow's [apache-dolphinscheduler](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler) tag to see if your question has already been answered
- Please follow the StackOverflow [code of conduct](https://stackoverflow.com/help/how-to-ask)
- Always use the apache-dolphinscheduler tag when asking questions
- Please do not cross-post between [StackOverflow](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler) and [GitHub issues](https://github.com/apache/dolphinscheduler/issues/new/choose)
Question template:
> **Describe the question**
>
> A clear and concise description of what the question is.
>
> **Which version of DolphinScheduler:**
>
> -[1.3.0-preview]
>
> **Additional context**
>
> Add any other context about the problem here.
>
> **Requirement or improvement**
>
> \- Please describe about your requirements or improvement suggestions.
For broad or opinion-based questions, requests for external resources, debugging help, bug reports, contributions to the project, and scenario discussions, it is recommended you use [GitHub issues](https://github.com/apache/dolphinscheduler/issues/new/choose) or the dev@dolphinscheduler.apache.org mailing list.
## Mailing Lists
- [dev@dolphinscheduler.apache.org](https://lists.apache.org/list.html?dev@dolphinscheduler.apache.org) is for people who want to contribute code to DolphinScheduler. [(subscribe)](mailto:dev-subscribe@dolphinscheduler.apache.org?subject=(send%20this%20email%20to%20subscribe)) [(unsubscribe)](mailto:dev-unsubscribe@dolphinscheduler.apache.org?subject=(send%20this%20email%20to%20unsubscribe)) [(archives)](http://lists.apache.org/list.html?dev@dolphinscheduler.apache.org)
Some quick tips when using email:
- Prior to submitting questions, please:
  - Search StackOverflow's [apache-dolphinscheduler](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler) tag to see if your question has already been answered
- Tagging the subject line of your email will help you get a faster response, e.g. [api-server]: How to get open api interface?
- Tags may help identify a topic by:
- Component: MasterServer,ApiServer,WorkerServer,AlertServer, etc
- Level: Beginner, Intermediate, Advanced
- Scenario: Debug, How-to
- For error logs or long code examples, please use [GitHub gist](https://gist.github.com/) and include only a few lines of the pertinent code / log within the email.
## Chat Rooms
Chat rooms are great for quick questions or discussions on specialized topics.
The following chat rooms are officially part of Apache DolphinScheduler:
The Slack workspace URL: http://asf-dolphinscheduler.slack.com/.
You can join through invitation url: https://s.apache.org/dolphinscheduler-slack.
This chat room is used for questions and discussions related to using DolphinScheduler.


A: 1, in **the process definition list**, click the **Start** button.
## Q : Python task setting Python version
A: 1, **for versions after 1.0.3**, only modify `PYTHON_HOME` in `bin/env/dolphinscheduler_env.sh`
```
export PYTHON_HOME=/bin/python
A: 1, edit nginx config file /etc/nginx/conf.d/escheduler.conf
---
## Q : Welcome to subscribe the DolphinScheduler development mailing list
A: In the process of using DolphinScheduler, if you have any questions or ideas, suggestions, you can participate in the DolphinScheduler community building through the Apache mailing list. Sending a subscription email is also very simple, the steps are as follows:
1, Send an email to dev-subscribe@dolphinscheduler.apache.org with your own email address, subject and content.
2, Receive confirmation email and reply. After completing step 1, you will receive a confirmation email from dev-help@dolphinscheduler.apache.org (if not received, please confirm whether the email is automatically classified as spam, promotion email, subscription email, etc.) . Then reply directly to the email, or click on the link in the email to reply quickly, the subject and content are arbitrary.
3, Receive a welcome email. After completing the above steps, you will receive a welcome email with the subject WELCOME to dev@dolphinscheduler.apache.org, and you have successfully subscribed to the Apache DolphinScheduler mailing list.
---
## Q : Workflow Dependency
A: 1, It is currently judged according to natural days, at the end of last month: the judgment time is the workflow A start_time/scheduler_time between '2019-05-31 00:00:00' and '2019-05-31 23:59:59'. Last month: It is judged that there is an A instance completed every day from the 1st to the end of the month. Last week: There are completed A instances 7 days last week. The first two days: Judging yesterday and the day before yesterday, there must be a completed A instance for two days.
A: The repair can be completed by executing the following SQL in the database:
update t_ds_version set version='2.0.1';
```
## Cannot find python-gateway-server in the distribution package
Since version 3.0.0-alpha, the Python gateway server is integrated into the API server, and the Python gateway service starts together with the API server. If you want to disable the Python gateway service, set the attribute `python-gateway.enabled: false` in the API server configuration file `api-server/conf/application.yaml`.
---
## We will collect more FAQ later


## How to Create Alert Plugins and Alert Groups
In version 2.0.0, users need to create alert instances and then associate them with alert groups; an alert group can use multiple alert instances and notify them one by one. When defining an alert instance, an alert policy must be chosen: send if the task succeeds, send on failure, or send on both success and failure. When a workflow or task is executed and an alert is triggered, the alert instance's send method is called only if its policy matches the task status; otherwise the alert is filtered out.
The alarm module supports the following scenarios:
<img src="/img/alert/alert_scenarios_en.png">
The steps to use are as follows:
First, go to the Security Center page. Select Alarm Group Management, click Alarm Instance Management on the left and create an alarm instance. Select the corresponding alarm plug-in and fill in the relevant alarm parameters.
Then select Alarm Group Management, create an alarm group, and choose the corresponding alarm instance.
<img src="/img/alert/alert_step_1.png">
<img src="/img/alert/alert_step_2.png">
<img src="/img/alert/alert_step_3.png">
<img src="/img/alert/alert_step_4.png">


@ -1,14 +1,71 @@
# Enterprise WeChat
If you need to use `Enterprise WeChat` to alert, create an alert instance in the alert instance management, and choose the `WeChat` plugin.
The following is the `WeChat` configuration example:
![enterprise-wechat-plugin](/img/alert/enterprise-wechat-plugin.png)
## Send Type
The parameter `send.type` selects whether messages are sent to an Enterprise WeChat customized APP or to a group chat created via API.
### APP
The `APP` send type notifies the alert results via Enterprise WeChat customized APPs, and supports sending messages to both specified users and all members. Sending to a specified enterprise department or tag is not currently supported; a new PR to contribute this is welcome.
The following is the `APP` alert config example:
![enterprise-wechat-app-msg-config](/img/alert/wechat-app-form-example.png)
The following is the `APP` `MARKDOWN` alert message example:
![enterprise-wechat-app-msg-markdown](/img/alert/enterprise-wechat-app-msg-md.png)
The following is the `APP` `TEXT` alert message example:
![enterprise-wechat-app-msg-text](/img/alert/enterprise-wechat-app-msg.png)
#### Prerequisites
Before sending messages to an APP, you need to create a customized APP in Enterprise WeChat: create it at the [APP Page](https://work.weixin.qq.com/wework_admin/frame#apps), acquire the APP `AgentId`, and set its visible scope to the root of the hierarchy.
#### Send Messages to Specified Users
The Enterprise WeChat APPs support sending messages to both specified users and all members: use `|` to separate multiple `userIds`, or use `@all` to send messages to everyone.
To acquire a user's `userId`, refer to the [Official Doc](https://developer.work.weixin.qq.com/document/path/95402); you can query the `userId` by the user's phone number.
The following is the `query userId` API example:
![enterprise-wechat-create-group](/img/alert/enterprise-wechat-query-userid.png)
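As a small illustration of the `userIds` rule above, a helper like the following (hypothetical, not part of the plugin) would build the target-user field:

```python
def build_touser(user_ids):
    """Join userIds with '|' as described above; '@all' sends to everyone."""
    if user_ids == ["@all"]:
        return "@all"
    return "|".join(user_ids)

# Hypothetical userIds
print(build_touser(["zhangsan", "lisi"]))  # zhangsan|lisi
print(build_touser(["@all"]))              # @all
```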
#### References
APP: https://work.weixin.qq.com/api/doc/90000/90135/90236
Group Chat: https://work.weixin.qq.com/api/doc/90000/90135/90248
The parameter `user.send.msg` corresponds to the `content` in the document, and the corresponding variable is `{msg}`.
### Group Chat
The `Group Chat` send type notifies the alert results via a group chat created by the Enterprise WeChat API, sending messages to all members of the group; sending to specified users is not supported.
The following is the `Group Chat` alert config example:
![enterprise-wechat-app-msg-config](/img/alert/wechat-group-form-example.png)
The following is the `Group Chat` `MARKDOWN` alert message example:
![enterprise-wechat-group-msg-markdown](/img/alert/enterprise-wechat-group-msg-md.png)
The following is the `Group Chat` `TEXT` alert message example:
![enterprise-wechat-group-msg-text](/img/alert/enterprise-wechat-group-msg.png)
#### Prerequisites
Before sending messages to a group chat, create a new group chat via the Enterprise WeChat API; refer to the [Official Doc](https://developer.work.weixin.qq.com/document/path/90245) to create one and acquire the `chatid`.
To acquire a user's `userId`, refer to the [Official Doc](https://developer.work.weixin.qq.com/document/path/95402); you can query the `userId` by the user's phone number.
The following is the `create new group chat` API and `query userId` API example:
![enterprise-wechat-create-group](/img/alert/enterprise-wechat-create-group.png)
![enterprise-wechat-create-group](/img/alert/enterprise-wechat-query-userid.png)
#### References
Group Chat: https://work.weixin.qq.com/api/doc/90000/90135/90248

# Overview
## Introduction
The data quality task is used to check data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy checking, and two-table value comparison. Data quality tasks run on Spark 2.4.0; other versions have not been verified, and users can verify them by themselves.
The execution flow of the data quality task is as follows:

1. The user defines the task in the interface, and the user input is stored in `TaskParam`.
2. When running the task, `Master` parses `TaskParam`, encapsulates the parameters required by `DataQualityTask`, and sends it to `Worker`.
3. `Worker` runs the data quality task. After the task finishes, it writes the statistical results to the specified storage engine; currently the results are stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`.
4. `Worker` sends the task result to `Master`. After `Master` receives the `TaskResponse`, it checks whether the task type is `DataQualityTask`; if so, it reads the corresponding result from `t_ds_dq_execute_result` by `taskInstanceId` and evaluates it against the check mode, operator, and threshold configured by the user. If the result is a failure, the corresponding operation, alert or blocking, is performed according to the failure strategy configured by the user.
Add the following configuration to `<server-name>/conf/common.properties`:
```properties
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
```
Notes:

- Fill in `data-quality.jar.name` according to the actual package name. If you package `data-quality` separately, remember to keep the package name consistent with `data-quality.jar.name`.
- If you upgraded from an old version, execute the `sql` update scripts to initialize the database before running.
- If you want to use a `MySQL` data source, you need to comment out the `scope` of the `MySQL` dependency in `pom.xml`.
- Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested; other data sources have not been tested yet.
- `Spark` needs to be configured to read `Hive` metadata; `Spark` does not use `jdbc` to read `Hive`.
## Detail
- CheckMethod: [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed.
- CheckFormula
- Expected-Actual
- Actual-Expected
- (Actual/Expected)x100%
- (Expected-Actual)/Expected x100%
- Operator: =, >, >=, <, <=, !=
- ExpectedValue
- FixValue
- DailyAvg
- WeeklyAvg
- MonthlyAvg
- Last7DayAvg
- Last30DayAvg
- SrcTableTotalRows
- TargetTableTotalRows
- Example
- CheckFormula: Expected-Actual
- Operator: >
- Threshold: 0
- ExpectedValue: FixValue=9
Assume the actual value is 10, the operator is >, and the expected value is 9. The result of 10 - 9 > 0 is true, which means the number of rows where the checked column is empty has exceeded the threshold, and the task is judged to fail.
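The check logic described above can be sketched as follows; this is an illustrative model of the rule, not the actual DolphinScheduler implementation:

```python
import operator

# Map the supported check operators to Python comparisons.
OPERATORS = {
    "=": operator.eq, ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "!=": operator.ne,
}

def check_fails(actual, expected, formula, op, threshold):
    """Return True when the check indicates the data does not meet expectations."""
    if formula == "Expected-Actual":
        result = expected - actual
    elif formula == "Actual-Expected":
        result = actual - expected
    elif formula == "(Actual/Expected)x100%":
        result = actual / expected * 100
    elif formula == "(Expected-Actual)/Expected x100%":
        result = (expected - actual) / expected * 100
    else:
        raise ValueError(f"unknown formula: {formula}")
    # If [CheckFormula][Operator][Threshold] evaluates to true, the check fails.
    return OPERATORS[op](result, threshold)

# Mirrors the arithmetic in the example above: 10 - 9 > 0 is true, so the check fails.
print(check_fails(10, 9, "Actual-Expected", ">", 0))  # True
```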
# Guide
## Null Check
### Introduction
The goal of the null value check is to check the number of empty rows in the specified column. The number of empty rows can be compared with the total number of rows or a specified threshold. If it is greater than a certain threshold, it will be judged as failure.
- The SQL statement to count the rows where the specified column is null or empty is as follows:
```sql
SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
```
- The SQL to calculate the total number of rows in the table is as follows:
```sql
SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
```
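For illustration, the two templates above can be instantiated with concrete values; the table, column, and filter below are made-up examples:

```python
from string import Template

# Hypothetical values; substitute your own table, column, and filter.
params = {
    "src_table": "users",
    "src_field": "email",
    "src_filter": "create_time >= '2022-01-01'",
}

miss_sql = Template(
    "SELECT COUNT(*) AS miss FROM ${src_table} "
    "WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})"
).substitute(params)
total_sql = Template(
    "SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})"
).substitute(params)

print(miss_sql)
print(total_sql)
```

The null check then compares `miss` against the total row count or a fixed threshold with the configured operator.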
### UI Guide
![dataquality_null_check](/img/tasks/demo/null_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select the check column name
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Timeliness Check
### Introduction
The timeliness check is used to check whether the data is processed within the expected time. The start time and end time can be specified to define the time range. If the amount of data within the time range does not reach the set threshold, the check task will be judged as fail
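A minimal sketch of the timeliness rule, assuming in-memory timestamps rather than the actual Spark job:

```python
from datetime import datetime

def timeliness_fails(timestamps, start, end, threshold):
    """Count rows whose timestamp falls within [start, end]; fail if below the threshold."""
    in_range = sum(1 for ts in timestamps if start <= ts <= end)
    return in_range < threshold

# Hypothetical data: three rows, only two of which fall in the checked window.
rows = [datetime(2022, 4, 25, 1), datetime(2022, 4, 25, 2), datetime(2022, 4, 25, 9)]
start, end = datetime(2022, 4, 25, 0), datetime(2022, 4, 25, 8)
print(timeliness_fails(rows, start, end, threshold=3))  # True: only 2 rows arrived in time
```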
### UI Guide
![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select check column name
- start time: the start time of a time range
- end time: the end time of a time range
- Time Format: Set the corresponding time format
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Field Length Check
### Introduction
The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail
### UI Guide
![dataquality_length_check](/img/tasks/demo/field_length_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select the check column name
- Logical operators: =, >, >=, <, <=, !=
- Field length limit: as the name suggests, the length limit to check against
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Uniqueness Check
### Introduction
The goal of the uniqueness check is to check whether the field is duplicated. It is generally used to check whether the primary key is duplicated. If there is duplication and the threshold is reached, the check task will be judged to be failed.
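As an illustration of what the uniqueness rule counts (a sketch, not the actual Spark SQL):

```python
from collections import Counter

def duplicated_row_count(values):
    """Number of rows whose key value occurs more than once."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1)

# Hypothetical primary-key column: "b" occurs three times, so 3 rows are duplicated.
print(duplicated_row_count(["a", "b", "b", "c", "b"]))  # 3
```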
### UI Guide
![dataquality_uniqueness_check](/img/tasks/demo/uniqueness_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select the check column name
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Regular Expression Check
### Introduction
The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
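As a sketch of the rule (made-up data and pattern; the real check runs as Spark SQL):

```python
import re

def regexp_violations(values, pattern):
    """Count values that do not fully match the expected format."""
    regex = re.compile(pattern)
    return sum(1 for v in values if regex.fullmatch(v) is None)

# Hypothetical yyyy-MM-dd date-format check: two of the three values violate it.
values = ["2022-04-25", "2022/04/25", "not-a-date"]
print(regexp_violations(values, r"\d{4}-\d{2}-\d{2}"))  # 2
```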
### UI Guide
![dataquality_regex_check](/img/tasks/demo/regexp_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select check column name
- Regular expression: as the name suggests, the pattern the field value is expected to match
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Enumeration Check
### Introduction
The goal of enumeration value verification is to check whether the value of a field is within the range of enumeration values. If there is data that is not in the range of enumeration values and exceeds the threshold, the task will be judged to fail
### UI Guide
![dataquality_enum_check](/img/tasks/demo/enumeration_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src table filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select the check column name
- List of enumeration values: separated by commas
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Table Count Check
### Introduction
The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
### UI Guide
![dataquality_count_check](/img/tasks/demo/table_count_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the validation data is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Src table check column: drop-down to select the check column name
- Check method:
- [Expected-Actual]
- [Actual-Expected]
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Custom SQL Check
### Introduction
The custom SQL check allows users to write their own statistical SQL to calculate the actual value, which is then compared with the expected value to determine the check result.
### UI Guide
![dataquality_custom_sql_check](/img/tasks/demo/custom_sql_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the data to be verified is located
- Actual value name: alias in SQL for statistical value calculation, such as max_num
- Actual value calculation SQL: SQL for outputting the actual value
- Note: The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
- `select max(a) as max_num from ${src_table}`; the table name must be filled in using the placeholder as shown
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Check method:
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu
## Multi-Table Accuracy Check
### Introduction
The multi-table accuracy check compares the records of selected fields between two tables and measures the difference. Examples are as follows:
- table test1
| c1 | c2 |
| :---: | :---: |
| a | 1 |
| b | 2 |
- table test2
| c21 | c22 |
| :---: | :---: |
| a | 1 |
| b | 3 |
If you compare the data in c1 and c21, the tables test1 and test2 are exactly the same. If you compare c2 and c22, the data in table test1 and table test2 are inconsistent.
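The example above can be sketched in code; this is an illustrative model of the comparison, not the actual Spark implementation:

```python
def mismatched_rows(src_rows, target_rows):
    """Rows of the source table that have no equal counterpart in the target table."""
    remaining = list(target_rows)
    missed = 0
    for row in src_rows:
        if row in remaining:
            remaining.remove(row)  # each target row matches at most once
        else:
            missed += 1
    return missed

test1 = [("a", 1), ("b", 2)]
test2 = [("a", 1), ("b", 3)]
print(mismatched_rows([r[:1] for r in test1], [r[:1] for r in test2]))  # 0: c1 vs c21 match
print(mismatched_rows(test1, test2))  # 1: (c1, c2) vs (c21, c22) differ on one row
```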
### UI Guide
![dataquality_multi_table_accuracy_check](/img/tasks/demo/multi_table_accuracy_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: drop-down to select the table where the data to be verified is located
- Src filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Target data type: choose MySQL, PostgreSQL, etc.
- Target data source: the corresponding data source under the target data type
- Target data table: drop-down to select the table where the data to be verified is located
- Target filter conditions: optional; as the name suggests, it is also used when counting the total number of rows in the table
- Check column:
- Fill in the source data column, operator and target data column respectively
- Verification method: select the desired verification method
- Operators: =, >, >=, <, <=, !=
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
- Expected value type: select the desired type from the drop-down menu; only SrcTableTotalRows, TargetTableTotalRows and FixValue are suitable here
## Two-Table Value Comparison
### Introduction
Two-table value comparison allows users to write different statistical SQL for two tables and compare the resulting values. For example, calculate the total amount sum1 of a column in source table A and the total amount sum2 of a column in the target table, then compare sum1 with sum2 to determine the check result.
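A minimal sketch of the comparison, with made-up column values in place of the two custom SQL results:

```python
# Hypothetical column values that the two custom statistical SQLs would aggregate.
source_a = [100, 250, 700]   # a column in source table A
target_b = [100, 250, 650]   # a column in the target table

sum1, sum2 = sum(source_a), sum(target_b)
# With the "!=" operator, a true comparison means the check fails.
check_failed = sum1 != sum2
print(sum1, sum2, check_failed)  # 1050 1000 True
```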
### UI Guide
![dataquality_multi_table_comparison_check](/img/tasks/demo/multi_table_comparison_check.png)
- Source data type: select MySQL, PostgreSQL, etc.
- Source data source: the corresponding data source under the source data type
- Source data table: the table where the data is to be verified
- Actual value name: Calculate the alias in SQL for the actual value, such as max_age1
- Actual value calculation SQL: SQL for outputting the actual value
- Note: The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
- `select max(age) as max_age1 from ${src_table}`; the table name must be filled in using the placeholder as shown
- Target data type: choose MySQL, PostgreSQL, etc.
- Target data source: the corresponding data source under the target data type
- Target data table: the table where the data is to be verified
- Expected value name: Calculate the alias in SQL for the expected value, such as max_age2
- Expected value calculation SQL: SQL for outputting the expected value
- Note: The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
- `select max(age) as max_age2 from ${target_table}`; the table name must be filled in using the placeholder as shown
- Verification method: select the desired verification method
- Operators: =, >, >=, <, <=, !=
- Failure strategy
- Alert: the data quality check fails, the DolphinScheduler task result is set to success, and an alert is sent
- Blocking: the data quality check fails, the DolphinScheduler task result is set to failed, and an alert is sent
## Task Result View
![dataquality_result](/img/tasks/demo/result.png)
## Rule View
### List of rules
![dataquality_rule_list](/img/tasks/demo/rule_list.png)
### Rules Details
![dataquality_rule_detail](/img/tasks/demo/rule_detail.png)

This article describes how to add a new master service or worker service to an existing cluster.
mkdir -p /opt
cd /opt
# decompress
tar -zxvf apache-dolphinscheduler-<version>-bin.tar.gz -C /opt
cd /opt
mv apache-dolphinscheduler-<version>-bin dolphinscheduler
```
```markdown
datasource.properties: database connection information
zookeeper.properties: information for connecting zk
common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
dolphinscheduler_env.sh: environment variables
```
- Modify the environment variables in the `bin/env/dolphinscheduler_env.sh` file according to the machine configuration (the following example assumes all the required software is installed under `/opt/soft`)
```shell
export HADOOP_HOME=/opt/soft/hadoop

# General Setting
## Language
DolphinScheduler supports two built-in languages: `English` and `Chinese`. To switch, click the button on the top control bar labeled `English` or `Chinese`, and the entire DolphinScheduler page will shift to the selected language.
## Theme
DolphinScheduler supports two built-in themes: `Dark` and `Light`. To change the theme, click the button labeled `Dark` (or `Light`) on the top control bar, to the left of the [language](#language) button.
## Time Zone
DolphinScheduler supports time zone settings.

### Server Time Zone

The default time zone is UTC when using `bin/dolphinscheduler_daemon.sh` to start the server. You can update `SPRING_JACKSON_TIME_ZONE` in `bin/env/dolphinscheduler_env.sh`, such as `export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-Asia/Shanghai}`.<br>
If you start the server in IDEA, the default time zone is your local time zone; you can add a JVM parameter, such as `-Duser.timezone=UTC`, to change it. For the list of time zones, refer to [List of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)
### User Time Zone

The user's default time zone is the time zone of the machine running the DolphinScheduler service. You can click the button to the right of the [language](#language) button and then click `Choose timeZone` to select the time zone you want. All time-related components will adjust according to the selected time zone.

## Deployment Steps
Cluster deployment uses the same scripts and configuration files as [pseudo-cluster deployment](pseudo-cluster.md), so the preparation and deployment steps are the same as pseudo-cluster deployment. The difference is that pseudo-cluster deployment is for one machine, while cluster deployment (Cluster) is for multiple machines. And steps of "Modify Configuration" are quite different between pseudo-cluster deployment and cluster deployment.
### Prerequisites and DolphinScheduler Startup Environment Preparations
## Start and Login DolphinScheduler
Same as [pseudo-cluster](pseudo-cluster.md)
## Start and Stop Server
Same as [pseudo-cluster](pseudo-cluster.md)

## Install DolphinScheduler
Please download the source code package `apache-dolphinscheduler-<version>-src.tar.gz`, download address: [download address](/en-us/download/download.html)
To deploy a release named `dolphinscheduler`, please execute the following commands:
```
$ tar -zxvf apache-dolphinscheduler-<version>-src.tar.gz
$ cd apache-dolphinscheduler-<version>-src/docker/kubernetes/dolphinscheduler
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm dependency update .
$ helm install dolphinscheduler . --set image.tag=<version>
```
To deploy the release named `dolphinscheduler` to the `test` namespace:
2. Create a new `Dockerfile` to add MySQL driver:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
2. Create a new `Dockerfile` to add MySQL driver:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
2. Create a new `Dockerfile` to add Oracle driver:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```
1. Create a new `Dockerfile` to install pip:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY requirements.txt /tmp
RUN apt-get update && \
apt-get install -y --no-install-recommends python-pip && \
1. Create a new `Dockerfile` to install Python 3:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 && \
rm -rf /var/lib/apt/lists/*

## Modify Configuration
After completing the preparation of the basic environment, you need to modify the configuration files according to your environment. The configuration files are both in the `bin/env` directory and are named `install_env.sh` and `dolphinscheduler_env.sh`.
### Modify `install_env.sh`
The file `install_env.sh` describes which machines DolphinScheduler will be installed on and which servers will be installed on each machine. You can find this file at `bin/env/install_env.sh`; the details of the configuration are as below.
```shell
# ---------------------------------------------------------
installPath="~/dolphinscheduler"
# Deploy user, use the user you create in section **Configure machine SSH password-free login**
deployUser="dolphinscheduler"
```
### Modify `dolphinscheduler_env.sh`
The file `dolphinscheduler_env.sh`, located at `bin/env/dolphinscheduler_env.sh`, describes the database configuration of DolphinScheduler and the external dependencies or libraries some tasks need, such as `JAVA_HOME` and `SPARK_HOME`. You can ignore the task external dependencies if you do not use those tasks, but you have to change `JAVA_HOME` and the registry center and database related configurations based on your environment.
```sh
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/custom/path}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.postgresql.Driver
export SPRING_DATASOURCE_URL="jdbc:postgresql://127.0.0.1:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME="username"
export SPRING_DATASOURCE_PASSWORD="password"
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
```
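Every `export VAR=${VAR:-default}` line above uses shell default-value expansion: a variable already set in the environment wins over the default written in the file. A minimal sketch of this behavior (the variable name `DEMO_DATABASE` is illustrative only, not a DolphinScheduler variable):

```shell
#!/bin/sh
# ${VAR:-default} keeps VAR when it is already set and non-empty,
# otherwise it falls back to the default after ":-".
unset DEMO_DATABASE
export DEMO_DATABASE=${DEMO_DATABASE:-postgresql}
echo "$DEMO_DATABASE"   # nothing was pre-set, so the default applies: postgresql

export DEMO_DATABASE=mysql
export DEMO_DATABASE=${DEMO_DATABASE:-postgresql}
echo "$DEMO_DATABASE"   # the pre-set value survives: mysql
```

This is why exporting a variable such as `DATABASE=mysql` before starting a server overrides the file's default without editing the file.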
## Initialize the Database
DolphinScheduler metadata is stored in a relational database; currently PostgreSQL and MySQL are supported. If you use MySQL, you need to manually download the [mysql-connector-java driver][mysql] (8.0.16) and move it to the lib directory of DolphinScheduler, which is `tools/libs/`. Let's take MySQL as an example of how to initialize the database:
For MySQL 5.6 / 5.7:
```shell
mysql -uroot -p
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
# Replace {user} and {password} with your username and password
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> flush privileges;
```
For MySQL 8:
```shell
mysql -uroot -p
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
# Replace {user} and {password} with your username and password
mysql> CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;
```
Change the username and password in `tools/conf/application.yaml` to the {user} and {password} you set in the previous step.
Then modify `tools/bin/dolphinscheduler_env.sh` to set MySQL as the default database: `export DATABASE=${DATABASE:-mysql}`.
After completing the steps above, you have created a new database for DolphinScheduler. Now run the shell script to initialize the database schema:
```shell
sh tools/bin/create-schema.sh
```
## Start DolphinScheduler
Use the **deployment user** you created above and run the following command to complete the deployment; the server logs will be stored in the `logs` folder.
```shell
sh ./bin/install.sh
```
> **_Note:_** For the first deployment, the message `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` may appear up to five times in the terminal,
sh ./bin/dolphinscheduler-daemon.sh stop alert-server
```
> **_Note1:_** Each server has a `dolphinscheduler_env.sh` file at `<server-name>/conf/dolphinscheduler_env.sh` which
> serves that micro-service. This means you could start each server by the command `<server-name>/bin/start.sh` with a
> different environment from `bin/env/dolphinscheduler_env.sh`. However, `bin/env/dolphinscheduler_env.sh` will overwrite
> `<server-name>/conf/dolphinscheduler_env.sh` if you start a server with the command `bin/dolphinscheduler-daemon.sh start <server-name>`.
> **_Note2:_** Please refer to the "System Architecture Design" section for service usage. The Python gateway service is
> started along with the api-server; if you do not want to start the Python gateway service, disable it by setting
> `python-gateway.enabled: false` in the api-server's configuration file `api-server/conf/application.yaml`.
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
[zookeeper]: https://zookeeper.apache.org/releases.html
SkyWalking Agent Deployment
=============================
The `dolphinscheduler-skywalking` module provides [SkyWalking](https://skywalking.apache.org/) monitor agent for the DolphinScheduler project.
This document describes how to enable SkyWalking 8.4+ support with this module (SkyWalking 8.5.0 is recommended).
## Installation
The following configuration is used to enable the SkyWalking agent.
### Through Environment Variable Configuration (for Docker Compose)
Modify SkyWalking environment variables in `docker/docker-swarm/config.env.sh`:
```
SKYWALKING_ENABLE=true
SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
SW_GRPC_LOG_SERVER_HOST=127.0.0.1
SW_GRPC_LOG_SERVER_PORT=11800
```
And run:
```shell
$ docker-compose up -d
```
### Through Environment Variable Configuration (for Docker)
```shell
$ docker run -d --name dolphinscheduler \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e SKYWALKING_ENABLE="true" \
-e SW_AGENT_COLLECTOR_BACKEND_SERVICES="your.skywalking-oap-server.com:11800" \
-e SW_GRPC_LOG_SERVER_HOST="your.skywalking-log-reporter.com" \
-e SW_GRPC_LOG_SERVER_PORT="11800" \
-p 12345:12345 \
apache/dolphinscheduler:1.3.8 all
```
### Through install_config.conf Configuration (for DolphinScheduler install.sh)
Add the following configurations to `${workDir}/conf/config/install_config.conf`.
```properties
# SkyWalking config
# note: enable SkyWalking tracking plugin
enableSkywalking="true"
# note: configure SkyWalking backend service address
skywalkingServers="your.skywalking-oap-server.com:11800"
# note: configure SkyWalking log reporter host
skywalkingLogReporterHost="your.skywalking-log-reporter.com"
# note: configure SkyWalking log reporter port
skywalkingLogReporterPort="11800"
```
## Usage
### Import Dashboard
#### Import DolphinScheduler Dashboard to SkyWalking Server
Copy the `${dolphinscheduler.home}/ext/skywalking-agent/dashboard/dolphinscheduler.yml` file into `${skywalking-oap-server.home}/config/ui-initialized-templates/` directory, and restart SkyWalking oap-server.
#### View DolphinScheduler Dashboard
If you have opened the SkyWalking dashboard with a browser before, you need to clear the browser cache.
![img1](/img/skywalking/import-dashboard-1.jpg)
sh ./bin/dolphinscheduler-daemon.sh stop standalone-server
```
> Note: The Python gateway service is started along with the api-server; if you do not want to start the Python gateway
> service, disable it by setting `python-gateway.enabled: false` in the api-server's configuration
> file `api-server/conf/application.yaml`.
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
- Service management is mainly to monitor and display the health status and basic information of each service in the system.
### Master Server
- Mainly related to master information.
![master](/img/new_ui/dev/monitor/master.png)
### Worker Server
- Mainly related to worker information.
![worker](/img/new_ui/dev/monitor/worker.png)
### Database
- Mainly the health status of the DB.
## Statistics Management
### Statistics
![statistics](/img/new_ui/dev/monitor/statistics.png)
- Number of commands waiting to be executed: count of rows in the `t_ds_command` table.
- Number of failed commands: count of rows in the `t_ds_error_command` table.
- Number of tasks waiting to run: count of `task_queue` entries in ZooKeeper.
- Number of tasks waiting to be killed: count of `task_kill` entries in ZooKeeper.
### Audit Log
The audit log records who accessed the system, which operations they performed, and when, which strengthens the security
of the system and eases maintenance.
![audit-log](/img/new_ui/dev/monitor/audit-log.jpg)
# Task Definition
Task definition allows you to modify or operate tasks at the task level rather than editing them inside a workflow definition.
We already have a workflow-level task editor in [workflow definition](workflow-definition.md): you can click a specific
workflow and then edit its task definitions. It is frustrating when you want to edit a task definition but do not remember
which workflow it belongs to, so we decided to add a `Task Definition` view under the `Task` menu.
![task-definition](/img/new_ui/dev/project/task-definition.jpg)
In this view, you can create, query, update, and delete task definitions by clicking the related button in the `operation` column.
Most usefully, you can query tasks by task name with wildcards, which helps when you only remember the task name but forget
which workflow it belongs to. Querying by task name combined with `Task Type` or `Workflow Name` is also supported.
> For Windows Docker Desktop user, open **Windows PowerShell**
```
$ tar -zxvf apache-dolphinscheduler-<version>-src.tar.gz
$ cd apache-dolphinscheduler-<version>-src/deploy/docker
$ docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
$ docker tag apache/dolphinscheduler:<version> apache/dolphinscheduler:latest
$ docker-compose up -d
```
> PowerShell should use `cd apache-dolphinscheduler-<version>-src\deploy\docker`
**PostgreSQL** (user `root`, password `root`, database `dolphinscheduler`) and **ZooKeeper** services will be started by default
We have uploaded the DolphinScheduler images for users to the docker repository. Instead of building the image locally, users can pull the image from the docker repository by running the following command.
```
docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
```
#### 5. Run a DolphinScheduler instance
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-p 12345:12345 \
apache/dolphinscheduler:<version> all
```
Note: The database user `test` and password `test` need to be replaced with your actual PostgreSQL user and password, and `192.168.x.x` needs to be replaced with the host IP of PostgreSQL and ZooKeeper.
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
apache/dolphinscheduler:<version> master-server
```
* Start a **worker server**, as follows:
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
apache/dolphinscheduler:<version> worker-server
```
* Start an **api server**, as follows:
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-p 12345:12345 \
apache/dolphinscheduler:<version> api-server
```
* Start an **alert server**, as follows:
$ docker run -d --name dolphinscheduler-alert \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
apache/dolphinscheduler:<version> alert-server
```
**NOTE**: When you run some of the services of DolphinScheduler, you must specify these environment variables: `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_PASSWORD`, `ZOOKEEPER_QUORUM`.
```
docker ps
docker ps --format "{{.Names}}" # Show container name only
```
View the logs of the container named docker-swarm_dolphinscheduler-api_1:
```
docker logs docker-swarm_dolphinscheduler-api_1
docker logs -f docker-swarm_dolphinscheduler-api_1 # Follow the latest logs
docker logs --tail 10 docker-swarm_dolphinscheduler-api_1 # Show the last ten lines of logs
```
### How to scale master and worker with docker-compose?
#### Build from binary packages (Maven 3.3+ & JDK 1.8+ not required)
Please download the binary package apache-dolphinscheduler-<version>-bin.tar.gz from: [download](/zh-cn/download/download.html). Then put apache-dolphinscheduler-<version>-bin.tar.gz into the `apache-dolphinscheduler-<version>-src/docker/build` directory and execute it in Terminal or PowerShell:
```
$ cd apache-dolphinscheduler-<version>-src/docker/build
$ docker build --build-arg VERSION=<version> -t apache/dolphinscheduler:<version> .
```
> PowerShell should use `cd apache-dolphinscheduler-<version>-src/docker/build`
#### Building images for multi-platform architectures
2. Create a new `Dockerfile` to add the MySQL driver package:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
2. Create a new `Dockerfile` to add the MySQL driver package:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
2. Create a new `Dockerfile` to add the Oracle driver package:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```
1. Create a new `Dockerfile` for installing pip:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY requirements.txt /tmp
RUN apt-get update && \
apt-get install -y --no-install-recommends python-pip && \
1. Create a new `Dockerfile` for installing Python 3:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 && \
rm -rf /var/lib/apt/lists/*
Configure the mail sender for `alert-server`, default value `empty`.
**`MAIL_USER`**
Configure the user name of the mail service for `alert-server`, default value `empty`.
### Configure the DataX environment in DolphinScheduler
If you are using the DataX task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
![datax_task01](/img/tasks/demo/datax_task01.png)
#### Configure the flink environment in DolphinScheduler
If you are using the flink task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
![demo-flink-simple](/img/tasks/demo/flink_task01.png)
#### Upload the Main Package
When using the Flink task node, you need to upload the jar package to the Resource Center for the execution, refer to the [resource center](../resource.md).
After finishing the Resource Center configuration, upload the required target files directly by dragging and dropping.
#### Configure the MapReduce Environment in DolphinScheduler
If you are using the MapReduce task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
![mr_configure](/img/tasks/demo/mr_task01.png)
#### Configure the Spark Environment in DolphinScheduler
If you are using the Spark task type in a production environment, it is necessary to configure the required environment first. The following is the configuration file: `bin/env/dolphinscheduler_env.sh`.
![spark_configure](/img/tasks/demo/spark_task01.png)
## Notice
JAVA and Scala are only used for identification; there is no difference. If you use Python to develop a Spark application, there is no main-function class, and the rest is the same.
`sh ./script/stop-all.sh`
## Download the Latest Version Installation Package
- [download](/en-us/download/download.html) the latest version of the installation packages.
- The following upgrade operations need to be performed in the new version's directory.
## Database Upgrade
- Change `username` and `password` in `./tools/conf/application.yaml` to your own.
- If you use MySQL as the database for DolphinScheduler, configure it in `./tools/bin/dolphinscheduler_env.sh` and add the MySQL connector jar into the lib directory `./tools/libs`; here we download `mysql-connector-java-8.0.16.jar` and then configure the database connection information correctly. You can download the MySQL connector jar from [here](https://downloads.MySQL.com/archives/c-j/). Otherwise, PostgreSQL is the default database.
```shell
export DATABASE=${DATABASE:-mysql}
```
- Execute database upgrade script:
`sh ./tools/bin/upgrade-schema.sh`
## Backend Service Upgrade
#### Setup instructions are available for each stable version of Apache DolphinScheduler below:
### Versions: 3.0.0-alpha
#### Links [3.0.0-alpha Document](../3.0.0/user_doc/about/introduction.md)
### Versions: 2.0.5
#### Links [2.0.5 Document](../2.0.5/user_doc/guide/quick-start.md)
```
## 11. dolphinscheduler_env.sh [environment variable configuration]
When a task is submitted in a shell-like way, the environment variables in this file are loaded into the host. The file covers `JAVA_HOME`, the metadata database, the registry center, and task-related configuration; the task types mainly include Shell, Python, Spark, Flink, Datax, and so on.
```bash
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/opt/soft/java}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_DRIVER_CLASS_NAME
export SPRING_DATASOURCE_URL
export SPRING_DATASOURCE_USERNAME
export SPRING_DATASOURCE_PASSWORD
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
```
## 12. Log configuration files of each service
# API Design Specification
A unified and standardized API is the cornerstone of project design. DolphinScheduler's API follows the RESTful standard. RESTful is currently the most popular style of web software architecture: it is clearly structured, standards-compliant, easy to understand, and easy to extend.
This article uses the interfaces of the DolphinScheduler project as examples to explain how to construct a RESTful-style API.
## 1. URI Design
REST is the abbreviation of Representational State Transfer.
The "representation" refers to "resources". A resource corresponds to an entity on the network, for example: a piece of text, an image, a service. Each resource corresponds to a specific URI.
RESTful URIs are designed around resources:
+ A class of resources: use the plural form, such as `task-instances`, `groups`;
+ A single resource: use the singular form, or use an id to denote one resource of a class, such as `group`, `groups/{groupId}`;
+ A sub-resource: a resource under some resource: `/instances/{instanceId}/tasks`;
+ A single resource under a sub-resource: `/instances/{instanceId}/tasks/{taskId}`.
## 2. Method Design
We locate a resource via the URI, then express the operation on that resource via the HTTP method (or an action declared as a path suffix).
### ① Query - GET
Locate the resource via the URI and use GET to express a query.
+ When the URI refers to a class of resources, it queries that class. For example, the following queries `alert-groups` with pagination:
```
Method: GET
/api/dolphinscheduler/alert-groups
```
+ When the URI refers to a single resource, it queries that resource. For example, the following queries the corresponding `alert-group`:
```
Method: GET
/api/dolphinscheduler/alert-groups/{id}
```
+ In addition, the URI can also express a query of sub-resources, as follows:
```
Method: GET
/api/dolphinscheduler/projects/{projectId}/tasks
```
**All of the queries above are paginated. If we need to query all data, append `/list` to the URI to distinguish it. Do not mix a paginated query and a query-all in the same API.**
```
Method: GET
/api/dolphinscheduler/alert-groups/list
```
### ② Create - POST
Locate the resource type via the URI, use POST to express the create action, and return the created `id` to the requester.
+ The following creates an `alert-group`:
```
Method: POST
/api/dolphinscheduler/alert-groups
```
+ Creating a sub-resource is a similar operation:
```
Method: POST
/api/dolphinscheduler/alert-groups/{alertGroupId}/tasks
```
### ③ Update - PUT
Locate a resource via the URI and use PUT to modify it.
```
Method: PUT
/api/dolphinscheduler/alert-groups/{alertGroupId}
```
### ④ Delete - DELETE
Locate a resource via the URI and use DELETE to delete it.
+ The following deletes the resource corresponding to `alertGroupId`:
```
Method: DELETE
/api/dolphinscheduler/alert-groups/{alertGroupId}
```
+ Batch deletion: batch-delete the array of ids passed in, using the POST method. **(Do not use DELETE here, because the body of a DELETE request has no semantic meaning, and some gateways, proxies, or firewalls may strip the body of a DELETE request.)**
```
Method: POST
/api/dolphinscheduler/alert-groups/batch-delete
```
### ⑤ Other Operations
For operations beyond create, read, update, and delete, we likewise locate the resource via the `url` and then append the action to the path. For example:
```
/api/dolphinscheduler/alert-groups/verify-name
/api/dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt
```
## 3. Parameter Design
There are two kinds of parameters: request parameters (request param or request body) and path parameters (path param).
Parameter names must use lowerCamelCase. In pagination scenarios, if the user-supplied page number is less than 1, the frontend must send 1 to the backend to request the first page; if the backend finds the page number greater than the total number of pages, it returns the last page directly.
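The pagination clamping rule above can be sketched as a tiny helper (the function name `clamp_page` is hypothetical, for illustration only):

```shell
#!/bin/sh
# Clamp a requested page number into [1, total_pages], following the rule above:
# below 1 -> first page; above the total -> last page; otherwise unchanged.
clamp_page() {
  requested=$1
  total=$2
  if [ "$requested" -lt 1 ]; then
    echo 1
  elif [ "$requested" -gt "$total" ]; then
    echo "$total"
  else
    echo "$requested"
  fi
}

clamp_page 0 10    # prints 1  (first page)
clamp_page 99 10   # prints 10 (last page)
clamp_page 3 10    # prints 3  (unchanged)
```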
## 4. Other Designs
### Base Path
The project's URIs must use `/api/<project_name>` as the base path, identifying these APIs as belonging to the project, i.e.:
```
/api/dolphinscheduler
```
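Putting the conventions together, a request URI is always the base path plus a plural resource name, optionally followed by id segments or an action suffix. A minimal sketch (the `uri` helper is illustrative, not part of the project):

```shell
#!/bin/sh
# Compose RESTful URIs following the conventions above:
# base path + plural resource, then optional id segments and action suffixes.
BASE="/api/dolphinscheduler"

uri() {
  # Join the base path with any number of path segments.
  path="$BASE"
  for seg in "$@"; do
    path="$path/$seg"
  done
  echo "$path"
}

uri alert-groups                # paginated list:  /api/dolphinscheduler/alert-groups
uri alert-groups list           # query all:       /api/dolphinscheduler/alert-groups/list
uri alert-groups 42             # single resource: /api/dolphinscheduler/alert-groups/42
uri alert-groups batch-delete   # non-CRUD action: /api/dolphinscheduler/alert-groups/batch-delete
```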
## System Architecture Design
Before explaining the architecture of the scheduling system, let us first get familiar with the terms commonly used in scheduling systems.
### 1. Glossary
**DAG**: Directed Acyclic Graph. The tasks in a workflow are assembled as a DAG, which is traversed topologically from the nodes with zero in-degree until there are no successor nodes left. Example:
<p align="center">
  <img src="/img/architecture-design/dag_examples.png" alt="DAG example" width="80%" />
  <p align="center">
        <em>DAG example</em>
  </p>
</p>
**Process definition**: the visual **DAG** formed by dragging task nodes and establishing the associations between them
**Process instance**: an instantiation of a process definition, created by a manual start or a scheduled trigger; each run of a process definition produces one process instance
**Task instance**: an instantiation of a task node in a process definition, identifying the execution state of a specific task
**Task type**: currently SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT are supported, and dynamic plugin extension is planned. Note that a **SUB_PROCESS** is itself a separate process definition that can be started and executed on its own
**Scheduling mode**: the system supports cron-based scheduled triggering and manual triggering. The supported command types are: start workflow, start execution from the current node, recover fault-tolerant workflow, recover suspended process, start from failed node, backfill, schedule, rerun, pause, stop, and recover waiting threads. Among these, **recover fault-tolerant workflow** and **recover waiting threads** are used internally by the scheduler and cannot be called externally
**Scheduled trigger**: the system uses the **quartz** distributed scheduler and supports visual generation of cron expressions
**Dependency**: besides the simple predecessor/successor dependencies between **DAG** nodes, the system also provides **task dependency** nodes, supporting **custom task dependencies across processes**
**Priority**: both process instances and task instances support priorities; if no priority is set, the default is first-in-first-out
**Email alert**: supports emailing **SQL task** query results, process instance result alerts, and fault-tolerance alert notifications
**Failure strategy**: for tasks running in parallel, two strategies are provided when a task fails. **Continue** means the process keeps running regardless of the states of the parallel tasks until it ends. **End** means that once a failed task is detected, the running parallel tasks are killed and the process ends as failed
**Backfill**: backfills historical data, supporting **parallel and serial** backfill over a date range
### 2. System Architecture
#### 2.1 Architecture Diagram
<p align="center">
  <img src="/img/architecture.jpg" alt="System architecture diagram" />
  <p align="center">
        <em>System architecture diagram</em>
  </p>
</p>
#### 2.2 Architecture Description
* **MasterServer**
    MasterServer adopts a decentralized, distributed design. It is mainly responsible for splitting DAGs into tasks, submitting and monitoring tasks, and at the same time monitoring the health of the other MasterServers and WorkerServers.
    On startup, a MasterServer registers an ephemeral node with ZooKeeper, and fault tolerance is performed by watching for changes of those ephemeral nodes.
##### This service mainly contains:
- **Distributed Quartz**, the distributed scheduling component, mainly responsible for starting and stopping scheduled tasks; after quartz triggers a task, a thread pool inside the Master handles the task's subsequent operations
- **MasterSchedulerThread**, a scanner thread that periodically scans the **command** table in the database and performs different business operations according to the **command type**
- **MasterExecThread**, mainly responsible for splitting DAGs into tasks, submitting and monitoring tasks, and handling the logic of the various command types
- **MasterTaskExecThread**, mainly responsible for task persistence
* **WorkerServer**
    WorkerServer also adopts a decentralized, distributed design. It is mainly responsible for executing tasks and providing log services. On startup, a WorkerServer registers an ephemeral node with ZooKeeper and maintains a heartbeat.
##### This service contains:
- **FetchTaskThread**, mainly responsible for continuously fetching tasks from the **Task Queue** and invoking the executor corresponding to **TaskScheduleThread** according to the task type.
* **ZooKeeper**
    The MasterServer and WorkerServer nodes both use ZooKeeper for cluster management and fault tolerance. The system also performs event watching and distributed locking via ZooKeeper.
    We once implemented the queue on Redis, but since we want DolphinScheduler to depend on as few components as possible, the Redis implementation was eventually dropped.
* **Task Queue**
    Provides the task queue operations; the queue is currently also implemented on ZooKeeper. Since the queue stores little information per entry, there is no need to worry about too much data in it; in fact we have load-tested the queue with millions of entries with no impact on system stability or performance.
* **Alert**
    Provides alert-related interfaces, mainly the storage, querying, and notification of the two types of alert data. Notification comes in two forms: **email** and **SNMP (not yet implemented)**.
* **API**
    The API layer mainly handles requests from the frontend UI. It exposes a unified RESTful API to external callers.
    The interfaces include creating, defining, querying, modifying, publishing, taking offline, manually starting, stopping, pausing, and resuming workflows, starting execution from a given node, and so on.
* **UI**
    The frontend pages of the system, providing the various visual operation interfaces; see the [Quick Start](https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/about/introduction.html) section for details.
#### 2.3 Architecture Design Philosophy
##### I. Decentralization vs. Centralization
###### Centralized design
The centralized design philosophy is relatively simple: the nodes of a distributed cluster are divided by role into roughly two kinds:
<p align="center">
   <img src="https://analysys.github.io/easyscheduler_docs_cn/images/master_slave.png" alt="master-slave roles" width="50%" />
 </p>
- The Master is mainly responsible for distributing tasks and supervising the health of the Slaves, and can dynamically balance tasks across the Slaves so that no Slave node is "worked to death" or "idle to death".
- The Worker is mainly responsible for executing tasks and maintaining a heartbeat with the Master, so that the Master can assign tasks to it.
Problems with the centralized design:
- Once the Master fails, the cluster is leaderless and collapses. To solve this, most Master/Slave architectures adopt an active-standby Master design, hot or cold standby, with automatic or manual switchover; more and more new systems can automatically elect and switch the Master to improve availability.
- Another problem: if the Scheduler runs on the Master, then although the tasks of one DAG can run on different machines, the Master can become overloaded. If the Scheduler runs on a Slave, then all tasks of one DAG can only be submitted from a single machine, so when there are many parallel tasks that Slave may come under heavy pressure.
###### Decentralized design
<p align="center">
   <img src="https://analysys.github.io/easyscheduler_docs_cn/images/decentralization.png" alt="decentralization" width="50%" />
 </p>
- In a decentralized design there is usually no Master/Slave concept: all roles are the same and have equal status. The global internet is a typical decentralized distributed system; any networked node device going down affects only a small range of functionality.
- The core of decentralized design is that there is no "manager" distinct from the other nodes, so there is no single point of failure. However, since there is no "manager", every node must communicate with other nodes to obtain the machine information it needs, and the unreliability of distributed communication greatly increases the difficulty of implementing the functions above.
- In practice, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging: the cluster's manager is dynamically elected rather than preconfigured, and when a failure occurs, the cluster nodes spontaneously hold a "meeting" to elect a new "manager" to take charge. The most typical examples are ZooKeeper and Etcd, the latter implemented in Go.
- DolphinScheduler's decentralization registers the Masters/Workers in ZooKeeper, keeping the Master cluster and Worker cluster leaderless, and uses ZooKeeper distributed locks to elect one Master or Worker as the "manager" to execute a task.
##### II. Distributed Lock Practice
DolphinScheduler uses ZooKeeper distributed locks to ensure that at any moment only one Master runs the Scheduler, or only one Worker performs task submission.
1. The core flow of acquiring the distributed lock is as follows:
<p align="center">
   <img src="/img/architecture-design/distributed_lock.png" alt="distributed lock acquisition flow" width="70%" />
 </p>
2. Flow chart of the distributed lock implementation of the Scheduler thread in DolphinScheduler:
<p align="center">
   <img src="/img/architecture-design/distributed_lock_procss.png" alt="distributed lock acquisition flow" />
 </p>
##### III. Insufficient Threads and Circular Waiting
- If a DAG has no sub-processes, then when the number of rows in the Command table exceeds the threshold set for the thread pool, the process simply waits or fails.
- If a large DAG nests many sub-processes, as shown below, a "dead wait" state arises:
<p align="center">
   <img src="/img/architecture-design/lack_thread.png" alt="insufficient threads causing circular waiting" width="70%" />
 </p>
In the figure above, MainFlowThread waits for SubFlowThread1 to end, SubFlowThread1 waits for SubFlowThread2 to end, SubFlowThread2 waits for SubFlowThread3 to end, and SubFlowThread3 waits for the thread pool to have a free thread. The whole DAG can therefore never finish, and none of its threads can be released; child and parent processes end up waiting on each other in a cycle. Unless a new Master is started to add threads and break this "stalemate", the scheduling cluster can no longer be used.
Starting a new Master to break the stalemate is hardly satisfactory, so we proposed the following three options to reduce the risk:
1. Compute the total number of threads of all Masters, and pre-compute the number of threads each DAG needs before it executes. Since the pool spans multiple Masters, the total thread count is unlikely to be obtainable in real time.
2. Check the single-Master thread pool; if the pool is already full, fail the thread directly.
3. Add a resource-insufficient Command type: if the thread pool is insufficient, suspend the main process. The pool then gains a free thread, and a process suspended for lack of resources can be woken up and run again.
Note: the Master Scheduler thread fetches Commands in FIFO order.
So we chose the third option to solve the thread-shortage problem.
##### 四、容错设计
容错分为服务宕机容错和任务重试服务宕机容错又分为Master容错和Worker容错两种情况
###### 1. 宕机容错
服务容错设计依赖于ZooKeeper的Watcher机制实现原理如图
<p align="center">
<img src="/img/architecture-design/fault-tolerant.png" alt="DolphinScheduler容错设计" width="70%" />
</p>
其中Master监控其他Master和Worker的目录如果监听到remove事件则会根据具体的业务逻辑进行流程实例容错或者任务实例容错。
- Master容错流程图
<p align="center">
<img src="/img/architecture-design/fault-tolerant_master.png" alt="Master容错流程图" width="70%" />
</p>
ZooKeeper Master容错完成之后则重新由DolphinScheduler中Scheduler线程调度遍历 DAG 找到“正在运行”和“提交成功”的任务,对“正在运行”的任务监控其任务实例的状态,对“提交成功”的任务需要判断Task Queue中是否已经存在如果存在则同样监控任务实例的状态如果不存在则重新提交任务实例。
- Worker容错流程图
<p align="center">
<img src="/img/architecture-design/fault-tolerant_worker.png" alt="Worker容错流程图" width="70%" />
</p>
Master Scheduler线程一旦发现任务实例为“需要容错”状态则接管任务并进行重新提交。
注意由于“网络抖动”可能会使得节点短时间内失去和ZooKeeper的心跳从而发生节点的remove事件。对于这种情况我们使用最简单的方式那就是节点一旦和ZooKeeper发生超时连接则直接将Master或Worker服务停掉。
###### 2. 任务失败重试
这里首先要区分任务失败重试、流程失败恢复、流程失败重跑的概念:
- 任务失败重试是任务级别的是调度系统自动进行的比如一个Shell任务设置重试次数为3次那么在Shell任务运行失败后最多会再自动尝试运行3次
- 流程失败恢复是流程级别的,是手动进行的,恢复只能**从失败的节点开始执行**或**从当前节点开始执行**
- 流程失败重跑也是流程级别的,是手动进行的,重跑是从开始节点进行
接下来说正题,我们将工作流中的任务节点分了两种类型。
- 一种是业务节点这种节点都对应一个实际的脚本或者处理语句比如Shell节点MR节点、Spark节点、依赖节点等。
- 还有一种是逻辑节点,这种节点不做实际的脚本或语句处理,只是整个流程流转的逻辑处理,比如子流程节点等。
每一个**业务节点**都可以配置失败重试的次数,当该任务节点失败,会自动重试,直到成功或者超过配置的重试次数。**逻辑节点**不支持失败重试。但是逻辑节点里的任务支持重试。
如果工作流中有任务失败达到最大重试次数,工作流就会失败停止,失败的工作流可以手动进行重跑操作或者流程恢复操作
##### 五、任务优先级设计
在早期调度设计中,如果没有优先级设计,采用公平调度设计的话,会遇到先行提交的任务可能会和后继提交的任务同时完成的情况,而不能做到设置流程或者任务的优先级,因此我们对此进行了重新设计,目前我们设计如下:
- 按照**不同流程实例优先级**优先于**同一个流程实例优先级**优先于**同一流程内任务优先级**优先于**同一流程内任务**提交顺序依次从高到低进行任务处理。
- 具体实现是根据任务实例的json解析优先级然后把**流程实例优先级_流程实例id_任务优先级_任务id**信息保存在ZooKeeper任务队列中当从任务队列获取的时候通过字符串比较即可得出最需要优先执行的任务
- 其中流程定义的优先级是考虑到有些流程需要先于其他流程进行处理这个可以在流程启动或者定时启动时配置共有5级依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/process_priority.png" alt="流程优先级配置" width="40%" />
</p>
- 任务的优先级也分为5级依次为HIGHEST、HIGH、MEDIUM、LOW、LOWEST。如下图
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/task_priority.png" alt="任务优先级配置" width="35%" />
</p>
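上述“按字符串比较取最高优先级任务”的做法,可以用一段自包含的 Java 示意代码来理解(假设优先级被编码为数字0 表示 HIGHEST、4 表示 LOWEST数字越小越优先`TaskPriorityDemo` 及其字段拼接方式为本文示例的假设,非源码实现,且假设各 id 位数一致,否则字典序与数值序不一致):

```java
import java.util.PriorityQueue;

public class TaskPriorityDemo {
    // 拼接 ZooKeeper 任务队列中的 key流程实例优先级_流程实例id_任务优先级_任务id
    public static String buildKey(int processPriority, long processInstanceId,
                                  int taskPriority, long taskInstanceId) {
        return processPriority + "_" + processInstanceId + "_" + taskPriority + "_" + taskInstanceId;
    }

    public static void main(String[] args) {
        PriorityQueue<String> queue = new PriorityQueue<>(); // 按字符串自然序排序
        queue.offer(buildKey(2, 100, 3, 1001)); // MEDIUM 流程里的 LOW 任务
        queue.offer(buildKey(0, 101, 4, 1002)); // HIGHEST 流程里的 LOWEST 任务
        queue.offer(buildKey(2, 100, 1, 1003)); // MEDIUM 流程里的 HIGH 任务
        // 不同流程实例的优先级最优先HIGHEST 流程的任务最先出队
        System.out.println(queue.poll()); // 0_101_4_1002
    }
}
```

可以看到,只要把流程优先级放在 key 的最前面,普通的字符串比较就能实现“流程优先级 > 流程内任务优先级 > 提交顺序”的排序效果。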
##### 六、Logback和gRPC实现日志访问
- 由于Web(UI)和Worker不一定在同一台机器上所以查看日志不能像查询本地文件那样。有两种方案
- 将日志放到ES搜索引擎上
- 通过gRPC通信获取远程日志信息
- 考虑到尽可能保持DolphinScheduler的轻量级所以选择了gRPC实现远程访问日志信息。
<p align="center">
<img src="https://analysys.github.io/easyscheduler_docs_cn/images/grpc.png" alt="grpc远程访问" width="60%" />
</p>
- 我们使用自定义Logback的FileAppender和Filter功能实现每个任务实例生成一个日志文件。
- FileAppender主要实现如下
```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.FileAppender;

/**
 * task log appender
 */
public class TaskLogAppender extends FileAppender<ILoggingEvent> {

    @Override
    protected void append(ILoggingEvent event) {
        if (currentlyActiveFile == null) {
            currentlyActiveFile = getFile();
        }
        String activeFile = currentlyActiveFile;
        // thread name: taskThreadName-processDefineId_processInstanceId_taskInstanceId
        String threadName = event.getThreadName();
        String[] threadNameArr = threadName.split("-");
        // logId = processDefineId_processInstanceId_taskInstanceId
        String logId = threadNameArr[1];
        ...
        super.subAppend(event);
    }
}
```
`/流程定义id/流程实例id/任务实例id.log` 的形式生成日志
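从线程名拼出日志路径的过程可以用一段自包含的示意代码还原(`TaskLogPathDemo` 为说明而设的假设类,线程名格式沿用上文注释中的约定,实际实现以源码为准):

```java
public class TaskLogPathDemo {
    // threadName 形如TaskLogInfo-processDefineId_processInstanceId_taskInstanceId
    public static String logPath(String threadName, String baseDir) {
        // 取 "-" 之后的 logId再按 "_" 拆出三段 id
        String logId = threadName.split("-")[1];
        String[] ids = logId.split("_");
        return baseDir + "/" + ids[0] + "/" + ids[1] + "/" + ids[2] + ".log";
    }
}
```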
- 过滤匹配以TaskLogInfo开始的线程名称
- TaskLogFilter实现如下
```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;

/**
 * task log filter
 */
public class TaskLogFilter extends Filter<ILoggingEvent> {
    @Override
    public FilterReply decide(ILoggingEvent event) {
        if (event.getThreadName().startsWith("TaskLogInfo-")) {
            return FilterReply.ACCEPT;
        }
        return FilterReply.DENY;
    }
}
```
### 总结
本文从调度出发,初步介绍了大数据分布式工作流调度系统--DolphinScheduler的架构原理及实现思路。未完待续

# 全局参数开发文档
用户在定义方向为 OUT 的参数后,会保存在 task 的 localParam 中。
## 参数的使用
从 DAG 中获取当前需要创建的 taskInstance 的直接前置节点 preTasks获取 preTasks 的 varPool将该 `varPool(List<Property>)`合并为一个 varPool在合并过程中如果发现有相同的变量名的变量按照以下逻辑处理
* 若所有的值都是 null则合并后的值为 null
* 若有且只有一个值为非 null则合并后的值为该非 null 值
* 若所有的值都不是 null则取 endtime 最早的那个 taskInstance 的 varPool 中的值
在合并过程中将所有的合并过来的 Property 的方向更新为 IN
合并后的结果保存在 taskInstance.varPool 中。
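上面的合并规则可以用一段自包含的 Java 示意代码还原(这里的 `Property` 类及 `endTime` 字段是为说明而简化的假设,实际类在源码中定义,以源码为准):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VarPoolMergeDemo {
    static class Property {
        String prop;    // 变量名
        String direct;  // 方向IN / OUT
        String value;   // 变量值
        long endTime;   // 产生该值的 taskInstance 的结束时间
        Property(String prop, String direct, String value, long endTime) {
            this.prop = prop; this.direct = direct; this.value = value; this.endTime = endTime;
        }
    }

    // 合并多个前置任务的 varPool同名变量按文中规则取值并把方向统一改为 IN
    public static List<Property> merge(List<List<Property>> pools) {
        Map<String, Property> merged = new LinkedHashMap<>();
        for (List<Property> pool : pools) {
            for (Property p : pool) {
                Property exist = merged.get(p.prop);
                if (exist == null) {
                    // 第一次出现,直接放入(方向改为 IN
                    merged.put(p.prop, new Property(p.prop, "IN", p.value, p.endTime));
                } else if (exist.value == null && p.value != null) {
                    // 有且只有一个非 null 时,取非 null 值
                    merged.put(p.prop, new Property(p.prop, "IN", p.value, p.endTime));
                } else if (exist.value != null && p.value != null && p.endTime < exist.endTime) {
                    // 都非 null 时,取 endTime 最早的一个
                    merged.put(p.prop, new Property(p.prop, "IN", p.value, p.endTime));
                }
                // 若当前值为 null 而已有值非 null保持已有值不变
            }
        }
        return new ArrayList<>(merged.values());
    }
}
```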
Worker 收到后将 varPool 解析为 Map<String,Property> 的格式,其中 map 的 key 为 property.prop 也就是变量名。
在 processor 处理参数时,会将 varPool 和 localParam 和 globalParam 三个变量池参数合并,合并过程中若有参数名重复的参数,按照以下优先级进行替换,高优先级保留,低优先级被替换:
* `globalParam` :高
* `varPool` :中
* `localParam` :低
参数会在节点内容执行之前利用正则表达式匹配到 ${变量名},替换为对应的值。
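三个参数池按优先级合并、再用正则替换 ${变量名} 的过程,可以用一段自包含的示意代码表达(`ParamReplaceDemo` 等名称为说明而设的假设,并非源码中的实现):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamReplaceDemo {
    private static final Pattern PARAM = Pattern.compile("\\$\\{([^}]+)\\}");

    // 低优先级先放入,高优先级后放入并覆盖同名参数globalParam > varPool > localParam
    public static Map<String, String> mergeByPriority(Map<String, String> localParam,
                                                      Map<String, String> varPool,
                                                      Map<String, String> globalParam) {
        Map<String, String> merged = new HashMap<>(localParam);
        merged.putAll(varPool);
        merged.putAll(globalParam);
        return merged;
    }

    // 把节点内容中的 ${变量名} 替换为对应的值,未命中的占位符原样保留
    public static String replace(String content, Map<String, String> params) {
        Matcher m = PARAM.matcher(content);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(value == null ? m.group() : value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```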
## 参数的设置
目前仅支持 SQL 和 SHELL 节点的参数获取。
从 localParam 中获取方向为 OUT 的参数,根据不同节点的类型做以下方式处理。
### SQL 节点
参数返回的结构为 `List<Map<String,String>>`
其中List 的元素为每行数据Map 的 key 为列名value 为该列对应的值
* 若 SQL 语句返回一行数据,则根据用户在定义 task 时定义的 OUT 参数名匹配列名,若没有匹配到则放弃。
* 若 SQL 语句返回多行,则根据用户在定义 task 时定义的类型为 LIST 的 OUT 参数名匹配列名,将对应列的所有行数据转换为 `List<String>`,作为该参数的值。若没有匹配到则放弃。
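单行与多行两种匹配规则可以用一段自包含的示意代码表达(`SqlOutParamDemo` 为说明而设的假设类,未匹配到时以返回 null 表示“放弃”):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SqlOutParamDemo {
    // 单行结果:按 OUT 参数名匹配列名,未匹配到(或不止一行)则放弃
    public static String extractSingle(List<Map<String, String>> rows, String paramName) {
        if (rows.size() != 1) {
            return null;
        }
        return rows.get(0).get(paramName);
    }

    // 多行结果:类型为 LIST 的 OUT 参数取对应列的所有行,组成 List<String>
    public static List<String> extractList(List<Map<String, String>> rows, String paramName) {
        List<String> values = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (row.containsKey(paramName)) {
                values.add(row.get(paramName));
            }
        }
        return values.isEmpty() ? null : values;
    }
}
```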
### SHELL 节点
processor 执行后的结果返回为 `Map<String,String>`
用户在定义 shell 脚本时需要在输出中定义 `${setValue(key=value)}`
在参数处理时去掉 ${setValue()},按照 “=” 进行拆分,第 0 个为 key第 1 个为 value。
同样匹配用户定义 task 时定义的 OUT 参数名与 key将 value 作为该参数的值。
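`${setValue(key=value)}` 的解析步骤可以用一段自包含的示意代码还原(`SetValueParseDemo` 与正则写法为说明而设的假设,实际实现以源码为准):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SetValueParseDemo {
    private static final Pattern SET_VALUE = Pattern.compile("\\$\\{setValue\\(([^)]*)\\)\\}");

    public static Map<String, String> parse(String processorOutput) {
        Map<String, String> result = new HashMap<>();
        Matcher m = SET_VALUE.matcher(processorOutput);
        while (m.find()) {
            // 去掉 ${setValue()} 后按 "=" 拆分,第 0 个为 key第 1 个为 value
            String[] kv = m.group(1).split("=", 2);
            if (kv.length == 2) {
                result.put(kv[0], kv[1]);
            }
        }
        return result;
    }
}
```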
返回参数处理
* 获取到的 processor 的结果为 String
* 判断 processor 是否为空,为空退出
* 判断 localParam 是否为空,为空退出
* 获取 localParam 中为 OUT 的参数,为空退出
* 将 String 按照上述格式格式化SQL 为 `List<Map<String,String>>`shell 为 `Map<String,String>`
* 将匹配好值的参数赋值给 varPoolList<Property>,其中包含原有 IN 的参数)
varPool 格式化为 json传递给 master。
Master 接收到 varPool 后,将其中为 OUT 的参数回写到 localParam 中。

# 综述
<!-- TODO 由于 side menu 不支持多个等级所以新建了一个leading page存放 -->
* [全局参数](global-parameter.md)
* [switch任务类型](task/switch.md)

# SWITCH 任务类型开发文档
Switch任务类型的工作流程如下
* 用户定义的表达式和分支流转的信息保存在 taskDefinition 的 taskParams 中,当 switch 被执行到时,会被格式化为 SwitchParameters。
* SwitchTaskExecThread 按用户在页面上定义的表达式顺序,从上到下处理 switch 中定义的表达式,从 varPool 中获取变量的值,通过 js 解析表达式。如果表达式返回 true则停止检查并记录该表达式的顺序这里我们记录为 resultConditionLocation。至此 SwitchTaskExecThread 的任务便结束了。
* 当switch节点运行结束之后如果没有发生错误较为常见的是用户定义的表达式不合规范或参数名有问题这个时候MasterExecThread.submitPostNode会获取DAG的下游节点继续执行。
* DagHelper.parsePostNodes中如果发现当前节点刚刚运行完成功的节点是switch节点的话会获取resultConditionLocation将SwitchParameters中除了resultConditionLocation以外的其他分支全部skip掉。这样留下来的就只有需要执行的分支了。

### DolphinScheduler Alert SPI 主要设计
#### DolphinScheduler SPI 设计
DolphinScheduler 正在处于微内核 + 插件化的架构更改之中,所有核心能力如任务、资源存储、注册中心等都将被设计为扩展点,我们希望通过 SPI 来提高 DolphinScheduler 本身的灵活性以及友好性(扩展性)。
告警相关代码可以参考 `dolphinscheduler-alert-api` 模块。该模块定义了告警插件扩展的接口以及一些基础代码,当我们需要实现相关功能的插件化的时候,建议先阅读此模块的代码。当然,更建议你阅读文档,这会减少很多时间,不过文档有一定的滞后性,当文档缺失的时候,建议以源码为准(如果有兴趣,我们也欢迎你来提交相关文档)。此外,我们几乎不会对扩展接口做变更(不包括新增),除非重大架构调整,出现不兼容升级版本,因此,现有文档一般都能够满足。
我们采用了原生的 JAVA-SPI当你需要扩展的时候事实上你只需要关注扩展`org.apache.dolphinscheduler.alert.api.AlertChannelFactory`接口即可,底层相关逻辑如插件加载等内核已经实现,这让我们的开发更加专注且简单。
顺便提一句,我们采用了一款优秀的前端组件 form-create它支持基于 json 生成前端 ui 组件。如果插件开发牵扯到前端,我们会通过 json 来生成相关前端 UI 组件org.apache.dolphinscheduler.spi.params 里面对插件的参数做了封装,它会将相关参数全部转化为对应的 json这意味着你完全可以通过 Java 代码的方式完成前端组件的绘制(这里主要是表单,我们只关心前后端交互的数据)。
本文主要着重讲解 Alert 告警相关设计以及开发。
#### 主要模块
如果你并不关心它的内部设计,只是想单纯的了解如何开发自己的告警插件,可以略过该内容。
* dolphinscheduler-alert-api
该模块是 ALERT SPI 的核心模块,该模块定义了告警插件扩展的接口以及一些基础代码,扩展插件必须实现此模块所定义的接口:`org.apache.dolphinscheduler.alert.api.AlertChannelFactory`
* dolphinscheduler-alert-plugins
该模块是目前我们提供的插件,目前我们已经支持数十种插件,如 Email、DingTalk、Script等。
#### Alert SPI 主要类信息:
AlertChannelFactory
告警插件工厂接口所有告警插件需要实现该接口该接口用来定义告警插件的名称需要的参数create 方法用来创建具体的告警插件实例。
AlertChannel
告警插件的接口,告警插件需要实现该接口,该接口中只有一个方法 process ,上层告警系统会调用该方法并通过该方法返回的 AlertResult 来获取告警的返回信息。
AlertData
告警内容信息,包括 id标题内容日志。
AlertInfo
告警相关信息,上层系统调用告警插件实例时,将该类的实例通过 process 方法传入具体的告警插件。内部包含告警内容 AlertData 和调用的告警插件实例的前端填写的参数信息。
AlertResult
告警插件发送告警返回信息。
org.apache.dolphinscheduler.spi.params
该包下是插件化的参数定义,我们前端使用 form-create 这个前端库,该库可以基于插件定义返回的参数列表 json 来动态生成前端的 ui因此我们在做 SPI 插件开发的时候无需关心前端。
该 package 下我们目前只封装了 RadioParam、TextParam、PasswordParam分别用来定义 radio 类型、text 类型和 password 类型的参数。
AbsPluginParams 该类是所有参数的基类RadioParam 这些类都继承了该类。每个 DS 的告警插件都会在 AlertChannelFactory 的实现中返回一个 AbsPluginParams 的 list。
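为了直观说明工厂接口与插件实例的关系,下面给出一段自包含的示意代码。注意:为保持自包含,这里的接口签名是大幅简化的假设(真实接口定义在 `org.apache.dolphinscheduler.alert.api` 包中,入参/返回值为 AlertInfo、AlertResult 等类型,以源码为准):

```java
public class AlertSpiDemo {
    // 简化版 AlertChannel真实接口的 process 接收 AlertInfo返回 AlertResult
    interface AlertChannel {
        String process(String title, String content);
    }

    // 简化版 AlertChannelFactory定义插件名称并创建插件实例
    interface AlertChannelFactory {
        String name();
        AlertChannel create();
    }

    // 一个最小的“控制台告警”插件实现,仅示意扩展方式
    static class ConsoleAlertChannelFactory implements AlertChannelFactory {
        @Override
        public String name() {
            return "Console";
        }

        @Override
        public AlertChannel create() {
            return (title, content) -> {
                System.out.println("[alert] " + title + ": " + content);
                return "success";
            };
        }
    }
}
```

真实插件的工厂实现通过 JAVA-SPI 被内核发现并加载,开发者只需要按同样的模式实现工厂接口即可。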
alert_spi 具体设计可见 issue[Alert Plugin Design](https://github.com/apache/incubator-dolphinscheduler/issues/3049)
#### Alert SPI 内置实现
* Email
电子邮件告警通知
* DingTalk
钉钉群聊机器人告警
相关参数配置可以参考钉钉机器人文档。
* EnterpriseWeChat
企业微信告警通知
相关参数配置可以参考企业微信机器人文档。
* Script
我们实现了 Shell 脚本告警,我们会将相关告警参数透传给脚本,你可以在 Shell 中实现你的相关告警逻辑,如果你需要对接内部告警应用,这是一种不错的方法。
* FeiShu
飞书告警通知
* Slack
Slack告警通知
* PagerDuty
PagerDuty告警通知
* WebexTeams
WebexTeams告警通知
相关参数配置可以参考WebexTeams文档。
* Telegram
Telegram告警通知
相关参数配置可以参考Telegram文档。
* Http
我们实现了Http告警调用大部分的告警插件最终都是Http请求如果我们没有支持你常用插件可以使用Http来实现你的告警需求同时也欢迎将你常用插件贡献到社区。

## DolphinScheduler Datasource SPI 主要设计
#### 如何使用数据源?
数据源中心默认支持POSTGRESQL、HIVE/IMPALA、SPARK、CLICKHOUSE、SQLSERVER数据源。
如果使用的是MySQL、ORACLE数据源则需要把对应的驱动包放置到lib目录下。
#### 如何进行数据源插件开发?
org.apache.dolphinscheduler.spi.datasource.DataSourceChannel
org.apache.dolphinscheduler.spi.datasource.DataSourceChannelFactory
org.apache.dolphinscheduler.plugin.datasource.api.client.CommonDataSourceClient
1. 数据源插件实现以上接口并继承通用client即可具体可以参考sqlserver、mysql等数据源插件实现所有RDBMS插件的添加方式都是一样的。
2. 在数据源插件pom.xml添加驱动配置
我们在 dolphinscheduler-datasource-api 模块提供了所有数据源对外访问的 API
#### **未来计划**
支持kafka、http、文件、sparkSQL、FlinkSQL等数据源

### DolphinScheduler Registry SPI 扩展
#### 如何使用?
进行以下配置(以 zookeeper 为例)
* 注册中心插件配置, 以Zookeeper 为例 (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
```
具体配置信息请参考具体插件提供的参数信息,例如 zk`org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperConfiguration.java`
所有配置项都需要加上 `registry.` 前缀,如 base.sleep.time.ms在 registry 中应该这样配置registry.base.sleep.time.ms=100
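这一前缀约定可以用一段自包含的示意代码表达(`RegistryPrefixDemo` 为说明而设的假设类,仅演示把带前缀的配置转换为插件真正读取的参数):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class RegistryPrefixDemo {
    // 去掉 "registry." 前缀registry.base.sleep.time.ms -> base.sleep.time.ms
    public static Map<String, String> stripPrefix(Properties props) {
        Map<String, String> config = new HashMap<>();
        String prefix = "registry.";
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                config.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return config;
    }
}
```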
#### 如何扩展
`dolphinscheduler-registry-api` 定义了实现插件的标准,当你需要扩展插件的时候只需要实现 `org.apache.dolphinscheduler.registry.api.RegistryFactory` 即可。
`dolphinscheduler-registry-plugin` 模块下是我们目前所提供的注册中心插件。
#### FAQ
1registry connect timeout
可以增加相关超时参数。

## DolphinScheduler Task SPI 扩展
#### 如何进行任务插件开发?
org.apache.dolphinscheduler.spi.task.TaskChannel
插件实现以上接口即可。主要包含创建任务(任务初始化,任务运行等方法)、任务取消,如果是 yarn 任务,则需要实现 org.apache.dolphinscheduler.plugin.task.api.AbstractYarnTask。
我们在 dolphinscheduler-task-api 模块提供了所有任务对外访问的 API而 dolphinscheduler-spi 模块则是 spi 通用代码库,定义了所有的插件模块,比如告警模块,注册中心模块等,你可以详细阅读查看。
*NOTICE*
由于任务插件涉及到前端页面目前前端的SPI还没有实现因此你需要单独实现插件对应的前端页面。
如果任务插件存在类冲突,你可以采用 [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) 来解决这种问题。

# DolphinScheduler 开发手册
## 前置条件
在搭建 DolphinScheduler 开发环境之前请确保你已经安装以下软件:
* [Git](https://git-scm.com/downloads): 版本控制系统
* [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html): 后端开发
* [Maven](http://maven.apache.org/download.cgi): Java包管理系统
* [Node](https://nodejs.org/en/download): 前端开发
### 克隆代码库
通过你的 git 管理工具下载代码,下面以 git-core 为例:
```shell
mkdir dolphinscheduler
cd dolphinscheduler
git clone git@github.com:apache/dolphinscheduler.git
```
### 编译源码
* 如果使用MySQL数据库请注意修改pom.xml添加 `mysql-connector-java` 依赖。
* 运行 `mvn clean install -Prelease -Dmaven.test.skip=true`
## 开发者须知
DolphinScheduler 开发环境配置有两个方式分别是standalone模式以及普通模式
* [standalone模式](#dolphinscheduler-standalone快速开发模式)**推荐使用,但仅支持 1.3.9 及以后的版本**,方便快速的开发环境搭建,能解决大部分场景的开发
* [普通模式](#dolphinscheduler-普通开发模式)master、worker、api等单独启动能更好地模拟真实生产环境可以覆盖的测试场景更多
## DolphinScheduler Standalone快速开发模式
> **_注意_** 仅供单机开发调试使用,默认使用 H2 Database,Zookeeper Testing Server
> Standalone 仅在 DolphinScheduler 1.3.9 及以后的版本支持
### 分支选择
开发不同的代码需要基于不同的分支
* 如果想基于二进制包开发,切换到对应版本的代码,如 1.3.9 则是 `1.3.9-release`
* 如果想要开发最新代码,切换到 `dev` 分支
### 启动后端
在 Intellij IDEA 找到并启动类 `org.apache.dolphinscheduler.server.StandaloneServer` 即可完成后端启动
### 启动前端
安装前端依赖并运行前端组件
```shell
cd dolphinscheduler-ui
npm install
npm run start
```
至此,前后端已成功运行起来,浏览器访问 [http://localhost:8888](http://localhost:8888),并使用默认账户密码 **admin/dolphinscheduler123** 即可完成登录
## DolphinScheduler 普通开发模式
### 必要软件安装
#### zookeeper
下载 [ZooKeeper](https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.6.3),解压
* 在 ZooKeeper 的目录下新建 zkData、zkLog文件夹
* 将 conf 目录下的 `zoo_sample.cfg` 文件,复制一份,重命名为 `zoo.cfg`,修改其中数据和日志的配置,如:
```shell
dataDir=/data/zookeeper/data ## 此处使用绝对路径
dataLogDir=/data/zookeeper/datalog
```
* 运行 `./bin/zkServer.sh`
#### 数据库
DolphinScheduler 的元数据存储在关系型数据库中,目前支持的关系型数据库包括 MySQL 以及 PostgreSQL。下面以MySQL为例启动数据库并创建新 database 作为 DolphinScheduler 元数据库,这里以数据库名 dolphinscheduler 为例
创建完新数据库后,将 `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_mysql.sql` 下的 sql 文件直接在 MySQL 中运行,完成数据库初始化
#### 启动后端
下面步骤将引导如何启动 DolphinScheduler 后端服务
##### 必要的准备工作
* 打开项目:使用开发工具打开项目,这里以 Intellij IDEA 为例,打开后需要一段时间,让 Intellij IDEA 完成依赖的下载
* 插件的配置(**仅 2.0 及以后的版本需要**
* 注册中心插件配置, 以Zookeeper 为例 (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
```registry.properties
registry.plugin.name=zookeeper
registry.servers=127.0.0.1:2181
```
* 必要的修改
* 如果使用 MySQL 作为元数据库,需要先修改 `dolphinscheduler/pom.xml`,将 `mysql-connector-java` 依赖的 `scope` 改为 `compile`,使用 PostgreSQL 则不需要
* 修改数据库配置,修改 `dolphinscheduler-dao/src/main/resources/application-mysql.yaml` 文件中的数据库配置
本样例以 MySQL 为例,其中数据库名为 dolphinscheduler账户名密码均为 dolphinscheduler
```application-mysql.yaml
spring:
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: dolphinscheduler
    password: dolphinscheduler
```
* 修改日志级别:为以下配置增加一行内容 `<appender-ref ref="STDOUT"/>` 使日志能在命令行中显示
`dolphinscheduler-server/src/main/resources/logback-worker.xml`
`dolphinscheduler-server/src/main/resources/logback-master.xml`
`dolphinscheduler-api/src/main/resources/logback-api.xml`
修改后的结果如下:
```diff
<root level="INFO">
+ <appender-ref ref="STDOUT"/>
<appender-ref ref="APILOGFILE"/>
<appender-ref ref="SKYWALKING-LOG"/>
</root>
```
##### 启动服务
我们需要启动三个服务,包括 MasterServer、WorkerServer、ApiApplicationServer。
* MasterServer在 Intellij IDEA 中执行 `org.apache.dolphinscheduler.server.master.MasterServer` 中的 `main` 方法,并配置 *VM Options* `-Dlogging.config=classpath:logback-master.xml -Ddruid.mysql.usePingMethod=false -Dspring.profiles.active=mysql`
* WorkerServer在 Intellij IDEA 中执行 `org.apache.dolphinscheduler.server.worker.WorkerServer` 中的 `main` 方法,并配置 *VM Options* `-Dlogging.config=classpath:logback-worker.xml -Ddruid.mysql.usePingMethod=false -Dspring.profiles.active=mysql`
* ApiApplicationServer在 Intellij IDEA 中执行 `org.apache.dolphinscheduler.api.ApiApplicationServer` 中的 `main` 方法,并配置 *VM Options* `-Dlogging.config=classpath:logback-api.xml -Dspring.profiles.active=api,mysql`。启动完成可以浏览 Open API 文档,地址为 http://localhost:12345/dolphinscheduler/doc.html
> VM Options `-Dspring.profiles.active=mysql``mysql` 表示指定的配置文件
### 启动前端
安装前端依赖并运行前端组件
```shell
cd dolphinscheduler-ui
npm install
npm run start
```
至此,前后端已成功运行起来,浏览器访问 [http://localhost:8888](http://localhost:8888),并使用默认账户密码 **admin/dolphinscheduler123** 即可完成登录

# DolphinScheduler — E2E 自动化测试
## 一、前置知识:
### 1、E2E 测试与单元测试的区别
E2E是“End to End”的缩写可以翻译成“端到端”测试。它模仿用户从某个入口开始逐步执行操作直到完成某项工作。与单元测试不同后者通常需要测试参数、参数类型、参数值、参数数量、返回值、抛出错误等目的在于保证特定函数能够在任何情况下都稳定可靠完成工作。单元测试假定只要所有函数都正常工作那么整个产品就能正常工作。
相对来说E2E 测试并没有那么强调要覆盖全部使用场景,它关注的**一个完整的操作链是否能够完成**。对于 Web 前端来说,还关注**界面布局、内容信息是否符合预期**。
比如,登录界面的 E2E 测试,关注用户是否能够正常输入,正常登录;登录失败的话,是否能够正确显示错误信息。至于输入不合法的内容是否处理,并不是所关注的重点。
### 2、Selenium 测试框架
[Selenium](https://www.selenium.dev) 是一种开源测试工具,用于在 Web 浏览器上执行自动化测试。该框架通过 WebDriver 驱动浏览器的原生组件,将 Web Service 的命令转化为浏览器原生调用来完成操作。简单来说,就是模拟浏览器,对于页面的元素进行选择操作。
WebDriver 是一个 API 和协议,它定义了一个语言中立的接口,用于控制 web 浏览器的行为。 每个浏览器都有一个特定的 WebDriver 实现,称为驱动程序。驱动程序是负责委派给浏览器的组件,并处理与 Selenium 和浏览器之间的通信。
Selenium 框架通过一个面向用户的界面将所有这些部分连接在一起, 该界面允许透明地使用不同的浏览器后端, 从而实现跨浏览器和跨平台自动化。
## 二、E2E 测试
### 1、E2E-Pages
DolphinScheduler 的 E2E 测试使用 docker-compose 部署,当前测试的为单机模式,主要用于检验一些例如“增删改查”基本功能,后期如需做集群验证,例如不同服务之间的协作,或者各个服务之间的通讯机制,可参考 `deploy/docker/docker-compose.yml`来配置。
对于 E2E 测试(前端这一块),使用 [页面模型](https://www.selenium.dev/documentation/guidelines/page_object_models/) 的形式,主要为每一个页面建立一个对应的模型。下面以登录页为例:
```java
package org.apache.dolphinscheduler.e2e.pages;

import org.apache.dolphinscheduler.e2e.pages.common.NavBarPage;
import org.apache.dolphinscheduler.e2e.pages.security.TenantPage;

import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import lombok.Getter;
import lombok.SneakyThrows;

@Getter
public final class LoginPage extends NavBarPage {
    @FindBy(id = "inputUsername")
    private WebElement inputUsername;

    @FindBy(id = "inputPassword")
    private WebElement inputPassword;

    @FindBy(id = "btnLogin")
    private WebElement buttonLogin;

    public LoginPage(RemoteWebDriver driver) {
        super(driver);
    }

    @SneakyThrows
    public TenantPage login(String username, String password) {
        inputUsername().sendKeys(username);
        inputPassword().sendKeys(password);
        buttonLogin().click();

        new WebDriverWait(driver, 10)
            .until(ExpectedConditions.urlContains("/#/security"));

        return new TenantPage(driver);
    }
}
```
在测试过程中,我们只针对所需要关注的元素进行测试,而非页面中的所有元素,所以在登录页面只对用户名、密码和登录按钮这些元素进行声明。通过 Selenium 测试框架所提供的 FindBy 接口来查找 Vue 文件中对应的 id 或 class。
此外,在测试过程中,并不会直接去操作元素,一般选择封装对应的方法,以达到复用的效果。例如想要登录的话,直接传入用户名和密码,通过 `public TenantPage login()` 方法去操作所传入的元素,从而达到实现登录的效果,即当用户完成登录之后,跳转到安全中心(默认进入到租户管理页面)。
在安全中心页面SecurityPage提供了 goToTab 方法用于测试对应侧栏的跳转主要包括租户管理TenantPage、用户管理UserPage、Worker 分组管理WorkerGroupPage和队列管理QueuePage。这些页面的实现方式同理主要测试表单的输入、增加和删除按钮是否能够返回出对应的页面。
```java
public <T extends SecurityPage.Tab> T goToTab(Class<T> tab) {
    if (tab == TenantPage.class) {
        WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
                .until(ExpectedConditions.elementToBeClickable(menuTenantManage));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", menuTenantManageElement);
        return tab.cast(new TenantPage(driver));
    }
    if (tab == UserPage.class) {
        WebElement menUserManageElement = new WebDriverWait(driver, 60)
                .until(ExpectedConditions.elementToBeClickable(menUserManage));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", menUserManageElement);
        return tab.cast(new UserPage(driver));
    }
    if (tab == WorkerGroupPage.class) {
        WebElement menWorkerGroupManageElement = new WebDriverWait(driver, 60)
                .until(ExpectedConditions.elementToBeClickable(menWorkerGroupManage));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", menWorkerGroupManageElement);
        return tab.cast(new WorkerGroupPage(driver));
    }
    if (tab == QueuePage.class) {
        menuQueueManage().click();
        return tab.cast(new QueuePage(driver));
    }

    throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
}
```
![SecurityPage](/img/e2e-test/SecurityPage.png)
对于导航栏选项的跳转,在`org/apache/dolphinscheduler/e2e/pages/common/NavBarPage.java` 中提供了 goToNav 的方法。当前支持的页面为项目管理ProjectPage、安全中心SecurityPage和资源中心ResourcePage
```java
public <T extends NavBarItem> T goToNav(Class<T> nav) {
    if (nav == ProjectPage.class) {
        WebElement projectTabElement = new WebDriverWait(driver, 60)
                .until(ExpectedConditions.elementToBeClickable(projectTab));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", projectTabElement);
        return nav.cast(new ProjectPage(driver));
    }
    if (nav == SecurityPage.class) {
        WebElement securityTabElement = new WebDriverWait(driver, 60)
                .until(ExpectedConditions.elementToBeClickable(securityTab));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", securityTabElement);
        return nav.cast(new SecurityPage(driver));
    }
    if (nav == ResourcePage.class) {
        WebElement resourceTabElement = new WebDriverWait(driver, 60)
                .until(ExpectedConditions.elementToBeClickable(resourceTab));
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", resourceTabElement);
        return nav.cast(new ResourcePage(driver));
    }

    throw new UnsupportedOperationException("Unknown nav bar");
}
```
### 2、E2E-Cases
当前所支持的 E2E 测试案例主要包括文件管理、项目管理、队列管理、租户管理、用户管理、Worker 分组管理和工作流测试。
![E2E_Cases](/img/e2e-test/E2E_Cases.png)
下面以租户管理测试为例,前文已经说明,我们使用 docker-compose 进行部署,所以每个测试案例,都需要以注解的形式引入对应的文件。
使用 Selenium 所提供的 RemoteWebDriver 来加载浏览器。在每个测试案例开始之前都需要进行一些准备工作。比如:登录用户、跳转到对应的页面(根据具体的测试案例而定)。
```java
@BeforeAll
public static void setup() {
    new LoginPage(browser)
            .login("admin", "dolphinscheduler123") // 登录
            .goToNav(SecurityPage.class)           // 跳转到安全中心
            .goToTab(TenantPage.class)             // 进入租户管理页面
    ;
}
```
在完成准备工作之后,就是正式的测试案例编写。我们使用 @Order() 注解的形式,用于模块化,确认测试顺序。在进行测试之后,使用断言来判断测试是否成功,如果断言返回 true则表示创建租户成功。可参考创建租户的测试代码
```java
@Test
@Order(10)
void testCreateTenant() {
    final TenantPage page = new TenantPage(browser);
    page.create(tenant);

    await().untilAsserted(() -> assertThat(page.tenantList())
            .as("Tenant list should contain newly-created tenant")
            .extracting(WebElement::getText)
            .anyMatch(it -> it.contains(tenant)));
}
```
其余的都是类似的情况,可参考具体的源码来理解。
https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-e2e/dolphinscheduler-e2e-case/src/test/java/org/apache/dolphinscheduler/e2e/cases
## 三、补充
在本地运行的时候,首先需要启动相应的本地服务,可以参考该页面: [环境搭建](https://dolphinscheduler.apache.org/zh-cn/development/development-environment-setup.html)
在本地运行 E2E 测试的时候,可以配置 `-Dlocal=true` 参数,用于连接本地,方便对于 UI 界面的更改。
如果是`M1`芯片的机器,可以使用`-Dm1_chip=true` 参数,用于配置使用`ARM64`支持的容器。
![Dlocal](/img/e2e-test/Dlocal.png)
在本地运行过程中如果出现连接超时可增大等待加载的时间建议设置为30秒及以上。
![timeout](/img/e2e-test/timeout.png)
测试的运行过程将会以 MP4 的文件格式存在。
![MP4](/img/e2e-test/MP4.png)

# 前端开发文档
### 技术选型
```
Vue mvvm 框架
Es6 ECMAScript 6.0
Ans-ui Analysys-ui
D3 可视化库图表库
Jsplumb 连线插件库
Lodash 高性能的 JavaScript 实用工具库
```
### 开发环境搭建
- #### Node安装
Node包下载 (注意版本 v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
- #### 前端项目构建
用命令行模式 `cd` 进入 `dolphinscheduler-ui`项目目录并执行 `npm install` 拉取项目依赖包
> 如果 `npm install` 速度非常慢,你可以设置淘宝镜像
```
npm config set registry http://registry.npm.taobao.org/
```
- 修改 `dolphinscheduler-ui/.env` 文件中的 `API_BASE`,用于跟后端交互:
```
# 代理的接口地址(自行修改)
API_BASE = http://127.0.0.1:12345
```
> ##### !!!这里特别注意 项目如果在拉取依赖包的过程中报 " node-sass error " 错误,请在执行完后再次执行以下命令
```bash
npm install node-sass --unsafe-perm #单独安装node-sass依赖
```
- #### 开发环境运行
- `npm start` 项目开发环境 (启动后访问地址 http://localhost:8888)
#### 前端项目发布
- `npm run build` 项目打包 (打包后根目录会创建一个名为dist文件夹用于发布线上Nginx)
运行 `npm run build` 命令生成打包文件dist
再拷贝到服务器对应的目录下(前端服务静态页面存放目录)
访问地址 `http://localhost:8888`
#### Linux下使用node启动并且守护进程
安装pm2 `npm install -g pm2`
在项目`dolphinscheduler-ui`根目录执行 `pm2 start npm -- run dev` 启动项目
#### 命令
- 启用 `pm2 start npm -- run dev`
- 停止 `pm2 stop npm`
- 删除 `pm2 delete npm`
- 状态 `pm2 list`
```
[root@localhost dolphinscheduler-ui]# pm2 start npm -- run dev
[PM2] Applying action restartProcessId on app [npm](ids: 0)
[PM2] [npm](0) ✓
[PM2] Process successfully started
┌──────────┬────┬─────────┬──────┬──────┬────────┬─────────┬────────┬─────┬──────────┬──────┬──────────┐
│ App name │ id │ version │ mode │ pid │ status │ restart │ uptime │ cpu │ mem │ user │ watching │
├──────────┼────┼─────────┼──────┼──────┼────────┼─────────┼────────┼─────┼──────────┼──────┼──────────┤
│ npm │ 0 │ N/A │ fork │ 6168 │ online │ 31 │ 0s │ 0% │ 5.6 MB │ root │ disabled │
└──────────┴────┴─────────┴──────┴──────┴────────┴─────────┴────────┴─────┴──────────┴──────┴──────────┘
Use `pm2 show <id|name>` to get more details about an app
```
### 项目目录结构
`build` 打包及开发环境项目的一些webpack配置
`node_modules` 开发环境node依赖包
`src` 项目所需文件
`src => combo` 项目第三方资源本地化 `npm run combo`具体查看`build/combo.js`
`src => font` 字体图标库可访问 `https://www.iconfont.cn` 进行添加 注意:字体库用的自己的 二次开发需要重新引入自己的库 `src/sass/common/_font.scss`
`src => images` 公共图片存放
`src => js` js/vue
`src => lib` 公司内部组件(公司组件库开源后可删掉)
`src => sass` sass文件 一个页面对应一个sass文件
`src => view` 页面文件 一个页面对应一个html文件
```
> 项目采用vue单页面应用(SPA)开发
- 所有页面入口文件在 `src/js/conf/${对应页面文件名 => home}``index.js` 入口文件
- 对应的sass文件则在 `src/sass/conf/${对应页面文件名 => home}/index.scss`
- 对应的html文件则在 `src/view/${对应页面文件名 => home}/index.html`
```
公共模块及util `src/js/module`
`components` => 内部项目公共组件
`download` => 下载组件
`echarts` => 图表组件
`filter` => 过滤器和vue管道
`i18n` => 国际化
`io` => io请求封装 基于axios
`mixin` => vue mixin 公共部分 用于disabled操作
`permissions` => 权限操作
`util` => 工具
### 系统功能模块
首页 => `http://localhost:8888/#/home`
项目管理 => `http://localhost:8888/#/projects/list`
```
| 项目首页
| 工作流
- 工作流定义
- 工作流实例
- 任务实例
```
资源管理 => `http://localhost:8888/#/resource/file`
```
| 文件管理
| UDF管理
- 资源管理
- 函数管理
```
数据源管理 => `http://localhost:8888/#/datasource/list`
安全中心 => `http://localhost:8888/#/security/tenant`
```
| 租户管理
| 用户管理
| 告警组管理
- master
- worker
```
用户中心 => `http://localhost:8888/#/user/account`
## 路由和状态管理
项目 `src/js/conf/home` 下分为
`pages` => 路由指向页面目录
```
路由地址对应的页面文件
```
`router` => 路由管理
```
vue的路由器在每个页面的入口文件index.js 都会注册进来 具体操作https://router.vuejs.org/zh/
```
`store` => 状态管理
```
每个路由对应的页面都有一个状态管理的文件 分为:
actions => mapActions => 详情https://vuex.vuejs.org/zh/guide/actions.html
getters => mapGetters => 详情https://vuex.vuejs.org/zh/guide/getters.html
index => 入口
mutations => mapMutations => 详情https://vuex.vuejs.org/zh/guide/mutations.html
state => mapState => 详情https://vuex.vuejs.org/zh/guide/state.html
具体操作https://vuex.vuejs.org/zh/
```
## 规范
## Vue规范
##### 1.组件名
组件名为多个单词,并且用连接线(-)连接,避免与 HTML 标签冲突,并且结构更加清晰。
```
// 正例
export default {
name: 'page-article-item'
}
```
##### 2.组件文件
`src/js/module/components`项目内部公共组件书写文件夹名与文件名同名,公共组件内部所拆分的子组件与util工具都放置组件内部 `_source`文件夹里。
```
└── components
├── header
├── header.vue
└── _source
└── nav.vue
└── util.js
├── conditions
├── conditions.vue
└── _source
└── search.vue
└── util.js
```
##### 3.Prop
定义 Prop 的时候应该始终以驼峰格式camelCase命名在父组件赋值的时候使用连接线-)。
这里遵循每个语言的特性,因为在 HTML 标记中对大小写是不敏感的,使用连接线更加友好;而在 JavaScript 中更自然的是驼峰命名。
```
// Vue
props: {
articleStatus: Boolean
}
// HTML
<article-item :article-status="true"></article-item>
```
Prop 的定义应该尽量详细的指定其类型、默认值和验证。
示例:
```
props: {
  attrM: Number,
  attrA: {
    type: String,
    required: true
  },
  attrZ: {
    type: Object,
    // 数组/对象的默认值应该由一个工厂函数返回
    default: function () {
      return {
        msg: '成就你我'
      }
    }
  },
  attrE: {
    type: String,
    validator: function (v) {
      return !(['success', 'fail'].indexOf(v) === -1)
    }
  }
}
```
##### 4.v-for
在执行 v-for 遍历的时候,总是应该带上 key 值使更新 DOM 时渲染效率更高。
```
<ul>
<li v-for="item in list" :key="item.id">
{{ item.title }}
</li>
</ul>
```
v-for 应该避免与 v-if 在同一个元素(`例如:<li>`)上使用,因为 v-for 的优先级比 v-if 更高,为了避免无效计算和渲染,应该尽量将 v-if 放到容器的父元素之上。
```
<ul v-if="showList">
<li v-for="item in list" :key="item.id">
{{ item.title }}
</li>
</ul>
```
##### 5.v-if / v-else-if / v-else
若同一组 v-if 逻辑控制中的元素逻辑相同Vue 为了更高效的元素切换,会复用相同的部分,`例如value`。为了避免复用带来的不合理效果,应该在同种元素上加上 key 做标识。
```
<div v-if="hasData" key="mazey-data">
<span>{{ mazeyData }}</span>
</div>
<div v-else key="mazey-none">
<span>无数据</span>
</div>
```
##### 6.指令缩写
为了统一规范始终使用指令缩写,使用`v-bind``v-on`并没有什么不好,这里仅为了统一规范。
```
<input :value="mazeyUser" @click="verifyUser">
```
##### 7.单文件组件的顶级元素顺序
样式后续都是打包在一个文件里,所以在单个 vue 文件中定义的样式,在别的文件里同类名的样式也是会生效的,所以在创建一个组件前都会有个顶级类名
注意项目内已经增加了sass插件单个vue文件里可以直接书写sass语法
为了统一和便于阅读,应该按 `<template>`、`<script>`、`<style>`的顺序放置。
```
<template>
  <div class="test-model">
    test
  </div>
</template>

<script>
  export default {
    name: 'test',
    data () {
      return {}
    },
    props: {},
    methods: {},
    watch: {},
    beforeCreate () {},
    created () {},
    beforeMount () {},
    mounted () {},
    beforeUpdate () {},
    updated () {},
    beforeDestroy () {},
    destroyed () {},
    computed: {},
    components: {}
  }
</script>

<style lang="scss" rel="stylesheet/scss">
  .test-model {
  }
</style>
```
## JavaScript规范
##### 1.var / let / const
建议不再使用 var而使用 let / const优先使用 const。任何一个变量的使用都要提前声明除了 function 定义的函数可以随便放在任何位置。
##### 2.引号
```
const foo = '后除'
const bar = `${foo},前端工程师`
```
##### 3.函数
匿名函数统一使用箭头函数,多个参数/返回值时优先使用对象的解构赋值。
```
function getPersonInfo ({name, sex}) {
  // ...
  return {name, sex}
}
```
函数名统一使用驼峰命名,以大写字母开头声明的都是构造函数,使用小写字母开头的都是普通函数,也不该使用 new 操作符去操作普通函数。
##### 4.对象
```
const foo = {a: 0, b: 1}
const bar = JSON.parse(JSON.stringify(foo))
const foo = {a: 0, b: 1}
const bar = {...foo, c: 2}
const foo = {a: 3}
Object.assign(foo, {b: 4})
const myMap = new Map([])
for (let [key, value] of myMap.entries()) {
// ...
}
```
##### 5.模块
统一使用 import / export 的方式管理项目的模块。
```
// lib.js
export default {}
// app.js
import app from './lib'
```
import 统一放在文件顶部。
如果模块只有一个输出值,使用 `export default`,否则不用。
## HTML / CSS
##### 1.标签
在引用外部 CSS 或 JavaScript 时不写 type 属性。HTML5 默认 type 为 `text/css``text/javascript` 属性,所以没必要指定。
```
<link rel="stylesheet" href="//www.test.com/css/test.css">
<script src="//www.test.com/js/test.js"></script>
```
##### 2.命名
Class 和 ID 的命名应该语义化,通过看名字就知道是干嘛的;多个单词用连接线 - 连接。
```
// 正例
.test-header{
font-size: 20px;
}
```
##### 3.属性缩写
CSS 属性尽量使用缩写,提高代码的效率和方便理解。
```
// 反例
border-width: 1px;
border-style: solid;
border-color: #ccc;
// 正例
border: 1px solid #ccc;
```
##### 4.文档类型
应该总是使用 HTML5 标准。
```
<!DOCTYPE html>
```
##### 5.注释
应该给一个模块文件写一个区块注释。
```
/**
* @module mazey/api
* @author Mazey <mazey@mazey.net>
* @description test.
* */
```
## 接口
##### 所有的接口都以 Promise 形式返回
注意非0都为错误走catch
```
const test = () => {
  return new Promise((resolve, reject) => {
    resolve({
      a: 1
    })
  })
}

// 调用
test().then(res => {
  console.log(res)
  // {a: 1}
})
```
正常返回
```
{
  code: 0,
  data: {},
  msg: '成功'
}
```
错误返回
```
{
  code: 10000,
  data: {},
  msg: '失败'
}
```
接口如果是post请求Content-Type默认为application/x-www-form-urlencoded如果Content-Type改成application/json
接口传参需要改成下面的方式
```
io.post('url', payload, null, null, { emulateJSON: false }).then(res => {
  resolve(res)
}).catch(e => {
  reject(e)
})
```
##### 相关接口路径
dag 相关接口 `src/js/conf/home/store/dag/actions.js`
数据源中心 相关接口 `src/js/conf/home/store/datasource/actions.js`
项目管理 相关接口 `src/js/conf/home/store/projects/actions.js`
资源中心 相关接口 `src/js/conf/home/store/resource/actions.js`
安全中心 相关接口 `src/js/conf/home/store/security/actions.js`
用户中心 相关接口 `src/js/conf/home/store/user/actions.js`
## 扩展开发
##### 1.增加节点
(1) 先将节点的icon小图标放置`src/js/conf/home/pages/dag/img`文件夹内,注意 `toolbar_${后台定义的节点的英文名称 例如:SHELL}.png`
(2) 找到 `src/js/conf/home/pages/dag/_source/config.js` 里的 `tasksType` 对象,往里增加
```
'DEPENDENT': {  // 后台定义节点类型英文名称用作key值
  desc: 'DEPENDENT',  // tooltip desc
  color: '#2FBFD8'  // 代表的颜色,主要用于 tree 和 gantt 两张图
}
```
(3) 在 `src/js/conf/home/pages/dag/_source/formModel/tasks` 增加一个 `${节点类型(小写)}`.vue 文件,跟当前节点相关的组件内容都在这里写。 属于节点组件内的必须拥有一个函数 `_verification()` 验证成功后将当前组件的相关数据往父组件抛。
```
/**
 * 验证
 */
_verification () {
  // datasource 子组件验证
  if (!this.$refs.refDs._verifDatasource()) {
    return false
  }
  // 验证函数
  if (!this.method) {
    this.$message.warning(`${i18n.$t('请输入方法')}`)
    return false
  }
  // localParams 子组件验证
  if (!this.$refs.refLocalParams._verifProp()) {
    return false
  }
  // 存储
  this.$emit('on-params', {
    type: this.type,
    datasource: this.datasource,
    method: this.method,
    localParams: this.localParams
  })
  return true
}
```
(4) 节点组件内部所用到公共的组件都在`_source`下,`commcon.js`用于配置公共数据
##### 2.增加状态类型
(1) 找到 `src/js/conf/home/pages/dag/_source/config.js` 里的 `tasksState` 对象,往里增加
```
'WAITTING_DEPEND': {  // 后端定义状态类型 前端用作key值
  id: 11,  // 前端定义id 后续用作排序
  desc: `${i18n.$t('等待依赖')}`,  // tooltip desc
  color: '#5101be',  // 代表的颜色,主要用于 tree 和 gantt 两张图
  icoUnicode: '&#xe68c;',  // 字体图标
  isSpin: false  // 是否旋转(需代码判断)
}
```
##### 3.增加操作栏工具
(1) 找到 `src/js/conf/home/pages/dag/_source/config.js` 里的 `toolOper` 对象,往里增加
```
{
  code: 'pointer',  // 工具标识
  icon: '&#xe781;',  // 工具图标
  disable: disable,  // 是否禁用
  desc: `${i18n.$t('拖动节点和选中项')}`  // tooltip desc
}
```
(2) 工具类都以一个构造函数返回 `src/js/conf/home/pages/dag/_source/plugIn`
`downChart.js` => dag 图片下载处理
`dragZoom.js` => 鼠标缩放效果处理
`jsPlumbHandle.js` => 拖拽线条处理
`util.js` => 属于 `plugIn` 工具类
操作则在 `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` 事件中处理。
##### 4.增加一个路由页面
(1) 首先在路由管理增加一个路由地址`src/js/conf/home/router/index.js`
```
{
  path: '/test',  // 路由地址
  name: 'test',  // 别名
  component: resolve => require(['../pages/test/index'], resolve),  // 路由对应组件入口文件
  meta: {
    title: `${i18n.$t('test')} - DolphinScheduler`  // title 显示
  }
},
```
(2) 在`src/js/conf/home/pages` 建一个 `test` 文件夹,在文件夹里建一个`index.vue`入口文件。
这样就可以直接访问 `http://localhost:8888/#/test`
##### 5.增加预置邮箱
找到`src/lib/localData/email.js`启动和定时邮箱地址输入可以自动下拉匹配。
```
export default ["test@analysys.com.cn","test1@analysys.com.cn","test3@analysys.com.cn"]
```
##### 6.权限管理及disabled状态处理
权限根据后端接口`getUserInfo`接口给出`userType: "ADMIN_USER/GENERAL_USER"`权限控制页面操作按钮是否`disabled`
具体操作:`src/js/module/permissions/index.js`
disabled处理`src/js/module/mixin/disabledState.js`

# 当你遇到问题时
## StackOverflow
如果在使用上有疑问建议你使用StackOverflow标签 [apache-dolphinscheduler](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler)这是一个DolphinScheduler用户问答的活跃论坛。
使用StackOverflow时的快速提示
- 在提交问题之前:
- 在StackOverflow的 [apache-dolphinscheduler](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler) 标签下进行搜索,看看你的问题是否已经被回答。
- 请遵守StackOverflow的[行为准则](https://stackoverflow.com/help/how-to-ask)
- 提出问题时请务必使用apache-dolphinscheduler标签。
- 请不要在 [StackOverflow](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler) 和 [GitHub issues](https://github.com/apache/dolphinscheduler/issues/new/choose)之间交叉发帖。
提问模板:
> **Describe the question**
>
> 对问题的内容进行清晰、简明的描述。
>
> **Which version of DolphinScheduler:**
>
> -[1.3.0-preview]
>
> **Additional context**
>
> 在此添加关于该问题的其他背景。
>
> **Requirement or improvement**
>
> 在此描述您的要求或改进建议。
如果你的问题较为宽泛,或有意见、建议,期望请求外部资源,或是有项目调试、bug 提交等相关问题,或者想要对项目做出贡献、对场景进行讨论,建议你提交 [GitHub issues](https://github.com/apache/dolphinscheduler/issues/new/choose) 或使用 dev@dolphinscheduler.apache.org 邮件列表进行讨论。
## 邮件列表
- [dev@dolphinscheduler.apache.org](https://lists.apache.org/list.html?dev@dolphinscheduler.apache.org) 是为那些想为DolphinScheduler贡献代码的人准备的。 [(订阅)](mailto:dev-subscribe@dolphinscheduler.apache.org?subject=(send%20this%20email%20to%20subscribe)) [(退订)](mailto:dev-unsubscribe@dolphinscheduler.apache.org?subject=(send%20this%20email%20to%20unsubscribe)) [(存档)](http://lists.apache.org/list.html?dev@dolphinscheduler.apache.org)
使用电子邮件时的一些快速提示:
- 在提出问题之前:
- 请在StackOverflow的 [apache-dolphinscheduler](https://stackoverflow.com/questions/tagged/apache-dolphinscheduler) 标签下进行搜索,看看你的问题是否已经被回答。
- 在你的邮件的主题栏里加上标签会帮助你得到更快的回应,例如:[ApiServer]如何获得开放的api接口
- 可以通过以下标签定义你的主题。
- 组件相关:MasterServer、ApiServer、WorkerServer、AlertServer 等等。
- 级别:Beginner、Intermediate、Advanced。
- 场景相关:Debug、How-to。
- 如果内容包括错误日志或长代码,请使用 [GitHub gist](https://gist.github.com/),并在邮件中只附加相关代码/日志的几行。
## Chat Rooms
聊天室是快速提问或讨论具体话题的好地方。
以下聊天室是 Apache DolphinScheduler 的正式组成部分:
Slack 工作区的网址:http://asf-dolphinscheduler.slack.com/
你可以通过该邀请链接加入:https://s.apache.org/dolphinscheduler-slack
此聊天室用于与DolphinScheduler使用相关的问题讨论。


@ -203,7 +203,7 @@ A 1在 **流程定义列表**,点击 **启动** 按钮
## Q:Python 任务设置 Python 版本
A 只需要修改 conf/env/dolphinscheduler_env.sh 中的 PYTHON_HOME
A:只需要修改 `bin/env/dolphinscheduler_env.sh` 中的 PYTHON_HOME
```
export PYTHON_HOME=/bin/python
@ -523,18 +523,6 @@ A1edit /etc/nginx/conf.d/escheduler.conf
---
## Q:欢迎订阅 DolphinScheduler 开发邮件列表
A:在使用 DolphinScheduler 的过程中,如果您有任何问题或者想法、建议,都可以通过 Apache 邮件列表参与到 DolphinScheduler 的社区建设中来。
发送订阅邮件也非常简单,步骤如下:
1、用自己的邮箱向 dev-subscribe@dolphinscheduler.apache.org 发送一封邮件,主题和内容任意。
2、接收确认邮件并回复。完成步骤 1 后,您将收到一封来自 dev-help@dolphinscheduler.apache.org 的确认邮件(如未收到,请确认邮件是否被自动归入垃圾邮件、推广邮件、订阅邮件等文件夹)。然后直接回复该邮件,或点击邮件里的链接快捷回复即可,主题和内容任意。
3、接收欢迎邮件。完成以上步骤后,您会收到一封主题为 WELCOME to dev@dolphinscheduler.apache.org 的欢迎邮件,至此您已成功订阅 Apache DolphinScheduler 的邮件列表。
---
## Q:工作流依赖
A:1,目前是按照自然天来判断。上月末:判断时间是工作流 A start_time/scheduler_time between '2019-05-31 00:00:00' and '2019-05-31 23:59:59'。上月:是判断上个月从 1 号到月末每天都要有完成的 A 实例。上周:上周 7 天都要有完成的 A 实例。前两天:判断昨天和前天,两天都要有完成的 A 实例。


@ -1,6 +1,10 @@
## 如何创建告警插件以及告警组
在2.0.0版本中,用户需要创建告警实例,然后同告警组进行关联,一个告警组可以使用多个告警实例,我们会逐一进行进行告警通知。
在2.0.0版本中,用户需要创建告警实例,在创建告警实例时,需要选择告警策略,有三个选项,成功发、失败发,以及成功和失败都发。在执行完工作流或任务时,如果触发告警,调用告警实例发送方法会进行逻辑判断,将告警实例与任务状态进行匹配,匹配则执行该告警实例发送逻辑,不匹配则过滤。创建完告警实例后,需要同告警组进行关联,一个告警组可以使用多个告警实例。
告警模块支持场景如下:
<img src="/img/alert/alert_scenarios_zh.png">
使用步骤如下:
首先进入安全中心,选择告警组管理,点击左侧的告警实例管理,创建一个告警实例,再选择对应的告警插件并填写相关告警参数。
@ -9,4 +13,4 @@
<img src="/img/alert/alert_step_1.png">
<img src="/img/alert/alert_step_2.png">
<img src="/img/alert/alert_step_3.png">
<img src="/img/alert/alert_step_4.png">


@ -1,13 +1,69 @@
# 企业微信
如果您需要使用到企业微信进行告警,请在告警实例管理里创建告警实例,选择 WeChat 插件。企业微信的配置样例如下
如果您需要使用到企业微信进行告警,请在告警实例管理里创建告警实例,选择 WeChat 插件。企业微信的配置样例如下
![enterprise-wechat-plugin](/img/alert/enterprise-wechat-plugin.png)
其中 send.type 分别对应企微文档:
## 发送类型
其中`send.type`分别对应向企业微信自定义应用发送和向企业微信API创建的群聊发送消息。
### 应用
应用:指将告警结果通过企业微信的自定义应用进行通知,支持向特定用户发送消息和对所有人发送消息。目前还不支持部门和标签,欢迎提 PR 贡献代码。
下图是应用告警配置的示例:
![enterprise-wechat-app-msg-config](/img/alert/wechat-app-form-example.png)
下图是`应用`的`MARKDOWN`告警消息的示例:
![enterprise-wechat-app-msg-markdown](/img/alert/enterprise-wechat-app-msg-md.png)
下图是`应用`的`TEXT`告警消息的示例:
![enterprise-wechat-app-msg-text](/img/alert/enterprise-wechat-app-msg.png)
#### 前置
向企业微信应用发送消息之前需要在企业微信中创建自定义应用,请在[应用页面](https://work.weixin.qq.com/wework_admin/frame#apps) 进行创建,获取应用的`AgentId`并将可见范围设为根。
#### 向指定用户发消息
企业微信应用支持向特定用户发送消息和对所有人发送消息,分别为使用`|`分隔多个userId和使用`@all`向所有人发送信息。
获取用户的userId请参考[官方文档](https://developer.work.weixin.qq.com/document/path/95402)根据手机号获取userId。
下图是获取userId接口的示例:
![enterprise-wechat-create-group](/img/alert/enterprise-wechat-query-userid.png)
#### 参考文档
应用:https://work.weixin.qq.com/api/doc/90000/90135/90236
群聊:https://work.weixin.qq.com/api/doc/90000/90135/90248
### 群聊
user.send.msg 对应文档中的 content与此相对应的值的变量为 {msg}
群聊:指将告警结果通过企业微信 API 创建的群聊进行通知,会向该群聊下的所有人发送消息,不支持向特定用户发送消息。
下图是群聊告警配置的示例:
![enterprise-wechat-group-msg-config](/img/alert/wechat-group-form-example.png)
下图是`群聊`的`MARKDOWN`告警消息的示例:
![enterprise-wechat-group-msg-markdown](/img/alert/enterprise-wechat-group-msg-md.png)
下图是`群聊`的`TEXT`告警消息的示例:
![enterprise-wechat-group-msg-text](/img/alert/enterprise-wechat-group-msg.png)
#### 前置
向企业微信群聊发送消息之前需要通过企业微信的API创建群聊请参考[官方文档](https://developer.work.weixin.qq.com/document/path/90245) 进行创建群聊并获取`chatid`。
其中获取用户的userId请参考[官方文档](https://developer.work.weixin.qq.com/document/path/95402)根据手机号获取userId。
下图是创建新聊天群组和获取userId接口的示例:
![enterprise-wechat-create-group](/img/alert/enterprise-wechat-create-group.png)
![enterprise-wechat-create-group](/img/alert/enterprise-wechat-query-userid.png)
#### 参考文档
群聊:https://work.weixin.qq.com/api/doc/90000/90135/90248


@ -0,0 +1,313 @@
# 概述
## 任务类型介绍
数据质量任务用于检查数据在集成、处理过程中的准确性。本版本的数据质量任务包括单表检查、单表自定义 SQL 检查、多表准确性以及两表值比对。数据质量任务的运行环境为 Spark 2.4.0,其他版本尚未进行过验证,用户可自行验证。
- 数据质量任务的执行逻辑如下:
> 用户在界面定义任务,用户输入值保存在`TaskParam`中
运行任务时,`Master` 会解析 `TaskParam`,封装 `DataQualityTask` 所需要的参数下发至 `Worker`。
`Worker` 运行数据质量任务,数据质量任务在运行结束之后将统计结果写入到指定的存储引擎中,当前数据质量任务结果存储在 `dolphinscheduler` 的 `t_ds_dq_execute_result` 表中。
`Worker` 发送任务结果给 `Master`,`Master` 收到 `TaskResponse` 之后会判断任务类型是否为 `DataQualityTask`,如果是的话会根据 `taskInstanceId` 从 `t_ds_dq_execute_result` 中读取相应的结果,然后根据用户配置好的检查方式、操作符和阈值进行结果判断。如果结果为失败的话,会根据用户配置好的失败策略进行相应的操作:告警或者中断。
## 注意事项
添加配置信息:`<server-name>/conf/common.properties`
```properties
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
```
这里的`data-quality.jar.name`请根据实际打包的名称来填写,
如果单独打包`data-quality`的话,记得修改包名和`data-quality.jar.name`一致。
如果是老版本升级使用,运行之前需要先执行`sql`更新脚本进行数据库初始化。
如果要用到`MySQL`数据,需要将`pom.xml`中`MySQL`的`scope`注释掉
当前只测试了`MySQL`、`PostgreSQL`和`HIVE`数据源,其他数据源暂时未测试过
`Spark`需要配置好读取`Hive`元数据,`Spark`不是采用`jdbc`的方式读取`Hive`
## 检查逻辑详解
- 校验公式:[校验方式][操作符][阈值],如果结果为真,则表明数据不符合期望,执行失败策略
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 操作符:=、>、>=、<、<=、!=
- 期望值类型
- 固定值
- 日均值
- 周均值
- 月均值
- 最近7天均值
- 最近30天均值
- 源表总行数
- 目标表总行数
- 例子
- 校验方式为:[Actual-Expected][实际值-期望值]
- [操作符]:>
- [阈值]:0
- 期望值类型:固定值=9。
假设实际值为10,操作符为 >,期望值为9,那么结果 10 - 9 > 0 为真,那就意味着列为空的行数据已经超过阈值,任务被判定为失败
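上面的判定流程可以用一小段脚本示意(数值沿用上文例子,按实际值减期望值计算;这只是便于理解的示意,并非 DolphinScheduler 的实际实现):

```shell
# 实际值 10、期望值 9、阈值 0,操作符为 >(与上文例子一致)
actual=10
expected=9
threshold=0
# 公式结果为真表示数据不符合期望,需执行失败策略(告警或阻断)
if [ $((actual - expected)) -gt "$threshold" ]; then
  echo "check failed"
else
  echo "check passed"
fi
```

运行上面的脚本会输出 `check failed`,即该任务被判定为失败。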
# 任务操作指南
## 单表检查之空值检查
### 检查介绍
空值检查的目标是检查出指定列为空的行数,可将为空的行数与总行数或者指定阈值进行比较,如果大于某个阈值则判定为失败
- 计算指定列为空的SQL语句如下
```sql
SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
```
- 计算表总行数的SQL如下
```sql
SELECT COUNT(*) AS total FROM ${src_table} WHERE (${src_filter})
```
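作为示意,下面的脚本演示模板参数替换后得到的统计 SQL(表名、列名与过滤条件均为假设值,仅用于展示 `${src_table}` 等占位符如何被填充):

```shell
# 假设的参数值,仅用于演示占位符替换
src_table="demo_src"
src_field="user_id"
src_filter="create_time >= '2022-01-01'"
echo "SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})"
```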
### 界面操作指南
![dataquality_null_check](/img/tasks/demo/null_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之及时性检查
### 检查介绍
及时性检查用于检查数据是否在预期时间内处理完成,可指定开始时间、结束时间来界定时间范围,如果在该时间范围内的数据量没有达到设定的阈值,那么会判断该检查任务为失败
### 界面操作指南
![dataquality_timeliness_check](/img/tasks/demo/timeliness_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 起始时间:某个时间范围的开始时间
- 结束时间:某个时间范围的结束时间
- 时间格式:设置对应的时间格式
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之字段长度校验
### 检查介绍
字段长度校验的目标是检查所选字段的长度是否满足预期,如果有存在不满足要求的数据,并且行数超过阈值则会判断任务为失败
### 界面操作指南
![dataquality_length_check](/img/tasks/demo/field_length_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 逻辑操作符:=、>、>=、<、<=、!=
- 字段长度限制:如标题
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之唯一性校验
### 检查介绍
唯一性校验的目标是检查字段是否存在重复的情况一般用于检验primary key是否有重复如果存在重复且达到阈值则会判断检查任务为失败
### 界面操作指南
![dataquality_uniqueness_check](/img/tasks/demo/uniqueness_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之正则表达式校验
### 检查介绍
正则表达式校验的目标是检查某字段的值的格式是否符合要求,例如时间格式、邮箱格式、身份证格式等等,如果存在不符合格式的数据并超过阈值,则会判断任务为失败
### 界面操作指南
![dataquality_regex_check](/img/tasks/demo/regexp_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 正则表达式:如标题
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之枚举值校验
### 检查介绍
枚举值校验的目标是检查某字段的值是否在枚举值的范围内,如果存在不在枚举值范围里的数据并超过阈值,则会判断任务为失败
### 界面操作指南
![dataquality_enum_check](/img/tasks/demo/enumeration_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源表过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 枚举值列表:用英文逗号,隔开
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之表行数校验
### 检查介绍
表行数校验的目标是检查表的行数是否达到预期的值,如果行数未达标,则会判断任务为失败
### 界面操作指南
![dataquality_count_check](/img/tasks/demo/table_count_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 源表检查列:下拉选择检查列名
- 校验方式:
- [Expected-Actual][期望值-实际值]
- [Actual-Expected][实际值-期望值]
- [Actual/Expected][实际值/期望值]x100%
- [(Expected-Actual)/Expected][(期望值-实际值)/期望值]x100%
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 单表检查之自定义SQL检查
### 检查介绍
自定义 SQL 检查允许用户针对单表自行编写统计 SQL 计算出实际值,并与期望值按所选操作符和阈值进行比较。
### 界面操作指南
![dataquality_custom_sql_check](/img/tasks/demo/custom_sql_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择要验证数据所在表
- 实际值名:为统计值计算 SQL 中的别名,如 max_num
- 实际值计算 SQL:用于输出实际值的 SQL
  - 注意点:该 SQL 必须为统计 SQL,例如统计行数、计算最大值、最小值等
  - select max(a) as max_num from ${src_table},表名必须这么填
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 校验方式:
- 校验操作符:=、>、>=、<、<=、!=
- 阈值:公式中用于比较的值
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型
## 多表检查之准确性检查
### 检查介绍
准确性检查通过比较两个表之间所选字段的数据记录来检查其准确性差异,例子如下:
- 表test1
| c1 | c2 |
| :---: | :---: |
| a | 1 |
| b | 2|
- 表test2
| c21 | c22 |
| :---: | :---: |
| a | 1 |
| b | 3|
如果对比 c1 和 c21 中的数据,则表 test1 和 test2 完全一致。如果对比 c2 和 c22,则表 test1 和表 test2 中的数据存在不一致。
### 界面操作指南
![dataquality_multi_table_accuracy_check](/img/tasks/demo/multi_table_accuracy_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:下拉选择要验证数据所在表
- 源过滤条件:如标题,统计表总行数的时候也会用到,选填
- 目标数据类型:选择 MySQL、PostgreSQL 等
- 目标数据源:目标数据类型下对应的数据源
- 目标数据表:下拉选择要验证数据所在表
- 目标过滤条件:如标题,统计表总行数的时候也会用到,选填
- 检查列:
  - 分别填写:源数据列、操作符、目标数据列
- 校验方式:选择想要的校验方式
- 操作符:=、>、>=、<、<=、!=
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
- 期望值类型:在下拉菜单中选择所要的类型,这里只适合选择 SrcTableTotalRow、TargetTableTotalRow 和固定值
## 两表检查之值比对
### 检查介绍
两表值比对允许用户对两张表自定义不同的 SQL,统计出相应的值进行比对。例如针对源表 A 统计出某一列的金额总值 sum1,针对目标表统计出某一列的金额总值 sum2,将 sum1 和 sum2 进行比较来判定检查结果
### 界面操作指南
![dataquality_multi_table_comparison_check](/img/tasks/demo/multi_table_comparison_check.png)
- 源数据类型:选择 MySQL、PostgreSQL 等
- 源数据源:源数据类型下对应的数据源
- 源数据表:要验证数据所在表
- 实际值名:为实际值计算 SQL 中的别名,如 max_age1
- 实际值计算 SQL:用于输出实际值的 SQL
  - 注意点:该 SQL 必须为统计 SQL,例如统计行数、计算最大值、最小值等
  - select max(age) as max_age1 from ${src_table},表名必须这么填
- 目标数据类型:选择 MySQL、PostgreSQL 等
- 目标数据源:目标数据类型下对应的数据源
- 目标数据表:要验证数据所在表
- 期望值名:为期望值计算 SQL 中的别名,如 max_age2
- 期望值计算 SQL:用于输出期望值的 SQL
  - 注意点:该 SQL 必须为统计 SQL,例如统计行数、计算最大值、最小值等
  - select max(age) as max_age2 from ${target_table},表名必须这么填
- 校验方式:选择想要的校验方式
- 操作符:=、>、>=、<、<=、!=
- 失败策略:
  - 告警:数据质量任务失败了,DolphinScheduler 任务结果为成功,发送告警
  - 阻断:数据质量任务失败了,DolphinScheduler 任务结果为失败,发送告警
## 任务结果查看
![dataquality_result](/img/tasks/demo/result.png)
## 规则查看
### 规则列表
![dataquality_rule_list](/img/tasks/demo/rule_list.png)
### 规则详情
![dataquality_rule_detail](/img/tasks/demo/rule_detail.png)


@ -29,9 +29,9 @@
mkdir -p /opt
cd /opt
# 解压缩
tar -zxvf apache-dolphinscheduler-1.3.8-bin.tar.gz -C /opt
tar -zxvf apache-dolphinscheduler-<version>-bin.tar.gz -C /opt
cd /opt
mv apache-dolphinscheduler-1.3.8-bin dolphinscheduler
mv apache-dolphinscheduler-<version>-bin dolphinscheduler
```
```markdown
@ -71,7 +71,7 @@ sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
datasource.properties 中的数据库连接信息.
zookeeper.properties 中的连接zk的信息.
common.properties 中关于资源存储的配置信息(如果设置了hadoop,请检查是否存在core-site.xml和hdfs-site.xml配置文件).
env/dolphinscheduler_env.sh 中的环境变量
dolphinscheduler_env.sh 中的环境变量
````
- 根据机器配置,修改 conf/env 目录下的 `dolphinscheduler_env.sh` 环境变量(以相关用到的软件都安装在/opt/soft下为例)


@ -0,0 +1,25 @@
# 通用配置
## 语言
DolphinScheduler 支持两种内置语言:`English` 和 `Chinese`。您可以点击顶部控制栏名为 `English` 或 `Chinese` 的按钮切换语言。
当您将语言从一种切换为另一种时,所有 DolphinScheduler 页面的语言都将随之变化。
## 主题
DolphinScheduler 支持两种内置主题:`Dark` 和 `Light`。当您想改变主题时,只需单击顶部控制栏在 [语言](#语言) 左侧名为 `Dark`(或 `Light`)
的按钮即可。
## 时区
DolphinScheduler 支持时区设置。
服务时区:
使用脚本 `bin/dolphinscheduler-daemon.sh` 启动服务,服务的默认时区为 UTC,可以在 `bin/env/dolphinscheduler_env.sh` 中进行修改,如 `export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-Asia/Shanghai}`。<br>
IDEA 启动服务默认时区为本地时区,可以加 JVM 参数如 `-Duser.timezone=UTC` 来修改时区。时区选择详见 [List of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)
用户时区:
用户的默认时区基于您运行 DolphinScheduler 服务的时区。如果你想要切换时区,可以点击 [语言](#语言) 按钮右侧的时区按钮,
然后点击 `请选择时区` 进行时区选择。当切换完成后,所有与时间相关的组件都将更改。


@ -6,7 +6,7 @@
## 部署步骤
集群部署(Cluster)使用的脚本和配置文件与[伪集群部署](pseudo-cluster.md)中的配置一样,所以所需要的步骤也与[伪集群部署](pseudo-cluster.md)大致一样。区别就是[伪集群部署](pseudo-cluster.md)针对的是一台机器,而集群部署(Cluster)需要针对多台机器,且两者“修改相关配置”步骤区别较大
集群部署(Cluster)使用的脚本和配置文件与[伪集群部署](pseudo-cluster.md)中的配置一样,所以所需要的步骤也与伪集群部署大致一样。区别就是伪集群部署针对的是一台机器,而集群部署(Cluster)需要针对多台机器,且两者“修改相关配置”步骤区别较大
### 前置准备工作 && 准备 DolphinScheduler 启动环境
@ -14,7 +14,7 @@
### 修改相关配置
这个是与[伪集群部署](pseudo-cluster.md)差异较大的一步,因为部署脚本会通过 `scp` 的方式将安装需要的资源传输到各个机器上,所以这一步我们仅需要修改运行`install.sh`脚本的所在机器的配置即可。配置文件在路径在`conf/config/install_config.conf`下,此处我们仅需修改**INSTALL MACHINE****DolphinScheduler ENV、Database、Registry Server**与[伪集群部署](pseudo-cluster.md)保持一致,下面对必须修改参数进行说明
这一步是与[伪集群部署](pseudo-cluster.md)差异较大的一步,因为部署脚本会通过 `scp` 的方式将安装所需的资源传输到各个机器上,所以我们仅需要修改运行 `install.sh` 脚本所在机器的配置即可。配置文件在路径 `conf/config/install_config.conf` 下,此处我们仅需修改 **INSTALL MACHINE**,**DolphinScheduler ENV、Database、Registry Server** 与伪集群部署保持一致,下面对必须修改的参数进行说明
```shell
# ---------------------------------------------------------


@ -13,16 +13,16 @@ Kubernetes部署目的是在Kubernetes集群中部署 DolphinScheduler 服务,
## 安装 dolphinscheduler
请下载源码包 apache-dolphinscheduler-1.3.8-src.tar.gz下载地址: [下载](/zh-cn/download/download.html)
请下载源码包 apache-dolphinscheduler-<version>-src.tar.gz下载地址: [下载](/zh-cn/download/download.html)
发布一个名为 `dolphinscheduler` 的版本(release),请执行以下命令:
```
$ tar -zxvf apache-dolphinscheduler-1.3.8-src.tar.gz
$ cd apache-dolphinscheduler-1.3.8-src/docker/kubernetes/dolphinscheduler
$ tar -zxvf apache-dolphinscheduler-<version>-src.tar.gz
$ cd apache-dolphinscheduler-<version>-src/docker/kubernetes/dolphinscheduler
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm dependency update .
$ helm install dolphinscheduler . --set image.tag=1.3.8
$ helm install dolphinscheduler . --set image.tag=<version>
```
将名为 `dolphinscheduler` 的版本(release) 发布到 `test` 的命名空间中:
@ -194,7 +194,7 @@ kubectl scale --replicas=6 sts dolphinscheduler-worker -n test # with test names
2. 创建一个新的 `Dockerfile`,用于添加 MySQL 的驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
@ -237,7 +237,7 @@ externalDatabase:
2. 创建一个新的 `Dockerfile`,用于添加 MySQL 驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
@ -266,7 +266,7 @@ docker build -t apache/dolphinscheduler:mysql-driver .
2. 创建一个新的 `Dockerfile`,用于添加 Oracle 驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```
@ -289,7 +289,7 @@ docker build -t apache/dolphinscheduler:oracle-driver .
1. 创建一个新的 `Dockerfile`,用于安装 pip:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY requirements.txt /tmp
RUN apt-get update && \
apt-get install -y --no-install-recommends python-pip && \
@ -322,7 +322,7 @@ docker build -t apache/dolphinscheduler:pip .
1. 创建一个新的 `Dockerfile`,用于安装 Python 3:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 && \
rm -rf /var/lib/apt/lists/*


@ -87,52 +87,58 @@ sh script/create-dolphinscheduler.sh
## 修改相关配置
完成了基础环境的准备后,在运行部署命令前,还需要根据环境修改配置文件。配置文件在路径在`conf/config/install_config.conf`下,一般部署只需要修改**INSTALL MACHINE、DolphinScheduler ENV、Database、Registry Server**部分即可完成部署,下面对必须修改参数进行说明
完成基础环境的准备后,需要根据你的机器环境修改配置文件。配置文件可以在目录 `bin/env` 中找到,分别命名为 `install_env.sh` 和 `dolphinscheduler_env.sh`
### 修改 `install_env.sh` 文件
文件 `install_env.sh` 描述了哪些机器将被安装 DolphinScheduler 以及每台机器对应安装哪些服务。您可以在路径 `bin/env/install_env.sh` 中找到此文件,配置详情如下。
```shell
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# 因为是在单节点上部署master、worker、API server所以服务器的IP均为机器IP或者localhost
# Due to the master, worker, and API server being deployed on a single node, the IP of the server is the machine IP or localhost
ips="localhost"
masters="localhost"
workers="localhost:default"
alertServer="localhost"
apiServers="localhost"
# DolphinScheduler安装路径,如果不存在会创建
# DolphinScheduler installation path, it will auto-create if not exists
installPath="~/dolphinscheduler"
# 部署用户,填写在 **配置用户免密及权限** 中创建的用户
# Deploy user, use the user you create in section **Configure machine SSH password-free login**
deployUser="dolphinscheduler"
```
# ---------------------------------------------------------
# DolphinScheduler ENV
# ---------------------------------------------------------
# JAVA_HOME 的路径,是在 **前置准备工作** 安装的JDK中 JAVA_HOME 所在的位置
javaHome="/your/java/home/here"
### 修改 `dolphinscheduler_env.sh` 文件
# ---------------------------------------------------------
# Database
# ---------------------------------------------------------
# 数据库的类型用户名密码IP端口元数据库db。其中dbtype目前支持 mysql 和 postgresql
dbtype="mysql"
dbhost="localhost:3306"
# 如果你不是以 dolphinscheduler/dolphinscheduler 作为用户名和密码的,需要进行修改
username="dolphinscheduler"
password="dolphinscheduler"
dbname="dolphinscheduler"
文件 `dolphinscheduler_env.sh` 描述了 DolphinScheduler 的数据库配置、部分任务类型所需的外部依赖路径或库文件,以及注册中心配置,`JAVA_HOME` 和
`SPARK_HOME` 都是在这里定义的,其路径为 `bin/env/dolphinscheduler_env.sh`。如果您不使用某些任务类型,可以忽略对应的任务外部依赖项,
但您必须根据您的环境更改 `JAVA_HOME`、注册中心和数据库相关配置。
# ---------------------------------------------------------
# Registry Server
# ---------------------------------------------------------
# 注册中心地址zookeeper服务的地址
registryServers="localhost:2181"
```sh
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/custom/path}
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.postgresql.Driver
export SPRING_DATASOURCE_URL="jdbc:postgresql://127.0.0.1:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME="username"
export SPRING_DATASOURCE_PASSWORD="password"
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
```
## 初始化数据库
DolphinScheduler 元数据存储在关系型数据库中,目前支持 PostgreSQL 和 MySQL如果使用 MySQL 则需要手动下载 [mysql-connector-java 驱动][mysql] (8.0.16) 并移动到 DolphinScheduler 的 lib目录下。下面以 MySQL 为例,说明如何初始化数据库
DolphinScheduler 元数据存储在关系型数据库中,目前支持 PostgreSQL 和 MySQL如果使用 MySQL 则需要手动下载 [mysql-connector-java 驱动][mysql] (8.0.16) 并移动到 DolphinScheduler 的 lib目录下`tools/libs/`)。下面以 MySQL 为例,说明如何初始化数据库
对于mysql 5.6 / 5.7
```shell
mysql -uroot -p
@ -146,10 +152,29 @@ mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost' IDENTI
mysql> flush privileges;
```
对于mysql 8
```shell
mysql -uroot -p
mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
# 修改 {user} 和 {password} 为你希望的用户名和密码
mysql> CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;
```
将 `tools/conf/application.yaml` 中的 username 和 password 改成你在上一步中设置的用户名 {user} 和密码 {password},
然后修改 `tools/bin/dolphinscheduler_env.sh`,将 mysql 设置为默认数据库类型:`export DATABASE=${DATABASE:-mysql}`。
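按照上面的说明,切换为 MySQL 后 `tools/bin/dolphinscheduler_env.sh` 中与数据库相关的环境变量大致如下(JDBC URL、用户名、密码均为假设值,请按实际环境填写):

```shell
# 将默认数据库类型切换为 mysql(上文所述的修改)
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
# 以下连接信息为假设值,需替换为你自己的配置
export SPRING_DATASOURCE_DRIVER_CLASS_NAME=com.mysql.cj.jdbc.Driver
export SPRING_DATASOURCE_URL="jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME="{user}"
export SPRING_DATASOURCE_PASSWORD="{password}"
```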
完成上述步骤后,您已经为 DolphinScheduler 创建一个新数据库,现在你可以通过快速的 Shell 脚本来初始化数据库
```shell
sh script/create-dolphinscheduler.sh
sh tools/bin/create-schema.sh
```
## 启动 DolphinScheduler
@ -157,7 +182,7 @@ sh script/create-dolphinscheduler.sh
使用上面创建的**部署用户**运行以下命令完成部署,部署后的运行日志将存放在 logs 文件夹内
```shell
sh install.sh
sh ./bin/install.sh
```
> **_注意:_** 第一次部署的话,可能出现 5 次 `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` 相关信息,此为非重要信息,直接忽略即可
@ -192,7 +217,13 @@ sh ./bin/dolphinscheduler-daemon.sh start alert-server
sh ./bin/dolphinscheduler-daemon.sh stop alert-server
```
> **_注意:_**:服务用途请具体参见《系统架构设计》小节
> **_注意1:_** 每个服务在路径 `<server-name>/conf/dolphinscheduler_env.sh` 中都有自己的 `dolphinscheduler_env.sh` 文件,这为按服务独立配置
> 提供了便利。您可以基于不同的环境变量来启动各个服务:只需在对应服务中配置 `<server-name>/conf/dolphinscheduler_env.sh`,然后通过 `<server-name>/bin/start.sh`
> 命令启动即可。但是如果您使用命令 `bin/dolphinscheduler-daemon.sh start <server-name>` 启动服务,它将会用文件 `bin/env/dolphinscheduler_env.sh`
> 覆盖 `<server-name>/conf/dolphinscheduler_env.sh` 然后再启动服务,目的是减少用户修改配置的成本。
> **_注意2:_**服务用途请具体参见《系统架构设计》小节。Python gateway service 默认与 api-server 一起启动,如果您不想启动 Python gateway service
> 请通过更改 api-server 配置文件 `api-server/conf/application.yaml` 中的 `python-gateway.enabled : false` 来禁用它。
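注意1 中描述的覆盖行为可以用下面的脚本示意(目录结构为临时构造,仅为便于理解的示意,并非部署脚本本身):

```shell
# 构造一个临时目录,模拟 bin/env 与 <server-name>/conf 下的两份 dolphinscheduler_env.sh
tmp=$(mktemp -d)
mkdir -p "$tmp/bin/env" "$tmp/master-server/conf"
echo 'export JAVA_HOME=/global/java' > "$tmp/bin/env/dolphinscheduler_env.sh"
echo 'export JAVA_HOME=/local/java'  > "$tmp/master-server/conf/dolphinscheduler_env.sh"
# dolphinscheduler-daemon.sh 启动时,全局配置会覆盖服务自身的配置
cp "$tmp/bin/env/dolphinscheduler_env.sh" "$tmp/master-server/conf/dolphinscheduler_env.sh"
cat "$tmp/master-server/conf/dolphinscheduler_env.sh"
rm -rf "$tmp"
```

脚本最终输出的是 `bin/env` 下的那份配置,说明服务目录中的同名文件被覆盖。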
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
[zookeeper]: https://zookeeper.apache.org/releases.html


@ -1,74 +0,0 @@
SkyWalking Agent 部署
=============================
dolphinscheduler-skywalking 模块为 DolphinScheduler 项目提供了 [SkyWalking](https://skywalking.apache.org/) 监控代理。
本文档介绍了如何通过此模块接入 SkyWalking 8.4+(推荐使用 8.5.0)。
# 安装
以下配置用于启用 Skywalking agent。
### 通过配置环境变量 (使用 Docker Compose 部署时)
修改 `docker/docker-swarm/config.env.sh` 文件中的 SKYWALKING 环境变量:
```
SKYWALKING_ENABLE=true
SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
SW_GRPC_LOG_SERVER_HOST=127.0.0.1
SW_GRPC_LOG_SERVER_PORT=11800
```
并且运行
```shell
$ docker-compose up -d
```
### 通过配置环境变量 (使用 Docker 部署时)
```shell
$ docker run -d --name dolphinscheduler \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-e SKYWALKING_ENABLE="true" \
-e SW_AGENT_COLLECTOR_BACKEND_SERVICES="your.skywalking-oap-server.com:11800" \
-e SW_GRPC_LOG_SERVER_HOST="your.skywalking-log-reporter.com" \
-e SW_GRPC_LOG_SERVER_PORT="11800" \
-p 12345:12345 \
apache/dolphinscheduler:1.3.8 all
```
### 通过配置 install_config.conf (使用 DolphinScheduler install.sh 部署时)
添加以下配置到 `${workDir}/conf/config/install_config.conf`.
```properties
# skywalking config
# note: enable skywalking tracking plugin
enableSkywalking="true"
# note: configure skywalking backend service address
skywalkingServers="your.skywalking-oap-server.com:11800"
# note: configure skywalking log reporter host
skywalkingLogReporterHost="your.skywalking-log-reporter.com"
# note: configure skywalking log reporter port
skywalkingLogReporterPort="11800"
```
# 使用
### 导入图表
#### 导入图表到 Skywalking server
复制 `${dolphinscheduler.home}/ext/skywalking-agent/dashboard/dolphinscheduler.yml` 文件到 `${skywalking-oap-server.home}/config/ui-initialized-templates/` 目录下,并重启 Skywalking oap-server。
#### 查看 dolphinscheduler 图表
如果之前已经使用浏览器打开过 SkyWalking,则需要清空浏览器缓存。
![img1](/img/skywalking/import-dashboard-1.jpg)


@ -4,19 +4,19 @@
- 服务管理主要是对系统中的各个服务的健康状况和基本信息的监控和显示
### Master 监控
### Master
- 主要是 master 的相关信息。
![master](/img/new_ui/dev/monitor/master.png)
### Worker 监控
### Worker
- 主要是 worker 的相关信息。
![worker](/img/new_ui/dev/monitor/worker.png)
### DB 监控
### Database
- 主要是 DB 的健康状况
@ -24,9 +24,17 @@
## 统计管理
### Statistics
![statistics](/img/new_ui/dev/monitor/statistics.png)
- 待执行命令数:统计 t_ds_command 表的数据
- 执行失败的命令数:统计 t_ds_error_command 表的数据
- 待运行任务数:统计 Zookeeper 中 task_queue 的数据
- 待杀死任务数:统计 Zookeeper 中 task_kill 的数据
### 审计日志
审计日志记录了谁访问了系统,以及其在给定时间段内执行了哪些操作,这些信息对于维护系统安全很有用。
![audit-log](/img/new_ui/dev/monitor/audit-log.jpg)


@ -0,0 +1,9 @@
# 任务定义
任务定义允许您基于任务级别(而不是在工作流中)修改任务。在此之前,我们已经有了工作流级别的任务编辑器,你可以在[工作流定义](workflow-definition.md)
单击特定的工作流,然后编辑任务的定义。当您想编辑特定的任务定义但不记得它属于哪个工作流时,这会令人沮丧。所以我们决定在 `任务` 菜单下添加 `任务定义` 视图。
![task-definition](/img/new_ui/dev/project/task-definition.jpg)
在该视图中,您可以通过单击 `操作` 列中的相关按钮来创建、查询、更新、删除任务定义。最令人兴奋的是,您可以通过通配符查询全部任务,当您只
记得任务名称但忘记它属于哪个工作流时非常有用。也支持将任务名称与 `任务类型` 或 `工作流程名称` 结合进行查询。


@ -30,7 +30,7 @@
#### 1、下载源码包
请下载源码包 apache-dolphinscheduler-1.3.8-src.tar.gz下载地址: [下载](/zh-cn/download/download.html)
请下载源码包 apache-dolphinscheduler-<version>-src.tar.gz下载地址: [下载](/zh-cn/download/download.html)
#### 2、拉取镜像并启动服务
@ -39,20 +39,20 @@
> 对于 Windows Docker Desktop 用户,打开 **Windows PowerShell**
```
$ tar -zxvf apache-dolphinscheduler-1.3.8-src.tar.gz
$ cd apache-dolphinscheduler-1.3.8-src/docker/docker-swarm
$ docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
$ docker tag apache/dolphinscheduler:1.3.8 apache/dolphinscheduler:latest
$ tar -zxvf apache-dolphinscheduler-<version>-src.tar.gz
$ cd apache-dolphinscheduler-<version>-src/deploy/docker
$ docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
$ docker tag apache/dolphinscheduler:<version> apache/dolphinscheduler:latest
$ docker-compose up -d
```
> PowerShell 应该使用 `cd apache-dolphinscheduler-1.3.8-src\docker\docker-swarm`
> PowerShell 应该使用 `cd apache-dolphinscheduler-<version>-src\deploy\docker`
**PostgreSQL** (用户 `root`, 密码 `root`, 数据库 `dolphinscheduler`) 和 **ZooKeeper** 服务将会默认启动
#### 3、登录系统
访问前端页面http://localhost:12345/dolphinscheduler如果有需要请修改成对应的 IP 地址
访问前端页面:[http://localhost:12345/dolphinscheduler](http://localhost:12345/dolphinscheduler) ,如果有需要请修改成对应的 IP 地址
默认的用户是`admin`,默认的密码是`dolphinscheduler123`
@ -79,7 +79,7 @@ $ docker-compose up -d
我们已将面向用户的 DolphinScheduler 镜像上传至 docker 仓库,用户无需在本地构建镜像,直接执行以下命令从 docker 仓库 pull 镜像:
```
docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
docker pull dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
```
#### 5、运行一个 DolphinScheduler 实例
@ -90,7 +90,7 @@ $ docker run -d --name dolphinscheduler \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-p 12345:12345 \
apache/dolphinscheduler:1.3.8 all
apache/dolphinscheduler:<version> all
```
注:数据库用户 test 和密码 test 需要替换为实际的 PostgreSQL 用户和密码192.168.x.x 需要替换为 PostgreSQL 和 ZooKeeper 的主机 IP
@ -119,7 +119,7 @@ $ docker run -d --name dolphinscheduler-master \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
apache/dolphinscheduler:1.3.8 master-server
apache/dolphinscheduler:<version> master-server
```
* 启动一个 **worker server**, 如下:
@ -129,7 +129,7 @@ $ docker run -d --name dolphinscheduler-worker \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
apache/dolphinscheduler:1.3.8 worker-server
apache/dolphinscheduler:<version> worker-server
```
* 启动一个 **api server**, 如下:
@ -140,7 +140,7 @@ $ docker run -d --name dolphinscheduler-api \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
-e ZOOKEEPER_QUORUM="192.168.x.x:2181" \
-p 12345:12345 \
apache/dolphinscheduler:1.3.8 api-server
apache/dolphinscheduler:<version> api-server
```
* 启动一个 **alert server**, 如下:
@ -149,7 +149,7 @@ apache/dolphinscheduler:1.3.8 api-server
$ docker run -d --name dolphinscheduler-alert \
-e DATABASE_HOST="192.168.x.x" -e DATABASE_PORT="5432" -e DATABASE_DATABASE="dolphinscheduler" \
-e DATABASE_USERNAME="test" -e DATABASE_PASSWORD="test" \
apache/dolphinscheduler:1.3.8 alert-server
apache/dolphinscheduler:<version> alert-server
```
**注意**: 当你运行 dolphinscheduler 中的部分服务时,你必须指定这些环境变量 `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_DATABASE`, `DATABASE_USERNAME`, `DATABASE_PASSWORD`, `ZOOKEEPER_QUORUM`
@ -313,14 +313,14 @@ C:\dolphinscheduler-src>.\docker\build\hooks\build.bat
#### 从二进制包构建 (不需要 Maven 3.3+ & JDK 1.8+)
请下载二进制包 apache-dolphinscheduler-1.3.8-bin.tar.gz下载地址: [下载](/zh-cn/download/download.html). 然后将 apache-dolphinscheduler-1.3.8-bin.tar.gz 放到 `apache-dolphinscheduler-1.3.8-src/docker/build` 目录里,在 Terminal 或 PowerShell 中执行:
请下载二进制包 apache-dolphinscheduler-<version>-bin.tar.gz下载地址: [下载](/zh-cn/download/download.html). 然后将 apache-dolphinscheduler-<version>-bin.tar.gz 放到 `apache-dolphinscheduler-<version>-src/docker/build` 目录里,在 Terminal 或 PowerShell 中执行:
```
$ cd apache-dolphinscheduler-1.3.8-src/docker/build
$ docker build --build-arg VERSION=1.3.8 -t apache/dolphinscheduler:1.3.8 .
$ cd apache-dolphinscheduler-<version>-src/docker/build
$ docker build --build-arg VERSION=<version> -t apache/dolphinscheduler:<version> .
```
> PowerShell 应该使用 `cd apache-dolphinscheduler-1.3.8-src/docker/build`
> PowerShell 应该使用 `cd apache-dolphinscheduler-<version>-src/docker/build`
#### 构建多平台架构镜像
@ -375,7 +375,7 @@ done
2. 创建一个新的 `Dockerfile`,用于添加 MySQL 的驱动包:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
@ -421,7 +421,7 @@ DATABASE_PARAMS=useUnicode=true&characterEncoding=UTF-8
2. Create a new `Dockerfile` to add the MySQL driver package:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib
```
@ -450,7 +450,7 @@ docker build -t apache/dolphinscheduler:mysql-driver .
2. Create a new `Dockerfile` to add the Oracle driver package:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib
```
@ -473,7 +473,7 @@ docker build -t apache/dolphinscheduler:oracle-driver .
1. Create a new `Dockerfile` to install pip:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
COPY requirements.txt /tmp
RUN apt-get update && \
apt-get install -y --no-install-recommends python-pip && \
@ -506,7 +506,7 @@ docker build -t apache/dolphinscheduler:pip .
1. Create a new `Dockerfile` to install Python 3:
```
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:1.3.8
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:<version>
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 && \
rm -rf /var/lib/apt/lists/*


@ -40,7 +40,7 @@ The DataX task type is used to execute DataX programs. For DataX nodes, the worker
### Configure the DataX Environment in DolphinScheduler
If you use the DataX task type in a production environment, you must first configure the required environment. The configuration file is: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
If you use the DataX task type in a production environment, you must first configure the required environment. The configuration file is: `bin/env/dolphinscheduler_env.sh`.
![datax_task01](/img/tasks/demo/datax_task01.png)
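As a sketch, the relevant entries in `bin/env/dolphinscheduler_env.sh` might look like the following. The installation paths are assumptions for illustration, not shipped defaults; point them at your real DataX and Python installations:

```shell
# Sketch of bin/env/dolphinscheduler_env.sh entries for DataX tasks.
# /usr/bin/python and /opt/soft/datax are assumed locations, not defaults.
export PYTHON_HOME=${PYTHON_HOME:-/usr/bin/python}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$DATAX_HOME/bin:$PATH
```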


@ -46,7 +46,7 @@ The Flink task type is used to execute Flink programs. For Flink nodes, the worker
#### Configure the Flink Environment in DolphinScheduler
If you use the Flink task type in a production environment, you must first configure the required environment. The configuration file is: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
If you use the Flink task type in a production environment, you must first configure the required environment. The configuration file is: `bin/env/dolphinscheduler_env.sh`.
![flink-configure](/img/tasks/demo/flink_task01.png)


@ -54,7 +54,7 @@ The MapReduce (MR) task type is used to execute MapReduce programs. For MapReduce
#### Configure the MapReduce Environment in DolphinScheduler
If you use the MapReduce task type in a production environment, you must first configure the required environment. The configuration file is: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
If you use the MapReduce task type in a production environment, you must first configure the required environment. The configuration file is: `bin/env/dolphinscheduler_env.sh`.
![mr_configure](/img/tasks/demo/mr_task01.png)


@ -46,7 +46,7 @@ The Spark task type is used to execute Spark programs. For Spark nodes, the worker
#### Configure the Spark Environment in DolphinScheduler
If you use the Spark task type in a production environment, you must first configure the required environment. The configuration file is: `/dolphinscheduler/conf/env/dolphinscheduler_env.sh`.
If you use the Spark task type in a production environment, you must first configure the required environment. The configuration file is: `bin/env/dolphinscheduler_env.sh`.
![spark_configure](/img/tasks/demo/spark_task01.png)
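As a sketch, the Spark-related entries in `bin/env/dolphinscheduler_env.sh` might look like the following. The install location is an assumption for illustration, not a shipped default; adjust it to your environment:

```shell
# Sketch of bin/env/dolphinscheduler_env.sh entries for Spark tasks.
# /opt/soft/spark is an assumed install location, not a default.
export SPARK_HOME=${SPARK_HOME:-/opt/soft/spark}
export PATH=$SPARK_HOME/bin:$PATH
```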


@ -13,24 +13,17 @@
- All of the following upgrade operations must be performed in the new version's directory
## 4. Database Upgrade
- Modify the following properties in conf/datasource.properties
- Change the username and password in `./tools/conf/application.yaml` to the database username and password you set
- If you choose MySQL, comment out the PostgreSQL-related configuration (and vice versa). You also need to manually add the [mysql-connector-java driver jar](https://downloads.MySQL.com/archives/c-j/) to the lib directory (mysql-connector-java-8.0.16.jar was used here), then configure the database connection information correctly
- If you choose MySQL, modify the following configuration in `./tools/bin/dolphinscheduler_env.sh`. You also need to manually add the [mysql-connector-java driver jar](https://downloads.MySQL.com/archives/c-j/) to the lib directory `./tools/lib` (mysql-connector-java-8.0.16.jar was used here)
```properties
# postgre
#spring.datasource.driver-class-name=org.postgresql.Driver
#spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
# mysql
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://xxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true (change the IP; for a local database, localhost is fine)
spring.datasource.username=xxx (change to the {user} value above)
spring.datasource.password=xxx (change to the {password} value above)
```

```shell
export DATABASE=${DATABASE:-mysql}
```
- Run the database upgrade script
`sh ./script/upgrade-dolphinscheduler.sh`
`sh ./tools/bin/upgrade-schema.sh`
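Putting the two steps together, a minimal upgrade run might look like the sketch below. The `DATABASE` variable and script path follow this doc; choosing `mysql` is only an example, and the actual script invocation is left commented out:

```shell
# Sketch: choose MySQL as the target database, then run the schema upgrade.
# Execute from the root of the new version's directory.
export DATABASE=${DATABASE:-mysql}
echo "upgrading schema for: $DATABASE"
# sh ./tools/bin/upgrade-schema.sh   # uncomment to actually run the upgrade
```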
## 5. Service Upgrade


@ -3,6 +3,10 @@
# Older Versions:
#### Below are the setup instructions for each stable release of Apache DolphinScheduler.
### Versions: 3.0.0-alpha
#### Links: [3.0.0-alpha documentation](../3.0.0/user_doc/about/introduction.md)
### Version: 2.0.5
#### Link: [2.0.5 documentation](../2.0.5/user_doc/guide/quick-start.md)
