跳到主要内容
版本:v0.15

Domain Layer Schema

Summary

This document describes Apache DevLake's domain layer schema.

Referring to DevLake's architecture, the data in the domain layer is transformed from the data in the tool layer. The tool layer schema is based on the data from specific tools such as Jira, GitHub, Gitlab, Jenkins, etc. The domain layer schema can be regarded as an abstraction of tool-layer schemas.

DevLake Dataflow

Domain layer schema itself includes 2 logical layers: a DWD layer and a DWM layer. The DWD layer stores the detailed data points, while the DWM is the slight aggregation and operation of DWD to store more organized details or middle-level metrics.

Use Cases

  1. All metrics from pre-built dashboards are based on this data schema.
  2. As a user, you can create your own customized dashboards based on this data schema.
  3. As a contributor, you can refer to this data schema while working on the ETL logic when adding/updating data source plugins.

Data Models

This is the up-to-date domain layer schema for DevLake v0.10.x. Tables (entities) are categorized into 5 domains.

  1. Issue tracking domain entities: Jira issues, GitHub issues, GitLab issues, etc.
  2. Source code management domain entities: Git/GitHub/Gitlab commits and refs(tags and branches), etc.
  3. Code review domain entities: GitHub PRs, Gitlab MRs, etc.
  4. CI/CD domain entities: Jenkins jobs & builds, etc.
  5. Cross-domain entities: entities that map entities from different domains to break data isolation.

Schema Diagram

Domain Layer Schema

When reading the schema, you'll notice that many tables' primary key is called id. Unlike auto-increment id or UUID, id is a string composed of several parts to uniquely identify similar entities (e.g. repo) from different platforms (e.g. Github/Gitlab) and allow them to co-exist in a single table.

Tables that end with WIP are still under development.

Naming Conventions

  1. The name of a table is in plural form. Eg. boards, issues, etc.
  2. The name of a table which describe the relation between 2 entities is in the form of [BigEntity in singular form]_[SmallEntity in plural form]. Eg. board_issues, sprint_issues, pull_request_comments, etc.
  3. Value of the field in enum type are in capital letters. Eg. table.issues.type has 3 values, REQUIREMENT, BUG, INCIDENT. Values that are phrases, such as 'IN_PROGRESS' of table.issues.status, are separated with underscore '_'.

How to Customize Data Models

Apache DevLake provides 2 plugins:

  • customize: to create/delete columns in the domain layer schema with the data extracted from raw layer tables
  • dbt: to transform data based on the domain layer schema and generate new tables

DWD Entities - (Data Warehouse Detail)

Domain 1 - Issue Tracking

issues

An issue is the abstraction of Jira/Github/GitLab/TAPD/... issues.

fieldtypelengthdescriptionkey
idvarchar255An issue's id is composed of < plugin >:< Entity >:< PK0 >[:PK1]..."
  • For Github issues, a Github issue's id is like "github:GithubIssues:< GithubIssueId >". Eg. 'github:GithubIssues:1049355647'
  • For Jira issues, a Github repo's id is like "jira:JiraIssues:< JiraSourceId >:< JiraIssueId >". Eg. 'jira:JiraIssues:1:10063'. < JiraSourceId > is used to identify which jira source the issue came from, since DevLake users can import data from several different Jira instances at the same time.
PK
issue_keyvarchar255The key of this issue. For example, the key of this Github issue is 1145.
urlvarchar255The url of the issue. It's a web address in most cases.
titlevarchar255The title of an issue
descriptionlongtextThe detailed description/summary of an issue
typevarchar255The standard type of this issue. There're 3 standard types:
  • REQUIREMENT: this issue is a feature
  • BUG: this issue is a bug found during test
  • INCIDENT: this issue is a bug found after release
The 3 standard types are transformed from the original types of an issue. The transformation rule is set in the '.env' file or 'config-ui' before data collection. For issues with an original type that has not mapped to a standard type, the value of type will be the issue's original type.
original_typevarchar255The original type of an issue.
statusvarchar255The standard statuses of this issue. There're 3 standard statuses:
  • TODO: this issue is in backlog or to-do list
  • IN_PROGRESS: this issue is in progress
  • DONE: this issue is resolved or closed
The 3 standard statuses are transformed from the original statuses of an issue. The transformation rule:
  • For Jira issue status: transformed from the Jira issue's statusCategory. Jira issue has 3 default status categories: 'To Do', 'In Progress', 'Done'.
  • For Github issue status:
    • open -> TODO
    • closed -> DONE
original_statusvarchar255The original status of an issue.
story_pointintThe story point of this issue. It's default to an empty string for data sources such as Github issues and Gitlab issues.
priorityvarchar255The priority of the issue
componentvarchar255The component a bug-issue affects. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
severityvarchar255The severity level of a bug-issue. This field only supports Github plugin for now. The value is transformed from Github issue labels by the rules set according to the user's configuration of .env by end users during DevLake installation.
parent_issue_idvarchar255The id of its parent issue
epic_keyvarchar255The key of the epic this issue belongs to. For tools with no epic-type issues such as Github and Gitlab, this field is default to an empty string
original_estimate_minutesintThe original estimation of the time allocated for this issue
time_spent_minutesintThe original estimation of the time allocated for this issue
time_remaining_minutesintThe remaining time to resolve the issue
creator_idvarchar255The id of issue creator
creator_namevarchar255The name of the creator
assignee_idvarchar255The id of issue assignee.
  • For Github issues: this is the last assignee of an issue if the issue has multiple assignees
  • For Jira issues: this is the assignee of the issue at the time of collection
assignee_namevarchar255The name of the assignee
created_datedatetime3The time issue created
updated_datedatetime3The last time issue gets updated
resolution_datedatetime3The time the issue changes to 'DONE'.
lead_time_minutesintDescribes the cycle time from issue creation to issue resolution.
  • For issues whose type = 'REQUIREMENT' and status = 'DONE', lead_time_minutes = resolution_date - created_date. The unit is minute.
  • For issues whose type != 'REQUIREMENT' or status != 'DONE', lead_time_minutes is null
original_projectvarchar255The name of the original project. Transformed from a Jira project's name, a TAPD workspace's name, etc.

issue_labels

This table shows the labels of issues. Multiple entries can exist per issue. This table can be used to filter issues by label name.

fieldtypelengthdescriptionkey
label_namevarchar255Label name
issue_idvarchar255Issue IDFK_issues.id

issue_comments(WIP)

This table shows the comments of issues. Issues with multiple comments are shown as multiple records. This table can be used to calculate metric - issue response time.

fieldtypelengthdescriptionkey
idvarchar255The unique id of a commentPK
issue_idvarchar255Issue IDFK_issues.id
account_idvarchar255The id of the account who made the commentFK_accounts.id
bodylongtextThe body/detail of the comment
created_datedatetime3The creation date of the comment
updated_datedatetime3The last time comment gets updated

issue_changelogs

This table shows the changelogs of issues. Issues with multiple changelogs are shown as multiple records. This is transformed from Jira or TAPD changelogs.

fieldtypelengthdescriptionkey
idvarchar255The unique id of an issue changelogPK
issue_idvarchar255Issue IDFK_issues.id
author_idvarchar255The id of the user who made the changeFK_accounts.id
author_namevarchar255The id of the user who made the changeFK_accounts.id
field_idvarchar255The id of changed field
field_namevarchar255The id of changed field
original_from_valuevarchar255The original value of the changed field
original_to_valuevarchar255The new value of the changed field
from_valuevarchar255The transformed/standardized original value of the changed field
to_valuevarchar255The transformed/standardized new value of the changed field
created_datedatetime3The creation date of the changelog

issue_worklogs

This table shows the work logged under issues. Usually, an issue has multiple worklogs logged by different developers.

fieldtypelengthdescriptionkey
idvarchar255The id of the worklogPK
author_idvarchar255The id of the author who logged the workFK_accounts.id
commentlongtext255The comment made while logging the work.
time_spent_minutesintThe time logged. The unit of value is normalized to minute. Eg. 1d =) 480, 4h30m =) 270
logged_datedatetime3The time of this logging action
started_datedatetime3Start time of the worklog
issue_idvarchar255Issue IDFK_issues.id

boards

A board is an issue list or a collection of issues. It's the abstraction of a Jira board, a Jira project, a GitHub issue list or a GitLab issue list. This table can be used to filter issues by the boards they belong to.

fieldtypelengthdescriptionkey
idvarchar255A board's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
  • For a Github repo's issue list, the board id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >". Eg. "github:GithubRepo:384111310"
  • For a Jira Board, the id is like the board id is like "< jira >:< JiraSourceId >< JiraBoards >:< ConnectionId >:< JiraBoardsId >". Eg. "jira:1:JiraBoards:1:12"
PK
namevarchar255The name of the board. Note: the board name of a Github project 'apache/incubator-devlake' is 'apache/incubator-devlake', representing the default issue list.
descriptionvarchar255The description of the board.
urlvarchar255The url of the board. Eg. https://github.com/apache/incubator-devlake
created_datedatetime3Board creation time
typevarchar255Identify scrum and non-scrum board

board_issues

This table shows the relation between boards and issues. This table can be used to filter issues by board.

fieldtypelengthdescriptionkey
board_idvarchar255Board idFK_boards.id
issue_idvarchar255Issue idFK_issues.id

sprints

A sprint is the abstraction of Jira sprints, TAPD iterations and GitHub milestones. A sprint contains a list of issues.

fieldtypelengthdescriptionkey
idvarchar255A sprint's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
  • A sprint in a Github repo is a milestone, the sprint id is like "< github >:< GithubRepos >:< GithubRepoId >:< milestoneNumber >".
    Eg. The id for this sprint is "github:GithubRepo:384111310:5"
  • For a Jira Board, the id is like "< jira >:< JiraSourceId >< JiraBoards >:< JiraBoardsId >".
    Eg. "jira:1:JiraBoards:12"
PK
namevarchar255The name of sprint.
For Github projects, the sprint name is the milestone name. For instance, 'v0.10.0 - Introduce Temporal to DevLake' is the name of this sprint.
urlvarchar255The url of sprint.
statusvarchar255There're 3 statuses of a sprint:
  • CLOSED: a completed sprint
  • ACTIVE: a sprint started but not completed
  • FUTURE: a sprint that has not started
started_datedatetime3The start time of a sprint
ended_datedatetime3The planned/estimated end time of a sprint. It's usually set when planning a sprint.
completed_datedatetime3The actual time to complete a sprint.
original_board_iddatetime3The id of board where the sprint first created. This field is not null only when this entity is transformed from Jira sprints.
In Jira, sprint and board entities have 2 types of relation:
  • A sprint is created based on a specific board. In this case, board(1):(n)sprint. The original_board_id is used to show the relation.
  • A sprint can be mapped to multiple boards, a board can also show multiple sprints. In this case, board(n):(n)sprint. This relation is shown in table.board_sprints
FK_boards.id

sprint_issues

This table shows the relation between sprints and issues that have been added to sprints. This table can be used to show metrics such as 'ratio of unplanned issues', 'completion rate of sprint issues', etc

fieldtypelengthdescriptionkey
sprint_idvarchar255Sprint idFK_sprints.id
issue_idvarchar255Issue idFK_issues.id
is_removedboolIf the issue is removed from this sprint, then TRUE; else FALSE
added_datedatetime3The time this issue added to the sprint. If an issue is added to a sprint multiple times, the latest time will be the value.
removed_datedatetime3The time this issue gets removed from the sprint. If an issue is removed multiple times, the latest time will be the value.
added_stagevarchar255The stage when issue is added to this sprint. There're 3 possible values:
  • BEFORE_SPRINT
    Planning before sprint starts.
    Condition: sprint_issues.added_date <= sprints.start_date
  • DURING_SPRINT Planning during a sprint.
    Condition: sprints.start_date < sprint_issues.added_date <= sprints.end_date
  • AFTER_SPRINT
    Planing after a sprint. This is caused by improper operation - adding issues to a completed sprint.
    Condition: sprint_issues.added_date ) sprints.end_date
resolved_stagevarchar255The stage when an issue is resolved (issue status turns to 'DONE'). There're 3 possible values:
  • BEFORE_SPRINT
    Condition: issues.resolution_date <= sprints.start_date
  • DURING_SPRINT
    Condition: sprints.start_date < issues.resolution_date <= sprints.end_date
  • AFTER_SPRINT
    Condition: issues.resolution_date ) sprints.end_date

board_sprints

fieldtypelengthdescriptionkey
board_idvarchar255Board idFK_boards.id
sprint_idvarchar255Sprint idFK_sprints.id

Domain 2 - Source Code Management

repos

Information about GitHub or Gitlab repositories. A repository is always owned by a user.

fieldtypelengthdescriptionkey
idvarchar255A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github repo's id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >". Eg. 'github:GithubRepos:1:384111310'
PK
namevarchar255The name of repo.
descriptionvarchar255The description of repo.
urlvarchar255The url of repo. Eg. https://github.com/apache/incubator-devlake
owner_idvarchar255The id of the owner of repoFK_accounts.id
languagevarchar255The major language of repo. Eg. The language for apache/incubator-devlake is 'Go'
forked_fromvarchar255Empty unless the repo is a fork in which case it contains the id of the repo the repo is forked from.
deletedtinyint2550: repo is active 1: repo has been deleted
created_datedatetime3Repo creation date
updated_datedatetime3Last full update was done for this repo

repo_languages(WIP)

Languages that are used in the repository along with byte counts for all files in those languages. This is in line with how GitHub calculates language percentages in a repository. Multiple entries can exist per repo.

The table is filled in when the repo has been first inserted on when an update round for all repos is made.

fieldtypelengthdescriptionkey
idvarchar255A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github repo's id is like "< github >:< GithubRepos >< GithubRepoId >". Eg. 'github:GithubRepos:384111310'
PK
languagevarchar255The language of repo.
These are the languages for apache/incubator-devlake
bytesintThe byte counts for all files in those languages
created_datedatetime3The field is filled in with the latest timestamp the query for a specific repo_id was done.

repo_commits

The commits belong to the history of a repository. More than one repos can share the same commits if one is a fork of the other.

fieldtypelengthdescriptionkey
repo_idvarchar255Repo idFK_repos.id
commit_shachar40Commit shaFK_commits.sha

refs

A ref is the abstraction of a branch or tag.

fieldtypelengthdescriptionkey
idvarchar255A ref's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github ref is composed of "github:GithubRepos:< GithubRepoId >:< RefUrl >". Eg. The id of release v5.3.0 of PingCAP/TiDB project is 'github:GithubRepos:384111310:refs/tags/v5.3.0' A repo's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
PK
ref_namevarchar255The name of ref. Eg. 'refs/tags/v0.9.3'
repo_idvarchar255The id of repo this ref belongs toFK_repos.id
commit_shachar40The commit this ref points to at the time of collection
is_defaulttinyint1
  • 0: not the default branch
  • 1: the ref is the default branch. By the definition of Github, the default branch is the base branch for pull requests and code commits.
merge_basechar40The merge base commit of the main ref and the current ref
ref_typevarchar64There're 2 typical types:
  • BRANCH
  • TAG

commits_diffs

This table shows the commits added in a new commit compared to an old commit. This table can be used to support tag-based and deploy-based metrics.

The records of this table are computed by RefDiff plugin. The computation should be manually triggered after using GitRepoExtractor to collect commits and refs. The algorithm behind is similar to this.

fieldtypelengthdescriptionkey
new_commit_shachar40The commit new ref/deployment points to at the time of collectionPK
old_commit_shachar40The commit old ref/deployment points to at the time of collectionPK
commit_shachar40One of the added commits in the new ref compared to the old ref/deploymentPK
sorting_indexvarchar255An index for debugging, please skip it

finished_commits_diffs

This table shows the commits_diffs new_commit_sha and old_commit_sha pairs which are calculated successfully.

fieldtypelengthdescriptionkey
new_commit_shachar40The new commit new ref/deployment points to at the time of collectionPK
old_commit_shachar40The commit old ref/deployment points to at the time of collectionPK

ref_commits

fieldtypelengthdescriptionkey
new_ref_idvarchar255The new ref's id for comparisonPK
old_ref_idvarchar255The old ref's id for comparisonPK
new_commit_shachar40The commit new ref points to at the time of collection
old_commit_shachar40The commit old ref points to at the time of collection

commits

fieldtypelengthdescriptionkey
shachar40One of the added commits in the new ref compared to the old refFK_commits.sha
messagevarchar255Commit message
author_namevarchar255The value is set with command git config user.name xxxxx commit
author_emailvarchar255The value is set with command git config user.email xxxxx author
authored_datedatetime3The date when this commit was originally made
author_idvarchar255The id of commit authorFK_accounts.id
committer_namevarchar255The name of committer
committer_emailvarchar255The email of committer
committed_datedatetime3The last time the commit gets modified.
For example, when rebasing the branch where the commit is in on another branch, the committed_date changes.
committer_idvarchar255The id of committerFK_accounts.id
additionsintAdded lines of code
deletionsintDeleted lines of code
dev_eqintA metric that quantifies the amount of code contribution. The data can be retrieved from AE plugin.

commit_files

The files have been changed via commits.

fieldtypelengthdescriptionkey
idvarchar255The id is composed of "< Commit_sha >:< file_path >"FK_commits.sha
commit_shachar40Commit shaFK_commits.sha
file_pathvarchar255Path of a changed file in a commit
additionsintThe added lines of code in this file by the commit
deletionsintThe deleted lines of code in this file by the commit

components

The components of files extracted from the file paths. This can be used to analyze Git metrics by component.

fieldtypelengthdescriptionkey
repo_idvarchar255The repo idFK_repos.id
namevarchar255The name of component
path_regexvarchar255The regex to extract components from this repo's paths

commit_file_components

The relationship between commit_file and component_name.

fieldtypelengthdescriptionkey
commit_file_idvarchar255The id of commit fileFK_commit_files.id
component_namevarchar255The component name of a file

commit_parents

The parent commit(s) for each commit, as specified by Git.

fieldtypelengthdescriptionkey
commit_shachar40commit shaFK_commits.sha
parentchar40Parent commit shaFK_commits.sha

Domain 3 - Code Review

pull_requests

A pull request is the abstraction of GitHub pull request and Gitlab merge request.

fieldtypelengthdescriptionkey
idchar40A pull request's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..." Eg. For 'github:GithubPullRequests:1347'FK_commits.sha
titlevarchar255The title of pull request
descriptionlongtextThe body/description of pull request
statusvarchar255the status of pull requests. For a Github pull request, the status can either be 'open' or 'closed'.
parent_pr_idvarchar255The id of the parent PR
pull_request_keyvarchar255The key of PR. Eg, 1536 is the key of this PR
base_repo_idvarchar255The repo that will be updated.
head_reop_idvarchar255The repo containing the changes that will be added to the base. If the head repository is NULL, this means that the corresponding project had been deleted when DevLake processed the pull request.
base_refvarchar255The branch name in the base repo that will be updated
head_refvarchar255The branch name in the head repo that contains the changes that will be added to the base
author_namevarchar255The author's name of the pull request
author_idvarchar255The author's id of the pull request
urlvarchar255the web link of the pull request
typevarchar255The work-type of a pull request. For example: feature-development, bug-fix, docs, etc.
The value is transformed from Github pull request labels by configuring GITHUB_PR_TYPE in .env file during installation.
componentvarchar255The component this PR affects.
The value is transformed from Github/Gitlab pull request labels by configuring GITHUB_PR_COMPONENT in .env file during installation.
created_datedatetime3The time PR created.
merged_datedatetime3The time PR gets merged. Null when the PR is not merged.
closed_datedatetime3The time PR closed. Null when the PR is not closed.
merge_commit_shachar40the merge commit of this PR. By the definition of Github, when you click the default Merge pull request option on a pull request on Github, all commits from the feature branch are added to the base branch in a merge commit.
base_commit_shachar40The base commit of this PR.
head_commit_shachar40The head commit of this PR.

pull_request_labels

This table shows the labels of pull request. Multiple entries can exist per pull request. This table can be used to filter pull requests by label name.

fieldtypelengthdescriptionkey
namevarchar255Label name
pull_request_idvarchar255Pull request IDFK_pull_requests.id

pull_request_commits

A commit associated with a pull request

The list is additive. This means if a rebase with commit squashing takes place after the commits of a pull request have been processed, the old commits will not be deleted.

fieldtypelengthdescriptionkey
pull_request_idvarchar255Pull request idFK_pull_requests.id
commit_shachar40Commit shaFK_commits.sha

pull_request_comments

Normal comments, review bodies, reviews' inline comments of GitHub's pull requests or GitLab's merge requests.

fieldtypelengthdescriptionkey
idvarchar255Comment idPK
pull_request_idvarchar255Pull request idFK_pull_requests.id
bodylongtextThe body of the comments
account_idvarchar255The account who made the commentFK_accounts.id
created_datedatetime3Comment creation time
positionintDeprecated
typevarchar255- For normal comments: NORMAL
- For review comments, ie. diff/inline comments: DIFF
- For reviews' body (exist in GitHub but not GitLab): REVIEW
review_idvarchar255Review_id of the comment if the type is REVIEW or DIFF
statusvarchar255Status of the comment

pull_request_events(WIP)

Events of pull requests.

fieldtypelengthdescriptionkey
idvarchar255Event idPK
pull_request_idvarchar255Pull request idFK_pull_requests.id
actionvarchar255The action to be taken, some values:
  • opened: When the pull request has been opened
  • closed: When the pull request has been closed
  • merged: When Github detected that the pull request has been merged. No merges outside Github (i.e. Git based) are reported
  • reoponed: When a pull request is opened after being closed
  • syncrhonize: When new commits are added/removed to the head repository
actor_idvarchar255The account id of the event performerFK_accounts.id
created_datedatetime3Event creation time

Domain 4 - CI/CD(WIP)

cicd_scopes

Information about Jenkins Job, GitHub Action or Gitlab CI.

  • For GitHub: a GitHub repo is converted to a cicd_scope
  • For Jenkins: a GitLab project is converted to a cicd_scope
  • For GitLab: a Jenkins job is converted to a cicd_scope
fieldtypelengthdescriptionkey
idvarchar255A cicd_scope's id is composed of "< plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github cicd_scope's id is like "< github >:< GithubRepos >:< ConnectionId >:< GithubRepoId >". Eg. 'github:GithubRepos:1:384111310'
PK
namevarchar255The name of cicd_scope.
descriptionvarchar255The description of cicd_scope.
urlvarchar255The url of cicd_scope. Eg. https://github.com/apache/incubator-devlake or https://jenkins.xxx.cn/view/PROD/job/OPS_releasev2/
created_datedatetime3cicd_scope creation date
updated_datedatetime3Date of the last data collection for this cicd_scope

cicd_pipelines

A cicd_pipeline is a series of builds that have connections or a standalone build.

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
namevarchar255For gitlab, as there is no name for pipeline, so we use projectId, others have their own name
resultvarchar100The result of this task
statusvarchar100The status of this task
typevarchar100To indicate if this is a DEPLOYMENT
duration_secbigint unsignedhow long does this task take
started_datedatetime3when did this task start
finished_datedatetime3when did this task finish
environmentvarchar255To indicate the environment in which the task is running

cicd_pipeline_commits

fieldtypelengthdescriptionkey
pipeline_idvarchar255This key is generated based on details from the original pluginPK
commit_shavarchar255The commit that trigger this pipelinePK
branchvarchar255The branch that trigger this pipeline
repovarchar255
repo_idvarchar255The repo that this pipeline belongs to
repo_urllongtext

cicd_tasks

A cicd_task is a single job of ci/cd.

fieldtypelengthdescriptionkey
idvarchar255This key is generated based on details from the original pluginPK
namevarchar255
pipeline_idvarchar255The id of pipeline
resultvarchar100The result of this task
statusvarchar100The status of this task
typevarchar100To indicate if this is a DEPLOYMENT
duration_secbigint unsignedhow long does this task take
started_datedatetime3when did this task start
finished_datedatetime3when did this task finish
environmentvarchar255To indicate the environment in which the task is running

Project Metric Entities

project_pr_metrics

fieldtypelengthdescriptionkey
idvarchar255Id of PRPK
project_namevarchar100The project that this PR belongs toPK
first_review_idlongtextThe id of the first review on this pr
first_commit_shalongtextThe sha of the first commit
pr_coding_timebigintThe time it takes from the first commit until a PR is issued
pr_pickup_timebigintThe time it takes from when a PR is issued until the first comment is added to that PR
pr_review_timebigintThe time it takes to complete a code review of a PR before it gets merged
deployment_idlongtextThe id of cicd_task which deploy the commits of this PR
pr_deploy_timebigintThe time it takes from when a PR is merged to when it is deployed
pr_cycle_timebigintThe total time from the first commit to when the PR is deployed

project_issue_metrics

fieldtypelengthdescriptionkey
idvarchar255Id of IssuePK
project_namevarchar100The project that this Issue belongs toPK
deployment_idlongtextThe id of cicd_task which cause an incident

Cross-Domain Entities

These entities are used to map entities between different domains. They are the key players to break data isolation.

There're low-level entities such as issue_commits, users, and higher-level cross domain entities such as board_repos

issue_commits

A low-level mapping between "issue tracking" and "source code management" domain by mapping issues and commits. Issue(n): Commit(n).

The original connection between these two entities lies in either issue tracking tools like Jira or source code management tools like GitLab. You have to use tools to accomplish this.

For example, a common method to connect Jira issue and GitLab commit is a GitLab plugin Jira Integration. With this plugin, the Jira issue key in the commit message written by the committers will be parsed. Then, the plugin will add the commit urls under this jira issue. Hence, DevLake's Jira plugin can get the related commits (including repo, commit_id, url) of an issue.

fieldtypelengthdescriptionkey
issue_idvarchar255Issue idFK_issues.id
commit_shachar40Commit shaFK_commits.sha

pull_request_issues

This table shows the issues closed by pull requests. It's a medium-level mapping between "issue tracking" and "source code management" domain by mapping issues and commits. Issue(n): Commit(n).

The data is extracted from the body of pull requests conforming to certain regular expression. The regular expression can be defined in GITHUB_PR_BODY_CLOSE_PATTERN in the .env file

fieldtypelengthdescriptionkey
pull_request_idchar40Pull request idFK_pull_requests.id
issue_idvarchar255Issue idFK_issues.id
pull_request_numbervarchar255Pull request key
issue_numbervarchar255Issue key

board_repos (Deprecated)

A way to link "issue tracking" and "source code management" domain by mapping boards and repos. Board(n): Repo(n).

fieldtypelengthdescriptionkey
board_idvarchar255Board idFK_boards.id
repo_idvarchar255Repo idFK_repos.id

accounts

This table stores of user accounts across different tools such as GitHub, Jira, GitLab, etc. This table can be joined to get the metadata of all accounts. metrics, such as 'No. of Issue closed by contributor', 'No. of commits by contributor',

fieldtypelengthdescriptionkey
idvarchar255An account's id is the identifier of the account of a specific tool. It is composed of "< Plugin >:< Entity >:< PK0 >[:PK1]..."
For example, a Github account's id is composed of "< github >:< GithubAccounts >:< GithubUserId >)". Eg. 'github:GithubUsers:14050754'
PK
emailvarchar255Email of the account
full_namevarchar255Full name
user_namevarchar255Username, nickname or Github login of an account
avatar_urlvarchar255
organizationvarchar255User's organization(s)
created_datedatetime3User creation time
statusint0: default, the user is active. 1: the user is not active

users

fieldtypelengthdescriptionkey
idvarchar255id of a personPK
emailvarchar255the primary email of a person
namevarchar255name of a person

user_accounts

fieldtypelengthdescriptionkey
user_idvarchar255users.idComposite PK, FK
account_idvarchar255accounts.idComposite PK, FK

teams

fieldtypelengthdescriptionkey
idvarchar255id from the data sources, decided by DevLake usersPK
namevarchar255name of the team. Eg. team A, team B, etc.
aliasvarchar255alias or abbreviation of a team
parent_idvarchar255teams.id, default to nullFK
sorting_indexint255the field to sort team

team_users

fieldtypelengthdescriptionkey
team_idvarchar255Full name of the team. Eg. team A, team B, etc.Composite PK, FK
user_idvarchar255users.idComposite PK, FK

project

fieldtypelengthdescriptionkey
namevarchar255name for projectPK
descriptionlongtextdescription of the project
created_atdatetime3created time of project
updated_atdatetime3last updated time of project

project_metric_settings

fieldtypelengthdescriptionkey
project_namevarchar255name for projectPK
plugin_namevarchar255name for pluginPK
plugin_optionlongtextcheck if metric plugins have been enabled by the project
enabletinyint1if the metric plugins is enabled

project_mapping

fieldtypelengthdescriptionkey
project_namevarchar255name for projectPK
tablevarchar255the table name of ScopePK
row_idvarchar255the row_id in the Scope tablePK

DWM Entities - (Data Warehouse Middle)

DWM entities are the slight aggregation and operation of DWD to store more organized details or middle-level metrics.

refs_issues_diffs

This table shows the issues fixed by commits added in a new ref compared to an old one. The data is computed from table.commits_diffs, table.finished_commits_diffs, table.pull_requests, table.pull_request_commits, and table.pull_request_issues.

This table can support tag-based analysis, for instance, 'No. of bugs closed in a tag'.

fieldtypelengthdescriptionkey
new_ref_idvarchar255The new ref's id for comparisonFK_refs.id
old_ref_idvarchar255The old ref's id for comparisonFK_refs.id
new_ref_commit_shachar40The commit new ref points to at the time of collection
old_ref_commit_shachar40The commit old ref points to at the time of collection
issue_numbervarchar255Issue number
issue_idvarchar255Issue idFK_issues.id

Get Domain Layer Models in Developer Mode

When developing a new plugin, you need to refer to domain layer models, as all raw data should be transformed to domain layer data to provide standardized metrics across tools. Please use the following method to access the domain data models.

import "github.com/apache/incubator-devlake/models/domainlayer/domaininfo"

domaininfo := domaininfo.GetDomainTablesInfo()
for _, table := range domaininfo {
// do something
}

If you want to learn more about plugin models, please visit PluginImplementation