Blog | Apache DevLake - Open-Source Dev Data Platform for Productivity

Setting up Devlake in AWS ECS cluster using Terraform

April 3, 2025 · 10 min read

Devlake Enthusiast

This guide provides an alternative to Devlake's Kubernetes setup and will walk you through setting up DevLake in an AWS ECS (Elastic Container Service) cluster using Terraform, providing you with a scalable and maintainable infrastructure as code solution that is also cost effective.

Basic knowledge of AWS ECS, Terraform, and your network where your cluster will be deployed is necessary to follow this guide effectively.

Quick Start Guide: Setup Your First Engineering Metrics Dashboard in 5 Minutes

April 24, 2024 · 3 min read

Joshua Poddoku

Apache DevLake Committer

In the marathon of software development, teams can be overwhelmed by the complexity of tasks and the pressure to perform. The real challenge lies in understanding which efforts lead to success and which need refinement. That's the power of metrics. They transform endless streams of data into actionable insights. Apache DevLake provides the means to harness these insights, offering a metrics dashboard that aligns daily tasks with strategic goals. This guide is your first step towards a more informed and intentional approach to development. Let’s embark on this journey to make every commit, issue, and pull request an opportunity for improvement.

DevLake Playground: How to explore your data

March 15, 2024 · 3 min read

Jochum Börger

Apache DevLake Contributor

Lennart Tange

Apache DevLake Contributor

DevLake is a compelling offering. It collects and normalizes data from many of our favorite development tools and visualizes it using Grafana dashboards. Like the sleuths we are, we feel the urge to look beyond the dashboard overviews and find the golden nuggets buried deep within the data. So, we'd like to introduce the DevLake Playground, a place where you can unleash the power of Python on your data.

Compatibility of Apache DevLake with PostgreSQL

June 23, 2022 · 3 min read

ZhangLiang

Apache DevLake PPMC

Apache DevLake is a dev data platform that can collect and integrate data from different dev tools including Jira, Github, GitLab and Jenkins.

This blog will not aim at a comprehensive summary of the compatibility of database but a record of issues for future reference.

1.Different Data Types

PostgreSQL does not have a uint type

type JenkinsBuild struct {
 common.NoPKModel
 JobName           string  `gorm:"primaryKey;type:varchar(255)"`
 Duration          float64 // build time
 DisplayName       string  // "#7"
 EstimatedDuration float64
 Number            int64 `gorm:"primaryKey;type:INT(10) UNSIGNED NOT NULL"`
 Result            string
 Timestamp         int64     // start time
 StartTime         time.Time // convered by timestamp
 CommitSha         string
}

In JenkinsBuild.Number, thegormstruct tag used UNSIGNED, which will lead to the failure to create table and should be removed.

MySQL does not have a bool data type

For a field defined as bool type in model, gorm will map it to MySQL's TINYINT data type, which can be queried directly with 0 or 1 in SQL, but PostgreSQL has a bool type, so gorm will map it to the BOOL type. If 0 or 1 is still used in SQL to query, there will be a report of error.

Here is an example(only relevant fields are shown in the example). The lookup statement works in MySQL, but will lead to an error in PostgreSQL.

type GitlabMergeRequestNote struct {
 MergeRequestId  int    `gorm:"index"`
 System          bool 
}
 
db.Where("merge_request_id = ? AND `system` = 0", gitlabMr.GitlabId).

After changing the sentence as it follows, an error will still be reported. The reason will be shown in the part about backticks.

db.Where("merge_request_id = ? AND `system` = ?", gitlabMr.GitlabId, false)

2.Different Behaviors

Bulk insertion

When ON CONFLIT UPDATE ALL was used to achieve bulk insertion, and if there are multiple records with the same primary key, it will report errors in PostgreSQL but not in MySQL.

Inconsistent definition of model with schema

For example, in the model definition, GithubPullRequest.AuthorId is of the int type, but this field in the database is of VARCHAR type. When inserting data, MySQL will accept it, but ProstgresSQL will report an error.

type GithubPullRequest struct {
 GithubId        int    `gorm:"primaryKey"`
 RepoId          int    `gorm:"index"`
 Number          int    `gorm:"index"` 
 State           string `gorm:"type:varchar(255)"`
 Title           string `gorm:"type:varchar(255)"`
 GithubCreatedAt time.Time
 GithubUpdatedAt time.Time `gorm:"index"`
 ClosedAt        *time.Time
 // In order to get the following fields, we need to collect PRs individually from GitHub
 Additions      int
 Deletions      int
 Comments       int
 Commits        int
 ReviewComments int
 Merged         bool
 MergedAt       *time.Time
 Body           string
 Type           string `gorm:"type:varchar(255)"`
 Component      string `gorm:"type:varchar(255)"`
 MergeCommitSha string `gorm:"type:varchar(40)"`
 HeadRef        string `gorm:"type:varchar(255)"`
 BaseRef        string `gorm:"type:varchar(255)"`
 BaseCommitSha  string `gorm:"type:varchar(255)"`
 HeadCommitSha  string `gorm:"type:varchar(255)"`
 Url            string `gorm:"type:varchar(255)"`
 AuthorName     string `gorm:"type:varchar(100)"`
 AuthorId       int
 common.NoPKModel
}

3.MySQL-Specific Functions

We used the GROUP_CONCATfunction in a complex query. Although there are similar functions in PostgreSQL, the function names are different and the usage is slightly different.

cursor2, err := db.Table("pull_requests pr1").
  Joins("left join pull_requests pr2 on pr1.parent_pr_id = pr2.id").Group("pr1.parent_pr_id, pr2.created_date").Where("pr1.parent_pr_id != ''").
  Joins("left join repos on pr2.base_repo_id = repos.id").
  Order("pr2.created_date ASC").
  Select(`pr2.key as parent_pr_key, pr1.parent_pr_id as parent_pr_id, GROUP_CONCAT(pr1.base_ref order by pr1.base_ref ASC) as cherrypick_base_branches, 
   GROUP_CONCAT(pr1.key order by pr1.base_ref ASC) as cherrypick_pr_keys, repos.name as repo_name, 
   concat(repos.url, '/pull/', pr2.key) as parent_pr_url`).Rows()

Solution: We finally decided to use two steps to achieve the GROUP_CONCAT function. First we used the simplest SQL query to get multiple pieces of the sorted data, and then used the code to group them.

After modification:

    cursor2, err := db.Raw(
  `
   SELECT pr2.pull_request_key                 AS parent_pr_key,
          pr1.parent_pr_id                     AS parent_pr_id,
          pr1.base_ref                         AS cherrypick_base_branch,
          pr1.pull_request_key                 AS cherrypick_pr_key,
          repos.NAME                           AS repo_name,
          Concat(repos.url, '/pull/', pr2.pull_request_key) AS parent_pr_url,
        pr2.created_date
   FROM   pull_requests pr1
          LEFT JOIN pull_requests pr2
                 ON pr1.parent_pr_id = pr2.id
          LEFT JOIN repos
                 ON pr2.base_repo_id = repos.id
   WHERE  pr1.parent_pr_id != ''
   ORDER  BY pr1.parent_pr_id,
             pr2.created_date,
       pr1.base_ref ASC
   `).Rows()

4.Different Grammar

Backticks

We used backticks in some SQL statements to protect field names from conflicting with MySQL reserved words, which can lead to errors in PostgreSQL. To solve this problem we revisited our code, modified all field names that conflict with reserved words, and removed the backticks in the SQL statement. In the example just mentioned:

db.Where("merge_request_id = ? AND `system` = ?", gitlabMr.GitlabId, false)

Solution: We changed system to is_system to avoid the usage of backticks.

db.Where("merge_request_id = ? AND is_system = ?", gitlabMr.GitlabId, false)

Non-standard delete statement

There were delete statements as followed in our code, which are legal in MySQL but will report an error in PostgreSQL.

err := db.Exec(`
 DELETE ic
 FROM jira_issue_commits ic
 LEFT JOIN jira_board_issues bi ON (bi.source_id = ic.source_id AND bi.issue_id = ic.issue_id)
 WHERE ic.source_id = ? AND bi.board_id = ?
 `, sourceId, boardId).Error

How DevLake is Up and Running

June 17, 2022 · 4 min read

Warren Chen

Apache DevLake PPMC

Apache DevLake is an integration tool with the DevOps data collection functionality, which presents a different stage of data to development teams via Grafana. which also can leverage teams to improve the development process with a data-driven model.

Apache DevLack Architecture Overview

The left side of the following screenshot is an integrative DevOps data plugin, the existing plugins include Github, GitLab, JIRA, Jenkins, Tapd, Feishu, and the most featured analysis engine in the Simayi platform.
The main framework in the middle of the following screenshot, completes data collection, expansion, and conversion to the domain layer by running subtasks in the plugins. The user can trigger the tasks by config-UI or all API.
RMDBS currently supports Mysql and PostgresSQL, more databases will be supported in the future.
Grafana can generate different types of needed data by using SQL.

Generated

Then let’s move on to how to start running DevLake.

Apache Incubator Welcomes DevLake, A Dev-Data Platform Serving Developers

May 18, 2022 · 2 min read

Maxim Wheatley

Apache DevLake PPMC

We are excited to share today that the Apache Software Foundation (ASF) voted to make DevLake an officially supported project of the Apache Incubator.

What is DevLake?

Launched in December of 2021, Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes the fragmented data in developer tools.

Software development is complex, requiring many tools and processes, and as a result creates a storm of data scattered across tools in many formats. This makes it difficult to organize, query, and make sense of. We built Apache DevLake, to make it easy to make sense of this rich data and to translate it into actionable insights.

1.Different Data Types​

PostgreSQL does not have a uint type​

MySQL does not have a bool data type​

2.Different Behaviors​

Bulk insertion​

Inconsistent definition of model with schema​

3.MySQL-Specific Functions​

4.Different Grammar​

Backticks​

Non-standard delete statement​

Apache DevLack Architecture Overview​

What is DevLake?​

1.Different Data Types

PostgreSQL does not have a uint type

MySQL does not have a bool data type

2.Different Behaviors

Bulk insertion

Inconsistent definition of model with schema

3.MySQL-Specific Functions

4.Different Grammar

Backticks

Non-standard delete statement

Apache DevLack Architecture Overview

What is DevLake?