Using Codefresh with monorepos

Continuous Integration | July 12, 2018

Today Codefresh has released a completely redesigned Git trigger dialog with several new options for controlling the cases that trigger a pipeline.

New options include:

  • The ability to select specific pull request events. If you always wanted a pipeline that only runs when a pull request is opened, you can easily do this in the GUI now (available only for GitHub repositories)
  • The ability to trigger only if the target of a pull request follows a specific naming pattern. Common examples here would be master or production so that this pipeline is triggered only when a team member is trying to merge back to the respective branch (available for all Git providers apart from Atlassian Stash)
  • The ability to trigger a pipeline only if the changed files match a specific naming pattern (available for GitHub and GitLab repositories)

The last feature is especially interesting because this means that you can now control exactly which pipelines trigger according to which files are affected by each commit. This is a game-changing feature with several implications, the biggest one being the easy management of monorepos in Codefresh.

At the moment this feature is only available to Codefresh projects that use GitHub as a Git provider (as GitHub provides the required information in the webhook). We will add support for the other Git providers very soon.

Microservices, monorepos, multi-repos and atomic commits

Traditionally, applications developed using the monolith approach were also placed in a single Git repository. This approach was a natural fit given the big size of the project. A developer could simply check out the whole project at once, make any changes on any module and also deploy the whole application locally.
This approach was very convenient for both humans and systems as it was very simple to implement and presented a holistic view of the whole application. By checking out a single Git repository you had full access to all aspects of the application.

For really big projects, however, using a single Git repository was sometimes problematic given the size of the code. Checking out code could be really slow, teams that were working on different modules would need to keep track of one another's pull requests and builds, and in general, scalability of operations suffered. Still, because the application was always deployed as a single entity, using a single Git repository was the most obvious answer.

More recently, companies started moving their applications into the cloud, and the introduction of containers in the form of Docker has completely changed the way an application can be deployed. Instead of deploying everything as a single entity, individual components deployed on their own (microservices) allowed for easy upgrades and, most importantly, easy scaling of the application.

With the appearance of microservices, developers had to face the same question of code organization. Again, the most natural choice was to split the different applications into several Git repositories. A team could now own a specific repository where:

  • Commits,
  • Pull requests,
  • Deployments,
  • and auto-scaling decisions

would happen in an independent manner.

Having different Git repositories for each microservice works well for several companies. There are some cases, however, where having completely different multiple repositories is cumbersome for the following reasons:

  • It is hard to get a good overview of the system as a whole
  • It is difficult to know which version of each microservice depends on which versions of the others
  • It leads to excessive code duplication
  • It makes “atomic commits” (commits that change the API in multiple modules) a nightmare

This realization is especially true for companies that are now experimenting with serverless architectures. Having multiple functions reside in the same repository is much more straightforward even when in practice each function is deployed on its own.

For specific projects, it therefore makes sense to have a monorepo per service. Each module/function is still deployed in an individual manner (and thus all scalability benefits at runtime are still present), but since all of them exist in the same repository, it is very simple for a developer to check out the whole service at once.

This hybrid approach, where a single Git repository holds multiple modules/functions, is unofficially called ‘monorepo’.

Using monorepos with a traditional CI solution is very challenging. Excess builds happen all the time because multiple people are working on the same repository and every commit triggers every pipeline. Pull requests become stale because they were created against a revision that quickly becomes obsolete.

Limiting triggering of builds to specific folders

To truly gain the benefits of a monorepo, the CI system should be able to trigger pipelines only when changes happen in specific folders. This way the individual projects in a monorepo build only when their own files change.

Codefresh offers this capability today. In the trigger of each pipeline, you can define a glob expression that will map to the project files. Only when matched files change will the pipeline trigger.

Here are some examples of glob expressions:
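As a rough illustration of how such expressions behave, here is a minimal Python sketch of glob matching. It is not the matcher Codefresh uses; it assumes the common convention that `**` crosses folder boundaries while `*` and `?` stay within a single path segment. The folder and file names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex: '**' crosses folders, '*' and '?' do not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything inside one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # one character inside a path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """True when the whole path matches the glob."""
    return glob_to_regex(pattern).fullmatch(path) is not None


# Hypothetical patterns and changed files, for illustration only.
EXAMPLES = [
    ("services/billing/**", "services/billing/src/invoice.py"),
    ("services/billing/**", "services/search/src/index.py"),
    ("**/package.json", "frontend/package.json"),
    ("docs/*.md", "docs/guide/setup.md"),
]

for pattern, path in EXAMPLES:
    print(f"{pattern!r} vs {path!r}: {matches(pattern, path)}")
```

Under these assumptions, `services/billing/**` covers every file below that folder, `**/package.json` covers a `package.json` at any depth, and `docs/*.md` covers only Markdown files directly inside `docs`.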

This means that you can now define pipelines within Codefresh that only compile/run/deploy each individual microservice even though all of them exist in the same Git repository.
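To make the idea concrete, the sketch below models a monorepo where each pipeline is paired with one pattern and a commit triggers only the pipelines whose pattern matches a changed file. The pipeline names and folder layout are hypothetical. Matching uses Python's standard `fnmatch` module as a stand-in, whose `*` also matches `/`, so this shows the selection logic and not the exact glob semantics of Codefresh.

```python
from fnmatch import fnmatchcase

# Hypothetical monorepo layout: one pipeline per microservice folder.
PIPELINE_GLOBS = {
    "build-billing": "services/billing/*",
    "build-search": "services/search/*",
    "build-docs": "docs/*",
}


def pipelines_to_trigger(changed_files, pipeline_globs=PIPELINE_GLOBS):
    """Return the pipelines whose pattern matches at least one changed file."""
    return sorted(
        name
        for name, pattern in pipeline_globs.items()
        if any(fnmatchcase(path, pattern) for path in changed_files)
    )


# A commit that touches only the billing service.
commit = ["services/billing/src/invoice.py", "services/billing/README.md"]
print(pipelines_to_trigger(commit))  # ['build-billing']

# An "atomic commit" that changes an API in two services at once.
atomic = ["services/billing/api.py", "services/search/client.py"]
print(pipelines_to_trigger(atomic))  # ['build-billing', 'build-search']
```

A commit confined to one folder starts one pipeline, while a commit that spans several services starts exactly the pipelines it affects.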

This technique not only cuts down the number of builds happening during day-to-day development but also opens several other possibilities on the granularity of your Codefresh builds.

Other scenarios for limiting builds to specific changed files

Glob expressions also allow you to define files instead of just folders. This capability allows for running a build only when a specific file changes. Some examples would be:

  • Only trigger a build if a specific Dockerfile changes
  • Only trigger a build if a specific package.json/pom.xml/Gemfile changes

The first example is particularly interesting because it allows an organization to keep a big repo with “blessed” Dockerfiles and only build them when they actually change.
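As a small sketch of that scenario, the function below takes the list of files changed by a commit and returns only the folders whose Dockerfile was touched. It assumes a hypothetical layout with one folder per image, each holding a file named `Dockerfile`; the paths are made up for illustration.

```python
import posixpath


def images_to_rebuild(changed_files):
    """Return the folders whose Dockerfile appears among the changed files."""
    return sorted(
        posixpath.dirname(path)
        for path in changed_files
        if posixpath.basename(path) == "Dockerfile"
    )


# Hypothetical commit: one Dockerfile changed, plus two unrelated files.
changed = [
    "images/node-10/Dockerfile",
    "images/python-3/README.md",
    "README.md",
]
print(images_to_rebuild(changed))  # ['images/node-10']
```

Only the `node-10` image would be rebuilt here; the README changes do not count as a change to any image.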

How will you use the modified files field?

You can read the full documentation here: https://codefresh.io/docs/docs/configure-ci-cd-pipeline/triggers/git-triggers/#using-the-modified-files-field-to-constrain-triggers-to-specific-folderfiles

Kostis Kapelonis

About Kostis Kapelonis

Kostis is a software engineer/technical-writer dual class character. He lives and breathes automation, good testing practices and stress-free deployments.
