Yarn vs NPM? Will Yarn Really Speed Up My CI Builds?

Continuous Integration | March 15, 2017

Before the release of Yarn, NPM was the go-to package manager for Node.js. When Facebook released its own package manager, called Yarn, in October 2016, it caught the attention of many developers. Yarn is meant to solve some of the problems with NPM, not to replace it completely. It provides a new CLI but uses the NPM registry under the hood to retrieve dependencies. Since its release, Yarn has received over 22,000 GitHub stars and has been named the second fastest-growing open source project on GitHub ever.

So should we all make the switch to Yarn? Let's take a look at some of the main pain points developers and teams face when using NPM and compare the available NPM and Yarn solutions. (Note: This post will not cover all the differences between these two package managers, but it will explain how to address some of the known issues with NPM and Yarn.)

Very Slow Install Times

If you’re a Node.js developer who has used NPM, you’ve probably run into issues with slow install times. You clone a new repository, run npm install to fetch the dependencies, and find yourself waiting, and waiting… Yarn solves this problem with an ultra-fast caching system and by parallelizing operations to maximize resource utilization. In contrast to NPM, Yarn downloads dependencies in parallel and makes sure that everything is cached.
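If you want to measure the difference on your own project, a minimal sketch looks like this (assuming both npm and yarn are already installed and you are in a project directory):

```shell
# Start from a clean slate so both tools install from scratch
rm -rf node_modules

# Time a cold install with NPM
time npm install

# Remove node_modules again and time the same install with Yarn
rm -rf node_modules
time yarn

# Run Yarn once more to see the effect of its cache on a warm install
rm -rf node_modules
time yarn
```

The second Yarn run is usually the most telling, since it can resolve everything from the local cache.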

Package Versions Are Not Distinct

One of the biggest problems with NPM is that triggering a single npm install command will not necessarily lead to a deterministic result. So if you tested something locally on your machine but your CI created the final artifact by running npm install, you could end up with different dependency versions running in production. This happens because a single npm install command is actually a recursive operation performed on all the dependencies of the original dependencies. So if a module requires a non-distinct version number, a future npm install can fetch a different version.
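For example, a dependency declared with a semver range rather than an exact version leaves room for drift (a hypothetical package.json fragment):

```json
{
  "dependencies": {
    "express": "^4.14.0"
  }
}
```

The caret range matches any 4.x.x release at or above 4.14.0, so an npm install run today and one run after a new 4.x release ships can produce different dependency trees.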

In order to solve this problem with NPM, Shrinkwrap was introduced. Developers can run npm shrinkwrap manually after running npm install, which creates an npm-shrinkwrap.json file containing the distinct versions of all dependencies, recursively. This leaves the responsibility of maintaining the npm-shrinkwrap.json file on the developer, not on NPM.
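The manual workflow looks roughly like this:

```shell
# Install dependencies, then freeze the exact resolved versions
npm install
npm shrinkwrap

# Commit the generated file so CI and teammates
# install the exact same dependency tree
git add npm-shrinkwrap.json
git commit -m "Pin dependency versions"
```

The catch is that you must remember to re-run npm shrinkwrap every time dependencies change, or the file silently goes stale.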

Yarn has this feature built in, so the responsibility of maintaining the distinct versions falls on Yarn itself. When a developer runs yarn (the equivalent of npm install), a yarn.lock file is created or updated with the exact versions of the dependencies. The yarn.lock file also contains each package's SHA-1 checksum, and with this Yarn makes sure to re-pull a package in case it has somehow been changed.
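No extra step is required; the lockfile is maintained on every install (flag per the Yarn 1.x CLI):

```shell
# Creates or updates yarn.lock automatically
yarn

# In CI, fail the build instead of silently updating the
# lockfile if yarn.lock is out of sync with package.json
yarn install --frozen-lockfile
```

Using --frozen-lockfile in CI is a good guard: it turns "the lockfile drifted" into a visible build failure rather than a surprise in production.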

Using Yarn or NPM as Part of Your Continuous Integration Flow

Whether you choose Yarn or NPM in your regular CI flow, every now and then a triggered build will have to re-install all your dependencies from scratch. Let’s take a look at the scenarios where this can occur and what you can do to solve it. (Most CI flows, whether using Docker as their facilitator or not, include one or more participating servers that are involved during the entire flow.)

Non-Docker CI Flow:

When using a single server, most CI flows start by cloning the repository each time a flow is triggered. As a consequence, dependencies are installed from scratch, so the new CI flow cannot benefit from previous installations. Both NPM and Yarn support caching dependencies to the local file system, so if configured correctly, this solves the problem as long as there's only a single server.
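One way to point both tools at a persistent cache directory on the build server (the path here is illustrative):

```shell
# Use a directory that survives between CI runs
export CACHE_DIR=/var/cache/ci

# NPM: set the cache location
npm config set cache "$CACHE_DIR/npm"

# Yarn 1.x: set the cache folder
yarn config set cache-folder "$CACHE_DIR/yarn"

# Subsequent installs on this server will hit the warm cache
yarn
```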

Once the number of CI flows increases, a single server will probably not be enough. When new CI servers are added, the CI flows running on one server will not be able to use the cache built on a different server. This becomes a bigger problem if the servers used for the CI flows are constantly being killed and created on demand.

Docker CI Flow:

One of Docker's most powerful strengths is its ability to reuse previously built layers during an image build. As long as a single Docker daemon is used for all CI flows and the installation of the dependencies is done as part of the image build, placing the install command correctly in the Dockerfile should do the trick. If your CI flow also includes testing, the dependencies will be needed for the tests. However, this might not always be possible using a built image, meaning it will require an additional installation of the dependencies.
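"Correctly" here usually means copying only the dependency manifests before running the install, so the install layer is invalidated only when the dependencies themselves change. A common pattern (this is a generic sketch, not Codefresh-specific):

```dockerfile
FROM node:6

WORKDIR /app

# Copy only the dependency manifests first, so this layer
# and the install below stay cached as long as they don't change
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copy the source last: code changes no longer
# invalidate the cached dependency layer above
COPY . .

CMD ["node", "index.js"]
```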

Using Docker is not bulletproof, though. Once the number of Docker daemons increases, the specific Docker daemon chosen for a CI flow will not always be able to use a different daemon's cached layers, which results in installing the dependencies from scratch. It's worth mentioning that even if an image was pulled from a registry to a Docker daemon, its layers will not be used during cache resolution. There are solutions for enabling multiple daemons to share cached layers, but they are not easy to implement.
Since the release of Docker 1.13, it's possible to pass a specific image to the build process as a cache source using the --cache-from flag. But even with this, new builds can't always access all the previously built layers.
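The flag is used like this (the image names are placeholders):

```shell
# Pull a previously built image from the registry so its
# layers are present on the local daemon
docker pull myregistry/myapp:latest

# Tell the build to consider that image's layers as cache
docker build --cache-from myregistry/myapp:latest \
  -t myregistry/myapp:new .
```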

Using Codefresh to Cache Dependencies Between CI Flows:

Codefresh provides a fully distributed solution for Docker-based CI flows. Codefresh manages everything related to resource consumption and allocation for its users and ensures that every CI flow runs as fast as possible. To support this, Codefresh maintains a dynamically sized pool of Docker daemons within the platform.

As a team of engineers, we at Codefresh have all encountered these issues, so we've built our platform to help solve them. One of the main capabilities Codefresh provides is 'shared volumes.' In Codefresh, a shared volume is created for every pipeline and is persisted so that it is usable in future flows, even if a future flow runs on a totally different Docker daemon. The shared volume contains the cloned repository associated with the context of the flow, and everything saved there is persisted. This opens up a whole new world of possibilities.

Comparing Performance using Codefresh

If you'd like to test the performance difference between NPM and Yarn, we prepared a codefresh.yaml file you can commit to your repository. Then, using Codefresh, you can easily run it and see the performance difference between the two package management tools. Check out our documentation page for more information.
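The file might look something like the following sketch. The step names, image tag, and commands are illustrative assumptions, not the exact file we ship; see the documentation for the canonical version:

```yaml
version: '1.0'
steps:
  install_with_npm:
    title: Timing npm install
    image: node:6
    commands:
      - rm -rf node_modules
      - time npm install
  install_with_yarn:
    title: Timing yarn install
    image: node:6
    commands:
      - rm -rf node_modules
      - time yarn
```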

About Itai Gendler

Itai is the R&D team Lead at Codefresh. He is an expert in CI, microservices, Mongo, Docker, and more.
