
Docker Chaos Testing

Docker Tutorial | June 6, 2017

Okay, so you’ve successfully deployed your new or legacy application to a Docker container. You have also wired up its dependencies and are ready to deploy to an integration environment, so now what? Building your application through continuous integration is a great way to help a project succeed, but how does your application deal with failure? If chaos engineering is an unfamiliar concept, be sure to read up on the Principles of Chaos Engineering. The idea is to uncover weaknesses in a distributed system by experimenting on it in a controlled environment, so that the system behaves predictably when something fails.

 

There are a few tools available specifically for chaos testing, including Chaos Monkey, Pumba, and Gremlins. Because Pumba focuses on containers, this post uses it for chaos testing. In the project’s own words, “Pumba is a resilience testing tool, that helps applications tolerate random Docker container failures: process, network and performance.”

Getting Started

To follow along, clone the associated git repository and initialize the project containers. Make sure the application is functioning properly before executing any chaos tests.

Traffic Generation

We’ll also need a tool to generate a constant amount of traffic. For this purpose, although it may be overkill, we’ll use Vegeta. I’ve provided a Vegeta Docker image that supports two environment variables: DURATION and TARGET. DURATION is how long the traffic generation runs, expressed in seconds or minutes (e.g., 1s, 1m). TARGET is the URL of the web service under test. To begin generating traffic to the web service, run the image with both variables set.
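To make the two variables concrete, here is a minimal Python sketch, separate from the Vegeta image itself, of how a DURATION value such as 1s or 1m might be interpreted and how a run's success rate could be computed from the returned status codes. The names parse_duration and success_ratio are hypothetical helpers for illustration; this is not how Vegeta parses its input or reports results.

```python
import re

# Assumption: only the two units mentioned in the post (seconds, minutes).
_UNIT_SECONDS = {"s": 1, "m": 60}


def parse_duration(value: str) -> int:
    """Convert a value such as '1s' or '1m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([sm])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def success_ratio(status_codes) -> float:
    """Fraction of responses with a 2xx status; 0.0 for an empty run."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    ok = sum(1 for code in codes if 200 <= code < 300)
    return ok / len(codes)


if __name__ == "__main__":
    print(parse_duration("1m"))                 # 60
    print(success_ratio([200, 200, 500, 200]))  # 0.75
```

Comparing the success ratio of a run with and without chaos applied gives a simple number for how much a given failure hurts the service.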

Chaos Testing with Pumba

Pumba provides five commands that are used to simulate error conditions within a controlled environment.

kill

The kill command can be used to simulate containers being randomly terminated. The command allows users to provide a custom signal. If left unspecified, SIGKILL is sent to the container process.

netem

The netem command, one of the more advanced testing capabilities provided by Pumba, emulates misbehaving networks. It can rate limit, delay, reorder, duplicate, and drop packets. This matters because real network conditions vary widely, and a service that only works on a perfect network will not hold up in production.
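To get a feel for what those impairments mean for a client, the following Python sketch applies loss, duplication, and reordering to an in-memory list of messages. It is a toy model only: emulate_network and its parameters are hypothetical names, and nothing here reflects how netem shapes real traffic.

```python
import random


def emulate_network(packets, loss=0.0, duplicate=0.0, reorder=0.0, seed=None):
    """Return the packets as a receiver might see them on a bad network.

    loss, duplicate, and reorder are probabilities between 0.0 and 1.0.
    A seed makes a run repeatable, which is handy in tests.
    """
    rng = random.Random(seed)
    delivered = []
    for packet in packets:
        if rng.random() < loss:
            continue                      # packet never arrives
        delivered.append(packet)
        if rng.random() < duplicate:
            delivered.append(packet)      # packet arrives twice
    # Swap neighbouring packets to model out-of-order arrival.
    for i in range(len(delivered) - 1):
        if rng.random() < reorder:
            delivered[i], delivered[i + 1] = delivered[i + 1], delivered[i]
    return delivered


if __name__ == "__main__":
    messages = list(range(10))
    print(emulate_network(messages, loss=0.2, duplicate=0.1, reorder=0.1, seed=42))
```

Feeding the result into the code that consumes the messages is a cheap way to check that it tolerates gaps, repeats, and out-of-order delivery before running a real network test.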

pause

The pause command pauses running containers whose name matches the supplied expression. It works by periodically pausing and then resuming one or more containers for a user-specified duration. This command is especially helpful when you are not running within an orchestration framework, such as Kubernetes or Amazon’s Container Service, because a paused container resumes on its own rather than relying on something else to restart it.
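The periodic pause/resume behavior can be pictured as a schedule of time windows. The Python sketch below computes those windows for a test run; it assumes the first pause starts at time zero and that a new pause begins every interval seconds, which is an assumption made for illustration rather than a statement about Pumba's exact timing. The function and parameter names are hypothetical.

```python
def pause_windows(total, interval, duration):
    """List the (start, end) windows, in seconds, during which the target is paused.

    total:    length of the test run
    interval: time between the start of one pause and the next
    duration: how long each pause lasts (must not exceed interval)
    """
    if interval <= 0 or duration <= 0:
        raise ValueError("interval and duration must be positive")
    if duration > interval:
        raise ValueError("duration must not exceed interval")
    windows = []
    start = 0
    while start < total:
        windows.append((start, min(start + duration, total)))
        start += interval
    return windows


def paused_fraction(total, interval, duration):
    """Share of the run during which requests would hit a paused container."""
    paused = sum(end - start for start, end in pause_windows(total, interval, duration))
    return paused / total


if __name__ == "__main__":
    print(pause_windows(60, 20, 5))    # [(0, 5), (20, 25), (40, 45)]
    print(paused_fraction(60, 20, 5))  # 0.25
```

The paused fraction is a rough upper bound on the share of requests you should expect to be affected if the service has no way to cope with the paused dependency.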

stop

The stop command stops running containers whose name matches the supplied expression. First, Pumba attempts to gracefully terminate the process running within the target container(s). If the process does not stop in time, it is terminated with a SIGKILL signal.
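The terminate-then-kill sequence described above is a common pattern, and it can be sketched in Python with the standard subprocess module. This illustrates the pattern on a local process and is not Pumba's implementation; stop_process and grace_seconds are hypothetical names, and the signal behavior noted in the comments assumes a POSIX system.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask the process to exit, then force it if it ignores the request.

    Returns the process's exit code (negative signal number on POSIX
    when the process was ended by a signal).
    """
    proc.terminate()                      # SIGTERM on POSIX: can be handled or ignored
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX: cannot be handled or ignored
        return proc.wait()


if __name__ == "__main__":
    # A child that simply sleeps exits as soon as it receives SIGTERM.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print("exit code:", stop_process(child, grace_seconds=2.0))
```

A service that handles SIGTERM by finishing in-flight requests and closing connections will ride out a stop far better than one that waits to be killed, which is exactly the kind of difference this test is meant to expose.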

rm

The rm command removes running containers. It is especially destructive because, by default, it also removes any volumes attached to the target container(s).

For more information about Pumba and its capabilities, refer to the project’s source repository.

Putting it All Together

To run through a set of tests, you will need at least two separate terminal windows open. In one terminal, run the Chaos Test. Since all tests are being executed without an orchestration framework, we want to run Pumba’s pause command.

In another window, begin the traffic generator.

You should see output similar to the following once the test has completed.

As you can see, Pumba was able to interfere with the test results. You can also see that the service returns a 500 error when it is unable to connect to the Postgres database, which is less than ideal. Although this is a simplistic case, adding caching to the service could let it keep answering while the database is unreachable, resulting in a much higher success rate.

Let’s see how Pumba’s network emulation commands can vary test results. In one window, run an instance of the chaos-loss container.

Again, in another window, let’s run the traffic generator.

Again, you should see results similar to the following when the run completes.

Hopefully, you’ve gained an appreciation for Chaos Engineering and have a better understanding of how it can help to improve the resiliency of your production applications. I encourage you to learn more about the capabilities of both Vegeta and Pumba.

About Dan Garfield

Dan is a full-stack web developer and VP of Marketing at Codefresh. Dan is a *nix native and all-around technology enthusiast.
