Docker Chaos Testing

So you’ve successfully deployed your new or legacy application to a Docker container, wired up its dependencies, and are ready to deploy to an integration environment. Now what? Building your application through continuous integration is a great way to help a project succeed, but how does your application deal with failure? If chaos engineering is an unfamiliar concept, be sure to read up on the Principles of Chaos Engineering. The idea is to inject failures in a controlled environment so that weaknesses within a distributed system are uncovered and you can confirm that the system behaves predictably when they occur.

There are a few tools available specifically for chaos testing, including Chaos Monkey, Pumba, and Gremlin. Because Pumba focuses on containers, it is the tool this post uses. In the project’s own words, “Pumba is a resilience testing tool, that helps applications tolerate random Docker container failures: process, network and performance.”

Getting Started

To follow along, you can clone the associated git repository and initialize the project containers. Make sure the application is functioning properly before you run any chaos tests; otherwise you cannot tell injected failures apart from pre-existing ones.

$ git clone https://github.com/n3integration/dockerize-java.git
$ cd dockerize-java && git checkout chaos
$ ./gradlew clean build
$ docker-compose up -d db
$ ./gradlew flywayMigrate
$ docker-compose up -d www

Traffic Generation

We’ll also need a tool to generate a constant amount of traffic. For this purpose, although it may be overkill, we’ll use Vegeta. I’ve provided a Vegeta Docker image that supports two environment variables, DURATION and TARGET. DURATION is how long the traffic generation runs, expressed in seconds or minutes (e.g. 1s, 1m). TARGET is the URL of the web service under test. To begin generating traffic to the web service, you can run the following command.

$ docker-compose up -d generator
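
For readers curious how an image like this might consume DURATION and TARGET, here is a minimal sketch of an entrypoint script. It is an assumption about how such a wrapper could look, not the actual contents of the provided image: the default URL, the RATE variable, and the function names are all hypothetical, and the rate of 50 is only chosen to line up with the request rate shown in the reports later in this post. It uses Vegeta's `attack` and `report` subcommands.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch; the actual image may be implemented differently.
DURATION="${DURATION:-1m}"                  # e.g. 30s or 1m
TARGET="${TARGET:-http://localhost:8080/}"  # placeholder URL, not taken from the post
RATE="${RATE:-50}"                          # assumed requests per second

# Vegeta reads its targets as "METHOD URL" lines on stdin.
target_line() {
  echo "GET $1"
}

# Pipe the target into vegeta and print a text report when the attack ends.
attack() {
  target_line "$TARGET" | vegeta attack -duration="$DURATION" -rate="$RATE" | vegeta report
}

# Only send traffic when explicitly asked to (RUN=1), so the sketch is safe to source.
if [ "${RUN:-0}" = "1" ]; then
  attack
fi
```

Running it with `RUN=1 DURATION=30s TARGET=<your service URL> sh entrypoint.sh` would send traffic for 30 seconds, provided `vegeta` is on the PATH.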

Chaos Testing with Pumba

Pumba provides five commands that are used to simulate error conditions within a controlled environment.

kill

The kill command can be used to simulate containers being randomly terminated. The command allows users to provide a custom signal. If left unspecified, SIGKILL is sent to the container process.

netem

The netem command is one of the more advanced testing capabilities provided by Pumba and is used to emulate misbehaving networks. It can rate limit, delay, reorder, duplicate, and drop packets. This matters because real networks are variable, and a service that works on a clean local network may not cope with latency or packet loss.

pause

The pause command pauses running containers whose name matches the supplied expression. It works by periodically pausing and resuming one or more containers for a user-specified duration. This command is especially helpful when you are not running within an orchestration framework, such as Kubernetes or Amazon’s Container Service, because paused containers resume on their own rather than relying on an orchestrator to restart them.

stop

The stop command stops running containers whose name matches the supplied expression. First, Pumba will attempt to gracefully terminate the process running within the target container(s). If unable to stop the process, the process is terminated with a SIGKILL signal.

rm

The rm command removes running containers and attached volumes. This command is especially destructive since by default it will also remove any underlying volumes attached to the target container(s).

For more information about Pumba and its capabilities, refer to the project’s source repository.
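
To make the commands above more concrete, here is a hedged sketch of what a few invocations might look like. The flag names are written from my recollection of Pumba's README and can differ between versions, so check `pumba --help` before relying on them. The `re2:db` container pattern and the `run` helper are hypothetical. By default the script only prints each command; set RUN=1 to execute them against a real Docker daemon.

```shell
#!/bin/sh
# Dry-run sketch: prints Pumba invocations instead of executing them.
# Set RUN=1 to actually run them (requires pumba and a Docker daemon).
CONTAINERS="${CONTAINERS:-re2:db}"  # hypothetical RE2 pattern matching the database container

run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# kill: send SIGTERM instead of the default SIGKILL.
run pumba kill --signal SIGTERM "$CONTAINERS"

# netem: drop 20% of packets for one minute.
run pumba netem --duration 1m loss --percent 20 "$CONTAINERS"

# pause: every 30 seconds, pause the containers for 10 seconds.
run pumba --interval 30s pause --duration 10s "$CONTAINERS"

# stop: allow 10 seconds for a graceful shutdown before SIGKILL.
run pumba stop --time 10 "$CONTAINERS"

# rm is deliberately left out of this sketch because it also removes volumes.
```

Printing first and executing second is a deliberate choice here: it lets you review exactly which containers a pattern would hit before any of them are disturbed.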

Putting it All Together

To run through a set of tests, you will need at least two separate terminal windows open. In one terminal, run the Chaos Test. Since all tests are being executed without an orchestration framework, we want to run Pumba’s pause command.

$ docker-compose up chaos-pause

In another window, begin the traffic generator.

$ docker-compose up generator

You should see output similar to the following once the test has completed.

generator_1 | Requests     [total, rate]           3000, 50.02
generator_1 | Duration     [total, attack, wait]   1m25.386995253s, 59.979999887s, 25.406995366s
generator_1 | Latencies    [mean, 50, 95, 99, max] 2.990878328s, 1.025975081s, 10.16703107s, 14.752185309s, 30.01735129s
generator_1 | Bytes In     [total, mean]           103777, 34.59
generator_1 | Bytes Out    [total, mean]           0, 0.00
generator_1 | Success      [ratio]                 61.43%
generator_1 | Status Codes [code:count]            500:1146 0:11 200:1843
generator_1 | Error Set:
generator_1 | 500 Server Error

As you can see, Pumba was able to interfere with the test results: only 61.43% of the requests succeeded. You can also see that the service returns a 500 error when it is unable to connect to the Postgres database, which is less than ideal. Although this is a simplistic case, we could add caching to our service so that it can keep answering requests while the database is unreachable, and end up with a much higher success rate. Let’s see how Pumba’s network emulation commands can vary test results. In one window, run an instance of the chaos-loss container.

$ docker-compose up chaos-loss

Again, in another window, let’s run the traffic generator.

$ docker-compose up generator

Once the test has completed, you should see results similar to the following.

generator_1 | Requests     [total, rate]           3000, 50.02
generator_1 | Duration     [total, attack, wait]   1m1.017726406s, 59.979999917s, 1.037726489s
generator_1 | Latencies    [mean, 50, 95, 99, max] 557.375498ms, 19.780692ms, 3.008849215s, 6.938442708s, 26.792086014s
generator_1 | Bytes In     [total, mean]           114851, 38.28
generator_1 | Bytes Out    [total, mean]           0, 0.00
generator_1 | Success      [ratio]                 52.97%
generator_1 | Status Codes [code:count]            200:1589  500:1411
generator_1 | Error Set:
generator_1 | 500 Server Error
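
The success ratios in both reports follow directly from the status code counts. As a small worked example, the sketch below recomputes them with `awk`. The `success_ratio` function is a hypothetical helper written for this post, and it counts only 200 responses as successes, which is sufficient for the codes that appear in these two runs.

```shell
#!/bin/sh
# Recompute the success ratio from a Vegeta "Status Codes" field.
# $1 is a space-separated list of code:count pairs, e.g. "500:1146 0:11 200:1843".
success_ratio() {
  echo "$1" | tr ' ' '\n' | awk -F: '
    NF == 2 {
      total += $2                 # every response counts toward the total
      if ($1 == 200) ok += $2     # only 200 responses are treated as successes here
    }
    END { if (total > 0) printf "%.2f%%\n", 100 * ok / total }'
}

success_ratio "500:1146 0:11 200:1843"   # pause test:  1843 of 3000 requests
success_ratio "200:1589 500:1411"        # loss test:   1589 of 3000 requests
```

The two calls print 61.43% and 52.97%, matching the Success lines in the reports. The `0:11` entry in the first run counts toward the total as well; a code of 0 generally means no HTTP response was received at all.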

Hopefully, you’ve gained an appreciation for Chaos Engineering and have a better understanding of how it can help to improve the resiliency of your production applications. I encourage you to learn more about the capabilities of both Vegeta and Pumba.
