Making our database integration test suite 50% faster with a couple lines of code

Author: Dylan Kirkby

When testing Alloy, particularly when testing larger system behaviour or complex interactions between many components, keeping the system-under-test as close to production as possible increases our confidence that passing tests actually mean our system is working correctly. In this blog post, we discuss how we’ve sped up our integration test framework by 50% while keeping a close-to-production system underneath.


When building integration tests for our backend, a large part of keeping the tests close to production is testing against a real database, rather than mocking the results of queries. Many of the features at Alloy have complex queries at their heart that are essential for application correctness and non-trivial to validate by inspection. These are written in what is close to raw SQL using jOOQ, and in addition to correctness, they need to account for race conditions between different production instances that might be querying or modifying data simultaneously, as well as performance and memory considerations as they scale to millions of rows or more.

This means that running our queries on a real database in tests is one of the keys to our tests being valuable and allowing us to ship confidently. In order to do so, both locally and in CI, for each integration test we:

  • Create a dedicated, unique database in postgres for the test
  • Run all our migrations using flyway to ensure the database’s schema is up to date
  • Execute the test
  • Drop the database

The tests start initially with an empty database to avoid the performance penalty of loading and non-determinism of running against a pre-populated database dump. The databases are dropped after test runs to ensure that we don’t have data pollution between tests that could cause unpredictable test failures.

Performance Problems

Dropping, recreating, and migrating a database takes somewhere on the order of 10 seconds. This isn’t a crazy amount of time on its own, but done for every single test in our codebase, it adds up extremely quickly. In April 2020, our integration test suite already took 14 minutes to run single-threaded in CI, with only 78 integration test cases. This performance penalty was actively discouraging engineers from creating new tests that used a database, as each one would further slow down our test suite.

To work around this, we had already introduced a special @IntegrationTest decorator that allowed us to partition our tests and only run the tests that didn’t require a database – so in practice the database tests only ran in CI, rather than during engineers’ iteration cycles. However, whenever someone was working on (or broke) a database test, the feedback loop was extremely slow: they’d first find out after CI failed, then re-run the slow test locally to debug.

It was clear that this performance wouldn’t scale to the number of tests we’d need long-term to ensure our system’s correctness, and it was worsening our development experience and efficiency to unacceptable levels.

An Alternative Approach: Transactions

The key source of the slowness was the need to prevent data pollution between tests, which forced us to tear down and recreate the database each time. If we could skip the teardown without allowing data pollution, we could dramatically speed up our test suite.

Luckily, postgres (and any other ACID-compliant SQL database) already provides an excellent mechanism for preventing data changes: transactions. If we wrapped each test’s run in a transaction, and rolled it back at the end of the test, we’d be guaranteed that the state of the database wouldn’t have changed, and we could re-use the database for the following test.

In the end, a change of just a couple of lines (plus a few surrounding changes) was enough to reduce our integration test times by almost half, cutting them down to 8 minutes single-threaded in CI.

Caveats

In order for this change to work, we had to make some small modifications to some of our tests. In particular, some tests were written in such a way that they implicitly assumed what ID postgres would use for rows that were inserted during the test, which worked because it was always a fresh database that would start with 1, 2, 3, etc. Now that tests shared databases, these tests had to be rewritten to query for the created row and use its ID. This was a small portion of our tests and a relatively simple change to make.
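The rewrite looked roughly like this, again as a hedged Python/sqlite3 illustration (with postgres and jOOQ we’d typically capture the ID via `INSERT ... RETURNING id` instead of `lastrowid`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Simulate a shared database: earlier tests have already consumed some IDs.
conn.executemany("INSERT INTO users (name) VALUES (?)", [("old1",), ("old2",)])

# Brittle (pre-change): assumes a fresh database, so the first insert gets ID 1.
# row_id = 1

# Robust (post-change): capture the ID the database actually assigned.
cur = conn.execute("INSERT INTO users (name) VALUES ('alice')")
row_id = cur.lastrowid

name = conn.execute("SELECT name FROM users WHERE id = ?", (row_id,)).fetchone()[0]
assert name == "alice"
assert row_id != 1  # the hard-coded assumption would have failed here
```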

When we originally tried this, we used just a single database across all our test suites, to spend the minimal possible time migrating. However, this turned out to be a mistake: running all our tests’ queries against one shared database, many with hard-coded IDs or other unexpected query patterns, caused fairly frequent flaky test failures due to random postgres deadlocks, making our test suite a pain to work with (and causing us to revert the original change for a bit and live with the slow tests).

To fix this, our current version still uses a dedicated database per test suite, and as such migrates once per test suite as well, but this cost is much lower than migrating per test case. In the future, we may also explore using Template Databases to narrow the cost down to a single migration step.

There are also some test patterns that are incompatible with this approach, and would require us to re-introduce our old way of setting up the test database if we needed to do them:

  • Because postgres doesn’t support nested transactions, this approach doesn’t allow testing whether our code uses transactions correctly. Instead, starting a transaction in code is a no-op in tests.
  • Because this relies on a transaction, tests that involve multiple database connections won’t work, as each connection wouldn’t see the others’ changes.

Epilogue

Today, Alloy has roughly 10 times as many integration tests as we did when we shipped this change. Despite that, our test suite runs in the same amount of time it took when we started. Our software has become mission-critical for more and larger customers, and our emphasis on testing has grown significantly; our integration tests remain fast and convenient, allowing engineers to add them as needed to gain confidence that their features are working.
