Is your test code too brittle? It might be because of overdoing DRY.

2024-05-14 10:41:19

There are many motivations for writing automated test cases, and the benefits they bring should not be underestimated. Through them, we gain more confidence in the correctness of our code, allowing us to refactor with greater assurance while also enjoying more timely feedback. As a staunch advocate of Test-Driven Development (TDD), I firmly believe that TDD not only embodies the aforementioned advantages but also provides a shorter feedback loop and broader test coverage.

Among the many important design principles of software development, there is one called DRY: Don’t Repeat Yourself. Its core purpose is avoiding repetition. When we apply the DRY principle to our test code, however, it can make our tests more brittle: harder to understand, maintain, and modify. The rising cost of maintenance often makes us reconsider the value of writing tests at all. So, under what circumstances do our tests suffer from an overzealous pursuit of DRY? And how can we avoid these problems while still benefiting from writing tests? In this article, I intend to delve deeper into this topic, explore the signs that a test suite is becoming brittle, and discuss how to minimize repetition in tests, as well as better ways to implement DRY tests.

The Essence of DRY

DRY, short for Don’t Repeat Yourself, was originally introduced by Andy Hunt and Dave Thomas in their book “The Pragmatic Programmer”. Its definition states that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” The advantage of DRY code is that if a concept in the application changes, we only need to modify one place. This not only makes the codebase easier to browse and maintain but also reduces the risk of errors. When a domain concept is expressed in a single way in an application, the design exhibits beauty and clarity.

However, implementing the DRY principle is not easy. Chasing code duplication can lead us to create unnecessary layers of abstraction, making the design more complex rather than clearer. The DRY principle aims to reduce conceptual repetition in code, not merely to avoid typing the same characters twice. This framing is quite useful for applying the principle well and avoiding many common pitfalls. For example, we often encounter literals in code: is the number 60 that appears in different places truly an instance of repetition, or does each occurrence express its own unique meaning in its own context? A useful question to ask yourself is: “If I need to change this value, do I want all the 60s to change along with it?” In one place, 60 might represent the number of seconds in a minute; in another, it might signify a speed limit.
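To make this concrete, here is a minimal Kotlin sketch (the names SECONDS_PER_MINUTE and SPEED_LIMIT_MPH are illustrative, not from the original text): the two constants happen to share the value 60, but they represent different pieces of knowledge, so collapsing them into one shared constant would couple unrelated concepts.

```kotlin
// Two distinct pieces of knowledge that happen to share the value 60.
// Changing the speed limit must not change how time is converted.
const val SECONDS_PER_MINUTE = 60 // a fixed unit conversion
const val SPEED_LIMIT_MPH = 60    // a policy value that may change

fun minutesToSeconds(minutes: Int): Int = minutes * SECONDS_PER_MINUTE

fun isSpeeding(speedMph: Int): Boolean = speedMph > SPEED_LIMIT_MPH

fun main() {
    println(minutesToSeconds(2)) // 120
    println(isSpeeding(70))      // true
}
```

If the speed limit later drops to 55, only SPEED_LIMIT_MPH changes; a single shared “60” would have forced an incorrect edit to the time conversion as well.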

Simply defining an integer as a global shared variable to follow the DRY principle is not a wise choice. Consider this scenario: you have a method that iterates through a collection and performs certain operations. On the surface, it resembles another method that iterates through the same collection but performs different operations. You might wonder whether you should extract both to remove the duplicated code between them. The answer is not a simple yes or no.

When considering the architecture of the code, if it is apparent that modifications to these two functions in the future often require changing them simultaneously, it indicates a close connection between them, and thus they should be combined. The decision to implement the DRY principle should not be based solely on the superficial phenomenon of code repetition but should delve deeper into the conceptual level of code duplication, which helps us prevent making incorrect decisions.

The application of the DRY principle when writing tests also presents challenges. Even though code repetition might make tests seem verbose and hard to maintain, the incorrect application of DRY can render the test suite fragile. Should test code include more repetition than application code? A common solution offered is employing DAMP— “Descriptive And Meaningful Phrases” or “Don’t Abstract Methods Prematurely,” which guides the writing of test code.

Furthermore, we have a similar acronym, WET, variously expanded as “Write Everything Twice,” “Write Every Time,” “We Enjoy Typing,” or “Waste Everyone’s Time.” At a broader level, WET is the antithesis of DRY, and its message points the same way as DAMP’s well-intentioned precepts: repetition should be valued more in test code than in application code.

However, whether in application code or test code, readability and maintainability are always issues. Conceptual duplication presents maintenance problems not only in application code but also in test code.

Let’s further illustrate this with an example of fragile test code written in Kotlin. The exact pattern varies with the testing language and framework; in RSpec, for instance, the setup might consist of many let! statements.
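The article’s original FilterTest listing is not shown here, so the following self-contained Kotlin sketch approximates the pattern being criticized; Book, createBook(), and createBookLanguage() are hypothetical stand-ins, not the article’s actual definitions.

```kotlin
import java.time.LocalDate

// Hypothetical minimal stand-ins so the sketch compiles on its own.
data class Book(val title: String, var language: String = "", val published: LocalDate)

fun createBook(title: String = "any title", publishedDaysAgo: Long = 0): Book =
    Book(title, published = LocalDate.now().minusDays(publishedDaysAgo))

fun createBookLanguage(language: String, book: Book) {
    book.language = language
}

class FilterTest {
    lateinit var book1: Book
    lateinit var book2: Book
    lateinit var book3: Book

    // Annotated with @Before in JUnit; every test's data is created up
    // front, far away from the tests that depend on its details.
    fun setUp() {
        book1 = createBook(title = "A title", publishedDaysAgo = 30)
        createBookLanguage("EN", book1)
        book2 = createBook(title = "Ein Buch", publishedDaysAgo = 10)
        createBookLanguage("DE", book2)
        book3 = createBook(title = "A newer title", publishedDaysAgo = 3)
        createBookLanguage("EN", book3)
    }
}
```

Every test in the class implicitly depends on the exact titles, languages, and dates chosen above, which is precisely the coupling discussed next.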

This FilterTest example demonstrates setting up objects and then using them to exercise the filter under multiple scenarios. Each test case must understand and refer back to the conditions established in the setUp method in order to make sense of its own logic and expected results.

Examining the setUp method alone does not provide sufficient information; likewise, reading only the individual test cases makes it difficult to gain a complete picture. This is usually a sign of a fragile test suite. Ideally, we want each test case to read as a self-contained microcosm that defines all the context it requires.

As described above, the setUp() method creates all the books and related data for a whole series of test cases, making it unclear which specific book data a particular test needs. With so many details in one place, it is also hard to discern which are crucial for creating books and which are redundant. Of particular concern is that any change to the data used to create books ripples through many tests.

For test cases, the ideal is that each individual test calls the application code minimally and asserts on the result. Here, though, the details of the specific book instances being asserted on are hidden away in setUp(). For example, it is not clear what role onlyFindsBooks plays in a given test. To add clarity, you might need comments explaining the relevance and importance of each book attribute to every test.

Clearly, the original intent of the developer to create all objects in one place was for the sake of simplicity. If the functionality was initially limited to a few filters, then creating objects at the top might indeed simplify things. However, as the number of filters and objects grew, this approach became inadequate. With the increasing need to add more fields to books and the expectation for the results to meet specific conditions, this test structure became increasingly cumbersome and impractical.

Imagine how complex it becomes to understand which objects should be returned when we start combining different filters! To comprehend the specific behavior of onlyFindsBooks(), you need to read more code to discover the assertion logic it hides. You need to spend time digging deeper to find the connection between inputs and assertions.

More challenging is that the declaration of filter instances is often far removed from the test cases, for example, consider a test case filtering by language:

@Test
fun `filter by language`() {
    filter = Filter(language = "EN")
    onlyFindsBooks(filter, book1, book3)
}

What causes book1 and book3 to meet the condition of language = "EN", and why does this call not return book2? To answer these questions, you have to go back to the setUp section, load the entire context into your mind, and try to find the commonalities and differences between all the books.

Even more tricky is the following test:

@Test
fun `filter by last`() {
    filter = Filter(searchTerm = "title", last = "5 days")
    onlyFindsBooks(filter, book3)
}

What exactly is the basis for the “5 days” condition? Is it related to a hidden value set for book3 within the createBook() method? The developer who originally wrote this code, while applying the DRY (Don’t Repeat Yourself) principle to reduce duplication, inadvertently wove a complex and error-prone web of tests.

In software testing, we often come across a misapplied DRY principle, leading to fragile tests. The following signs indicate that tests need timely refactoring:

  • Lack of test case isolation: Do you have to scroll back and forth through the code to understand each test case?
  • Relevant details are not highlighted: Is it necessary to rely on comments to clarify key information in the tests?
  • Ambiguous testing intent: Is there a large amount of boilerplate code or “noise” unrelated to the tests that obscures the real intent of the tests?
  • Widespread impact of code modifications: When application code changes, does it affect multiple test cases?
  • Test cases are not independent: Have you found that modifying one test case affects the outcomes of others?

To address the above issues, we can adopt two improvement strategies: the “3A” principle and using object creation methods.

“3A” Principles

Effective tests can be divided into three main parts, often referred to as “3A”: Arrange (setup), Act (action), Assert (assertion). They correspond to Given (preconditions), When (action occurs), and Then (expected outcome) respectively. Best practice is to represent each “A” with one line per step; this is not always feasible in practice, but it is the ideal. Tests that follow the “3A” pattern tend to be easier to understand. For example:


// Arrange (note: `object` is a reserved keyword in Kotlin, so we name it `expected`)
val expected = createObject()
// Act
val result = sut.findObject()
// Assert
assertEquals(expected, result)

Object Creation Methods

Strategic use of object creation methods can reduce redundancy and bring relevant details to the forefront while hiding unrelated boilerplate code. This strategy is inspired by design patterns like the Builder Pattern and Object Mother. An excellent object creation method should include:

  • Have a name drawn from the domain that reflects the type of object being created.
  • All necessary default values should be set.
  • Allow modification and overriding of values that are directly used in tests.
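A minimal Kotlin sketch of such an object creation method, assuming a simple Book type (all names here are illustrative, not the article’s definitions): default parameters supply every necessary value, and a test overrides only the attributes it actually cares about.

```kotlin
import java.time.LocalDate

// Illustrative domain type.
data class Book(
    val title: String,
    val language: String,
    val published: LocalDate,
)

// Object creation method: domain name, sensible defaults,
// overridable only where a test needs to differ.
fun createBook(
    title: String = "any title",
    language: String = "EN",
    published: LocalDate = LocalDate.of(2024, 1, 1),
): Book = Book(title, language, published)

fun main() {
    // Only the detail this test cares about is spelled out.
    val germanBook = createBook(language = "DE")
    println(germanBook.language) // DE
    println(germanBook.title)    // any title (the default)
}
```

Kotlin’s default and named arguments give much of the Builder pattern for free; in languages without them, a builder class or Object Mother plays the same role.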

Applying these principles to one of our previous examples can make the code much more concise and clear:


@Test
fun `filter by language`() {
    val englishBook = createBook()
    createBookLanguage("EN", englishBook)
    val germanBook = createBook()
    createBookLanguage("DE", germanBook)
    val results = Filter(language = "EN").results()
    val expectedUuids = listOf(englishBook).map { it.uuid }
    val actualUuids = results.map { it.uuid }
    assertEquals(expectedUuids, actualUuids)
}

Through this transformation, the createBook() method has been optimized to hide irrelevant boilerplate while allowing specific parameters to be overridden when necessary (the definition of createBook() is not shown here).

In the course of this subtle but effective refactoring, we carefully renamed variables to surface the specific differences between them. We also inlined the filter variable to make it more visible in the Act step, and declared values with val rather than var to further reduce mutability.

We also inlined the onlyFindsBooks() method and renamed the temporary variables it contained, which cleanly separates the Act step from the Assert step and makes the boundary between them easy to recognize. With this arrangement, a reader can immediately see why two books are created and how they differ. The Act step now clearly filters for books marked “EN”, and the test expects the results to contain only the English book.

The Arrange step takes four lines of code, slightly more than the ideal. But even so, each line is closely related to the test and clearly presents its purpose. We could consider combining the steps of creating a book and setting its language, but this might complicate the test code and tightly couple book construction to language settings, leading to confusion rather than clarity. However, if the concept of “a book written in a specific language” already exists within the domain, combining these steps may be an appropriate choice.

The Assert step has room for optimization, as the current logic contains some noise, which increases the difficulty of understanding the reasons for test failures. To address this issue, we made the following changes:

    
@Test
fun `filter by language`() {
    val englishBook = createBookWrittenIn("EN")
    val germanBook = createBookWrittenIn("DE")
    val results = Filter(language = "EN").results()
    assertBooksEqual(listOf(englishBook), results)
}

private fun createBookWrittenIn(language: String): Book {
    val book = createBook()
    createBookLanguage(language, book)
    return book
}

private fun assertBooksEqual(expected: List<Book>, actual: List<Book>) {
    val expectedUuids = expected.map { it.uuid }
    val actualUuids = actual.map { it.uuid }
    assertEquals(expectedUuids, actualUuids)
}
    
  

It is worth noting that in this test, none of the content from the setUp() method was used, which makes the context of the test clearer and more understandable.

For software testing, writing readable and effective test cases is crucial. Helper methods such as createBookWrittenIn and assertBooksEqual can streamline and clarify the tests. Even without reading their definitions, the tests remain clear and understandable. When applying these improvements throughout the test suite, pay attention to the specific book objects and attributes that each test case relies on.

As we delve deeper, the importance of keeping relevant details visible becomes clear. Looking at all the tests together, we might feel uneasy about the creation of so many book objects. However, this repetition is necessary at the code level and does not constitute conceptual duplication: each creation represents a different concept, such as an English book or a book released on a specific date.

Our tests have a series of evident advantages:

  • The setUp (test preparation) method is empty, ensuring the independence and tidiness of the test cases.
  • Each test case is self-sufficient, and even when changes occur in the application code (for example, the book’s constructor changes), usually only one place needs to be modified.
  • Adjusting the setUp or expected values for a specific test will not cause all test cases to fail.
  • The abstracted auxiliary method names follow the Arrange-Act-Assert (3A) pattern and are also meaningful.

Below summarizes the main principles and other guidelines we follow:

  • Test cases clearly adhere to the 3A pattern: Arrange, Act, and Assert.
  • The Arrange part of a test should not contain assertions.
  • Each test clearly demonstrates its difference from other test cases.
  • Setup methods should not contain the details that differ between tests; those differences belong in each test’s local context.
  • Extract template code to reduce “noise” and facilitate reuse.
  • Each test runs in its independent little universe, with all the context they require.
  • Avoid randomness that leads to test uncertainty; test failures should be definitive and avoid unstable intermittent failures.
  • The System Under Test (SUT) and the test subject (target behavior) should be easy to identify.
  • Assertions should use literal values, unless well-named variables can provide additional clarity.
  • Tests do not include complex logic or loops, which may lead to interdependencies between test cases, and complex logic is itself fragile and hard to understand.
  • Assertions should correspond directly to the implementation code, with as few assertions as possible in each test.
  • A test with many assertions can be split into multiple tests with fewer assertions to provide more information in case of failure.
  • Choose assertions that provide more information upon failure; for example, comparing whole arrays is more informative than validating each element individually.
  • If a test stops at the first point of failure, the subsequent assertions cannot provide feedback.
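The last two points can be illustrated with a short self-contained Kotlin sketch; the local assertEquals here is a stand-in for what kotlin.test or JUnit would provide, so the example runs on its own.

```kotlin
// Local stand-in for a test framework's assertEquals, so the sketch is
// self-contained; kotlin.test or JUnit would be used in real tests.
fun <T> assertEquals(expected: T, actual: T) {
    check(expected == actual) { "expected <$expected> but was <$actual>" }
}

fun main() {
    val expected = listOf("EN", "DE", "FR")
    val actual = listOf("EN", "DE", "FR")

    // Preferred: one whole-collection assertion. On failure the message
    // prints both complete lists, so every mismatch is visible at once.
    assertEquals(expected, actual)

    // Less informative: element-by-element assertions stop at the first
    // failing index and hide any later differences.
    for (i in expected.indices) {
        assertEquals(expected[i], actual[i])
    }
    println("all assertions passed")
}
```

If actual were, say, listOf("EN", "XX", "YY"), the whole-list assertion would report both bad elements in one message, while the loop would fail at index 1 and never reveal the problem at index 2.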

Sometimes, practicing these testing principles is challenging precisely because the tests are giving you feedback about the design of the application code. The following points can help you recognize when test cases are providing that design feedback:

  • If the setUp code is too cumbersome, it may indicate that the content of the test exceeds the necessary scope, or that the responsibilities of the application are divided too complexly.

During testing, if we find ourselves repeatedly testing the same literals, it may mean that the application has taken on too many responsibilities. To solve this problem, we should consider applying the Single Responsibility Principle to streamline the responsibilities of the application code. In the process of ensuring that tests are easy to understand, if we find that comments must be added, then we should consider renaming variables, methods, or test names to make them more descriptive. Furthermore, we should also contemplate refactoring the application code, giving more meaningful names or separating responsibilities.

Before deleting duplicate code, we should patiently wait until we fully understand the information the test cases provide. Even if the duplication seems to hurt code quality, keeping it until the role of each test case is clear is sometimes the more prudent choice. If attempts at extraction or refactoring lead to frustration, inline the code first to get back to a working baseline, and then try another strategy.

On performance: some developers worry that duplicated setup code makes tests slow, which prompts them to abstract it away. Although slow tests are a real concern, performance worries about creating duplicate objects are often exaggerated, especially compared to the time spent maintaining fragile tests. A better practice is to refactor the application code to reduce the burden of heavyweight setUp operations, which leads to better designs and more lightweight tests. When a genuine performance problem appears, identify its cause first; that may reveal important information about the architecture and point to a solution that improves performance without sacrificing the clarity of the tests.

Conclusion

The DRY (Don’t Repeat Yourself) principle applies to both application code and test code. When applying it to tests, keep the Arrange, Act, and Assert steps clearly distinguished. This highlights what makes each test distinct and prevents boilerplate from obscuring the tests’ intent. If test code becomes fragile or hard to read as the application code changes, do not be afraid to inline abstractions and extract them again along meaningful domain boundaries. Remember that good design principles apply equally to test code: its maintainability and readability should be on par with the application’s. Although literal duplication may hurt tests less severely than it hurts application code, conceptual duplication causes maintenance problems just as serious. Test code therefore deserves the same attention and care.