As your software development projects grow in size and complexity, so too will your sets of unit tests. Let's explore the best practices for organizing and executing these crucial components of codebase maintenance.
I. Organizing Growing Sets of Tests
A. Introduction
1. Importance of unit tests in development
In software development, unit tests act as the first line of defense against bugs. They validate individual parts (or units) of your code to ensure that each function or method behaves as expected. This safety net not only makes your codebase more resilient to future changes but also gives you the confidence to refactor or expand your codebase.
2. Challenge of managing growing number of unit tests
However, as the codebase grows, so does the number of unit tests. When these tests are not properly managed, they can become a tangled mess that's hard to navigate. They might be scattered across different parts of your project, making it difficult to find specific tests or to understand the overall testing coverage.
B. Existing Functions to Test
1. Overview of functions
To illustrate, let's consider a hypothetical project with four functions in need of tests:
a. row_to_list() b. convert_to_int() c. get_data_as_numpy_array() d. split_into_training_and_testing_sets()
These functions cover typical tasks in data preprocessing, conversion, and splitting datasets. The task now is to effectively organize and manage tests for these functions as part of a larger codebase.
C. Project Structure
1. Top level directory structure (src)
Most Python projects follow a certain directory structure. Here's a simplified example:
/src
/data
/features
/models
2. Data package with functions for data preprocessing
In the /data package, we'd have functions like row_to_list() and convert_to_int(), which help with preprocessing raw data.
3. Features package with functions for feature extraction
The /features package would house get_data_as_numpy_array(), a function responsible for transforming preprocessed data into a suitable format for machine learning models.
4. Models package with functions for model training and testing
Finally, /models would contain split_into_training_and_testing_sets(), a function to partition the data for model training and testing.
D. Test Folder
1. Explanation of the test suite
Our tests for these functions will live in a separate top-level directory, conventionally named tests. A collection of tests, like the ones we're about to write, is known as a test suite.
2. Mirroring application structure in test folder
It's common practice to mirror your application structure in your test folder. This means if you have a module at src/data/preprocessing.py, the corresponding test module would live at tests/data/test_preprocessing.py.
3. Creating corresponding test module for each Python module
Thus, for each of our Python modules (say, preprocessing.py, feature_extraction.py, and modeling.py), we'll create corresponding test modules in the tests directory.
E. Test Modules and Test Classes
1. Benefits of structuring tests inside test modules
Why organize tests in modules and classes? This arrangement offers a clear overview of what each test is validating. It also makes it easier to find, add, remove, or modify tests as your codebase evolves.
2. Introduction to the concept of test class
In the context of testing, a test class is a class that contains one or more test methods. Each test method (a function that starts with the word test) represents one unit test.
3. Explanation on how to declare and name a test class
To create a test class, we use Python's built-in class keyword. By convention, the
name of the test class should relate to the function it's testing. For instance, if we're testing convert_to_int(), our test class might be named TestConvertToInt.
4. How to place tests within the test class
Within this class, we'd then define our tests as methods, each starting with the word test. Here's an example:
class TestConvertToInt:
def test_on_string_with_one_comma(self):
...
def test_on_string_with_two_commas(self):
...
5. The use of self argument in tests
Note the self argument in each test method. In Python, self refers to the instance of the class. In the context of unit tests, it's used to access the testing framework's APIs, although in many simple cases, you won't need to use it.
F. Clean Separation of Test Classes
1. Introduction to TestConvertToInt class
Here's a closer look at a test class for convert_to_int():
class TestConvertToInt:
def test_on_string_with_one_comma(self):
# Given
test_input = "2,081"
expected = 2081
# When
result = convert_to_int(test_input)
# Then
assert result == expected
2. Adding tests for convert_to_int() in TestConvertToInt
For every conceivable input, we write a test. For instance, we could write test_on_string_with_no_comma or test_on_float. These tests reside in the same class because they're all testing convert_to_int().
G. Final Test Directory Structure
1. Repeating procedure for test_as_numpy.py and test_train.py
We'd follow the same procedure for get_data_as_numpy_array() and split_into_training_and_testing_sets(). We'd end up with this structure:
/tests
/data
test_preprocessing.py # Contains TestRowToList and TestConvertToInt
/features
test_feature_extraction.py # Contains TestGetDataAsNumpyArray
/models
test_modeling.py # Contains TestSplitIntoTrainingAndTestingSets
2. Test class organization in the test directory
Each file corresponds to the file structure in src. Each test class corresponds to a function we're testing, and each test method within a class tests a different input or edge case for its corresponding function.
II. Mastering Test Execution
A. Overview of Test Organization
1. Centralization of tests in the test folder
We've placed all our tests in the tests folder, which makes it a one-stop shop for everything test-related. This clear demarcation makes it easier to manage our growing number of tests and facilitates better collaboration among team members.
2. Hierarchical structure of the test folder: mirror packages > test modules > test classes
Within the tests folder, we've used a hierarchy that mirrors our application's
package structure. At the highest level, we have packages that correspond to our application packages (data, features, models). Within these, we have test modules that correspond to our Python modules. And within each test module, we have test classes that correspond to functions in our application.
B. Running All Tests
1. Method to run all tests using pytest
To run all our tests at once, we use the command pytest at the command line. This should be done from the root directory of our project.
$ pytest
2. Explanation of how pytest discovers and runs all tests
When we run pytest, it recursively searches for all files named test_*.py or *_test.py in the current directory and its subdirectories. It then collects all the test methods in these files and runs them.
3. Presentation of test results
Once pytest finishes running, it outputs the results to the command line. For each test, you'll see a dot (.) if the test passed or an F if it failed. At the end, pytest provides a summary of the tests.
============================= test session starts =============================
collected 20 items
test/test_preprocessing.py ........
test/test_feature_extraction.py ......
test/test_modeling.py ......
========================= 20 passed in 0.12 seconds =========================
C. Practical Use Case: Continuous Integration (CI) Server
1. Use of pytest command in a CI server
The pytest command can be easily integrated into a Continuous Integration (CI) server. CI is a development practice where developers integrate code into a shared repository frequently, preferably several times a day. Each integration is verified by an automated build, including test, to detect integration errors as quickly as possible.
2. Importance of the binary question: do all unit tests pass?
A crucial part of CI is the concept of a build "passing" or "failing". In our context, that translates to a simple binary question: do all our unit tests pass? If they do, the build passes. If not, it fails.
D. Running Subset of Tests
1. Running tests in a particular test module
There are times when we want to run only a subset of tests - for instance, those within a particular test module. We can specify this in pytest using the file path:
$ pytest tests/data/test_preprocessing.py
2. Running tests within a specific test class
We can also run all the tests within a specific test class. For example:
$ pytest tests/data/test_preprocessing.py::TestConvertToInt
3. Introduction to node ID in pytest
In pytest, the term "node ID" refers to the path used to specify a particular test, test class, or test module to run. For example, the node ID tests/data/test_preprocessing.py::TestConvertToInt specifies the TestConvertToInt test class in the test_preprocessing.py module.
4. Running tests using node ID
We can use the node ID to run a specific test within a test class:
$ pytest tests/data/test_preprocessing.py::TestConvertToInt::test_on_string_with_one_comma
E. Running Tests using Keyword Expressions
1. The -k option in pytest
Pytest's -k option allows us to run tests that match a particular name. For instance, if we want to run all tests that involve the "comma", we can do:
$ pytest -k "comma"
2. Running tests with a specific pattern in their name
This will run all tests with "comma" in their name, no matter which test module or test class they're in.
3. Use of Python logical operators for complex subsetting of tests
We can use Python's logical operators for more complex subsetting. For example, if we want to run all tests that involve "comma" but not "two":
$ pytest -k "comma and not two"
And that's it! You've learned how to effectively organize and execute your unit tests in Python. By maintaining an organized test suite and mastering pytest commands, you'll be able to handle any size of a project without losing track of your tests. Happy testing!