Testing Python - a micro tutorial
Why Testing?
(Automated) testing is probably one of the most overlooked techniques in the programming world.
Traditionally in the software development industry we had a clear separation between "programmers", "coders", or "software engineers" on one hand and "QA people" or "testers" on the other. The latter would do the manual testing, the Quality Assurance of the product.
Traditionally the company and thus the programmers relied on the QA people to find the bugs and to ensure high quality.
This approach was never really good, but today it totally breaks down in a number of cases, for a variety of reasons.
Growing complexity
While the application grows and gains more and more complex features, the time and manpower allocated for manual QA stay constant or grow at a much slower pace. That means that even with infrequent releases the QA team doing manual testing cannot check all the features and all the possibilities for every release. So traditionally companies only verified the new features and a subset of the other features that were deemed important.
As the gap between the complexity of the application and the available time to check everything grows, so does the uncertainty in the quality of the software. Bugs start to fall into the gap and programmers start to fear each release more and more.
Fast-paced development with frequent releases - CI/CD Continuous Integration / Continuous Delivery
In the age of fast-paced development, when release cycles aren't measured in months or even days, but in hours or minutes, there is no time for a separate, manual QA process. You must automate the testing and verification process. Computers are much better at boring, repetitive tasks than humans.
Academia
If you are a student you don't have the luxury of having people do QA for you. In fact you might be the person doing manual QA for the projects of your professor. I teach Python programming to biology and, more broadly, life-sciences students. Some of them will end up writing lots of software to support their research and thesis. They don't have a separate QA department that would check the applications they wrote. They have to do it themselves. Are they interested in wasting time on checking the same thing repeatedly? Not likely.
Open Source
Most Open Source developers work on their projects in their spare time. They don't have any means to pay someone to do quality assurance for them. They are also not very interested in doing manual QA themselves, nor do they want to fix the same bug twice. Besides, as they are not getting paid to write that software, one of the main values they can get back for their investment is the respect of others. They won't get much respect for buggy and unreliable software. So Open Source developers tend to write a lot of automated tests. Certainly a lot more than in a corporation. This is true even for the same person in the two different situations: I know a number of people who write lots of tests for their Open Source projects, but almost none at work.
Testing Flask
Before getting to our own simple example, let's look at the tests of one of the most popular Python libraries. Flask is a minimalistic framework for writing web applications in Python. It is an open source project that has been around for many years.
We can install it using the regular tools, e.g. pip install flask, but what we are really interested in is how a developer of Flask can check how Flask behaves on a newer version of Python, on a different operating system, or after some changes were made to the project:
- Do the changes break any of the existing features of Flask?
- Do things that were working earlier still work?
- Are there any regressions?
For this we need to get a local copy of the development version of the source code.
We can do this by cloning the GitHub repository of Flask and running the tests locally.
Follow these instructions; these are the commands I used on my Linux machine:
- Clone the project, enter the cloned folder, and switch to the stable branch:
git clone https://github.com/pallets/flask.git
cd flask
git switch stable
- Install uv if you don't have it yet:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Set up the virtual environment, which is basically a folder called .venv:
uv sync
- Activate the virtual environment:
. .venv/bin/activate
- Run the tests, the type-checker, and the documentation checker:
pytest
mypy
tox run -e docs
If that's not enough, you can also install the coverage module:
uv pip install coverage
run the tests using it:
coverage run -m pytest
and then generate the coverage report:
coverage report -m --include=src/*
It will show you that 92% of the code is exercised by the tests.
That means the developers have invested a lot of time and energy making sure their code works the same way day in, day out.
How do you test your code?
This mini-series is for people who don't have the time to delve into the way you'd write tests for your Python code, but would like to get a quick overview of the possibilities.
However, before we can get into actually testing things, it is worth thinking about the following questions:
What kind of application do you test?
- Web application?
- Command line application?
- Desktop application?
- Batch jobs?
- ...
How complex is your environment?
- An application can be a single executable that works locally on the command line.
- Another application might need 2 databases and 10 other services to be running.
What is the goal of the tests?
- Help with development?
- Help verify that the product does what it is expected to do?
What is testing?
So what do we really mean when we say testing?
For every piece of code, whether it is a small module or a huge application, we can write the following equation:
Fixture + Input = Expected Output
Fixture = environment
Every application works in some environment. For example, if we have an application that takes all the CSV files in a given folder, analyzes them, and creates an image with a png extension for each file, then the starting environment of this application is a folder with one or more csv files and without any png files.
If the application is a complex system, the environment might include multiple networking elements, servers, databases, IoT devices, etc.
If the application is as simple as "print the sum of these two numbers", then the environment does not have anything in it. In that case the environment is just the interpreter/compiler.
No matter what, testing people call this environment the "fixture".
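To make this more tangible, here is a minimal sketch of setting up such a fixture by hand for the CSV-analyzing application described above (the folder layout and the file content are made up for illustration):
import os
import tempfile

# The starting environment: a fresh folder with one CSV file and no PNG files.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "data.csv"), "w") as fh:
    fh.write("a,b\n1,2\n")

# The application under test would now be pointed at workdir,
# and afterwards we would check that the expected PNG files appeared.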
Input
Once we set up the fixture, we execute the code - the Application Under Test or AUT - and give it some input.
This will generate some kind of result: something printed on the screen, a bunch of new files, a change in the database, etc.
Expected Output
That result should be equal to some "Expected Output".
So this is our equation.
Fixture + Input = Expected Output
What is testing, really?
In reality, however, many times we don't get exactly the expected output. Instead there is a small (or not so small) difference. That's the bug.
So our equation actually looks like this:
Fixture + Input = Expected Output + Bugs
The goal of (automated) testing is to make it easy and cheap to notice when these bugs creep in.
To put it in other words: when you write your code you can check whether the result is as expected, either manually or by writing some automated tests. The question is, how will you know your piece of code still works half a year from now, when someone has made changes to some other part of the code?
Will you repeat all the manual tests you did earlier? You won't have time for that.
On the other hand, if you automated your tests in the first place, then you can easily, quickly, and cheaply run them again and verify whether everything still works as before or a bug has appeared.
You can do this as many times and as frequently as you like. That means that even during active development you can constantly, quickly, and cheaply check whether you, or any of your co-workers, have broken anything.
It is obviously not perfect. As we saw, Flask has an impressive 92% test coverage. That means 8% of the code is not executed by any of the tests, so if a breaking change happens in any of those lines, even these tests won't catch it. Moreover, even in the 92% of the code that is tested there might be bugs: bugs that appear under circumstances that were not tested.
However, having such high test coverage gives us a lot more confidence in our code. We'll be able to make changes a lot faster, knowing that if we make a mistake we have a very good chance of noticing it before it reaches anyone else. We also know that if we do find a bug that was not reported by our test suite, then we can add a test that exposes that bug, so once we fix it we can be confident it won't reappear.
Testing tools
Python has several libraries that help you write and run automated tests.
doctest and unittest come with Python, so they are part of the "standard library", but they are somewhat limited in their features.
pytest is a much more feature-rich library with lots of extensions. In recent years it became the de-facto standard for writing and running tests in Python.
In these examples we are going to see these 3 Python modules that can be used for testing.
- doctest
- unittest
- pytest
Testing demo methodology
We won't delve deep into the capabilities of these testing libraries. We will only use a very simple example to demonstrate how to write tests. First we'll see a test that is passing, meaning the actual result is the same as the expected result.
Then we'll also see a failing test, where the actual result is different from the expected result. This usually indicates a bug in our code, though we always have to keep an open mind: the bug might be in the test, or we might have an incorrect expectation.
We are not perfect, we just keep trying to improve.
- Have a simple AUT - Application Under Test with an obvious bug.
- Write a passing test.
- Write a failing test exposing the bug.
AUT - Application Under Test
Given the following module with two functions, how can we verify that these functions work properly?
The mymath.py file contains the following:
def add(x, y):
    return x * y

def multiply(x, y):
    return x + y

# Yes, I know there are bugs in this code!
You probably noticed that the operators in our functions are swapped.
The function called add is expected to add two numbers, but the implementation has a bug: it actually multiplies them.
The function called multiply, on the other hand, actually adds the operands together.
I know it is a very obvious issue, but that is exactly what makes it useful: it allows us to see the mechanics of testing without getting distracted by a complex implementation and a complex problem.
Rest assured, the mechanics of testing would be the same even if our function was calculating the moon-landing trajectory.
Testing is not rocket science.
Use the function in the module
Before we start writing an "automated test", let's see how one could test this code "manually". In reality I see this many times: people write short snippets of code to check whether their real code works properly, but they don't turn these small snippets into real tests. They don't add them to version control and they don't set up a Continuous Integration (CI) system that would run all the tests on every push to GitHub or GitLab.
Basically the question is "How can we use the add function of the mymath module?"
The code using the module is straightforward. We import the mymath module. We also import the sys module to be able to access the command line arguments.
We take two arguments from the command line, call the add function, and print the result.
Then, if we would like to make sure our code works well, we can compare that result to the expected result we calculated in our head.
We try to see if 2+2=4. Based on this, everything seems to work fine.
The use_mymath.py file:
import mymath
import sys

if len(sys.argv) != 3:
    exit(f"Usage {sys.argv[0]} NUMBER NUMBER")

a = int(sys.argv[1])
b = int(sys.argv[2])

result = mymath.add(a, b)
print(result)
Usage:
python use_mymath.py 2 2
4
Testing demo: doctest
The first way we are going to look at is the "doctest" module. It is a very nice tool that allows us to test our code and, at the same time, verify that our documentation is aligned with the code. In addition, doctest is a standard module: it comes with every installation of Python, so you don't need to worry about installing it.
The big drawback is that it is not really useful for anything complex.
Anyway, how does it work?
In Python, if you add a string immediately after the declaration of a function - meaning on the line right after the "def" statement - that string becomes the documentation of the function. It can be a one-line string using regular quotes or a multi-line string using triple-quotes.
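A quick sketch of both forms (these functions are made up for illustration, they are not part of the demo files):
def increment(x):
    "Return x plus one."  # one-line documentation string
    return x + 1

def decrement(x):
    """
    Return x minus one.
    This is a multi-line documentation string using triple-quotes.
    """
    return x - 1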
In the documentation you can write free text and you can also write examples as if one was using the interactive shell of Python.
These examples are code snippets preceded by 3 greater-than signs >>>, the prompt of the Python interactive shell. The lines immediately after that contain the result you'd see if you actually typed the expression into the interactive shell.
Having such examples in the documentation is great, as it makes it easy for most programmers to see how the specific function is expected to be called and what results it is expected to give. But how can we, the authors of the code and the documentation, make sure that the examples are really correct? How can we make sure that we'll update the examples if the function changes its behaviour and thus its results? (e.g. it becomes more precise, having more digits after the decimal point)
Doctest
Doctest will read your source code, look at all the functions, and for each function it will look at its documentation.
If in the documentation it sees examples starting with 3 greater-than signs >>>, then it takes the content of that line as code to be executed and the next line as the expected result. Doctest executes each code snippet and compares the actual result with the expected one, effectively checking whether the examples in your documentation and the implementation are aligned.
Of course it cannot check whether that is really the correct answer. If you make the same error both in the code and in the example, it will still think that everything is fine, but at least now it might be easier for a user to point out the mistake, as it is already visible in the documentation. No need to read the source code.
We can run doctest in the following way: python -m doctest mymath.py. If all the tests pass, this execution prints nothing.
This lack of positive feedback is a bit strange, but that's how it works. You might want to check the so-called "exit code" of the execution.
On Unix systems such as Linux and macOS you'd inspect the $? shell variable, while on MS Windows you need to inspect the %ERRORLEVEL% variable.
On all of these systems you can use the echo command to inspect the variable. In either case 0 indicates success and any other number indicates failure.
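If you'd prefer explicit positive feedback, doctest also has a verbose mode that reports every example it tries:
python -m doctest -v mymath.py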
The new mymath.py file:
def add(x, y):
    """
    This function will add two numbers together
    >>> add(2, 2)
    4
    >>>
    And here we can have more documentation.
    """
    return x * y

def multiply(x, y):
    return x + y

# Yes, I know there are bugs in this code!
Run the tests and check the exit code on Linux or macOS:
$ python -m doctest mymath.py
$ echo $?
0
Run the tests and check the exit code on MS Windows:
> python -m doctest mymath.py
> echo %ERRORLEVEL%
0
Once you see this you might conclude that you have tested the add function (for now let's forget about the multiply function) and you might release your "application".
However, soon someone will come complaining that it does not work correctly in all cases. If you are lucky they will even provide you with at least one pair of numbers for which the result is incorrect.
- doctest
- $?
- %ERRORLEVEL%
Testing demo: doctest with failure
Of course we know that our code is not perfect (to say the least), so at one point someone will complain about incorrect results, for example when they try to add 3 and 3. Before rushing to fix the code, however, it is better to first write a test case with the expected correct result, a test that will fail.
So we added another example to the documentation.
If we run the same command as earlier, we'll get extensive output on the screen and the exit code will have some value different from 0.
At this point you'd probably also go and fix the code, but you have also increased the number of tests and eliminated the possibility of this failure returning unnoticed.
def add(x, y):
    """
    This function will add two numbers together
    >>> add(2, 2)
    4
    >>> add(3, 3)
    6
    >>>
    And here we can have more documentation.
    """
    return x * y

def multiply(x, y):
    return x + y

# Yes, I know there are bugs in this code!
************************************************
File "mymath.py", line 5, in mymath.add
Failed example:
    add(3, 3)
Expected:
    6
Got:
    9
************************************************
1 items had failures:
   1 of   2 in mymath.add
***Test Failed*** 1 failures.
$ python -m doctest mymath.py
$ echo $?
1
> python -m doctest mymath.py
> echo %ERRORLEVEL%
1
Testing demo: Unittest success
- unittest
- TestCase
- assertEqual
Python comes with a built-in module for writing tests. Its name is unittest, which might be a bit confusing, as this module can be used for more complex feature-tests as well, and other modules can also be used to write so-called unit-tests.
Unlike the doctests, which were part of the actual code, the unittest library calls for separate test files.
It is recommended that the names of the test files start with the test_ prefix, as that will make it easy for the various testing tools to locate them.
Inside the file you need to import both the unittest module and the module that we are testing, mymath in this case.
We need a class with a name that starts with Test and that inherits from unittest.TestCase. In the class we can have one or more testing functions, each one starting with the test_ prefix.
Inside the function we can call the function that we are testing and compare the result it returns to some expected value.
We can compare them in various ways using the various assert-methods of unittest.TestCase. In this example we used the assertEqual method, as we wanted to make sure the actual return value equals the expected value.
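To give a feel for the other assert-methods, here is a short sketch (these lines are not part of the demo files; they would live inside a test method of a TestCase subclass):
self.assertTrue(mymath.add(0, 0) == 0)  # passes if the expression is true
self.assertIn("y", "python")            # passes if the first value is found in the second
with self.assertRaises(TypeError):      # passes if the block raises the given exception
    mymath.add(2)                       # calling add with a missing argument raises TypeError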
We can run the tests using python -m unittest test_one_with_unittest.py. It will print some output on the screen indicating that all the tests passed. The exit-code will be 0, as expected.
import unittest
import mymath

class TestMath(unittest.TestCase):
    def test_math(self):
        self.assertEqual(mymath.add(2, 2), 4)
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
$ python -m unittest test_one_with_unittest.py
$ echo $?
0
> python -m unittest test_one_with_unittest.py
> echo %ERRORLEVEL%
0
Testing demo: Unittest failure
When we get the report about the incorrect result of adding 3 and 3, we can add another test-case.
We could have added another assertion to the test_math function, or we could have created a separate class with its own function, but in this case we opted for creating a separate test-function.
We won't go into the pros and cons of each strategy now, as we are only interested in the basic technique.
If we run the tests now, the output will indicate that it ran 2 test-cases and one of them failed. It even shows us some details about the expected value and the actual value, which can be really useful for understanding the source of the problem.
Note the .F in the output. The dot indicates the test-function that passed; the F indicates the test-function that failed.
The exit code is again different from 0.
By the way, this exit-code is what the various CI systems use to understand the results of the tests.
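For example, a shell-based CI step might rely on the exit code roughly like this (a sketch, not the configuration of any particular CI system):
python -m unittest test_with_unittest.py
if [ $? -ne 0 ]; then
    echo "Tests failed, aborting"
    exit 1
fi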
import unittest
import mymath

class TestMath(unittest.TestCase):
    def test_math(self):
        self.assertEqual(mymath.add(2, 2), 4)

    def test_more_math(self):
        self.assertEqual(mymath.add(3, 3), 6)
.F
======================================================================
FAIL: test_more_math (test_with_unittest.TestMath)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gabor/work/slides/python/examples/testing-demo/test_with_unittest.py", line 9, in test_more_math
    self.assertEqual(mymath.add(3, 3), 6)
AssertionError: 9 != 6
----------------------------------------------------------------------
Ran 2 tests in 0.000s
FAILED (failures=1)
$ python -m unittest test_with_unittest.py
$ echo $?
1
> python -m unittest test_with_unittest.py
> echo %ERRORLEVEL%
1
Testing demo: pytest using classes
- pytest
- assert
In our third example we are going to use the pytest module. The only drawback of pytest is that it does not come with the installation of Python itself. This is not a huge issue though, as you probably install hundreds of other modules anyway.
These days Pytest seems like the most popular testing library for Python.
We'll have several examples using Pytest.
In order to use it you'd create a file with a name that starts with the test_ prefix. We need to import the module we are testing, but we don't need to import pytest. Actually we don't even use pytest inside the code. (At least not in the simple use-cases.)
In the file you need to create a class with a name starting with Test, but this class does not need to inherit from any special class. In the class we can have one or more test-functions starting with the test_ prefix.
In the function we call the function we are testing and we compare the results to the expected results.
We use the built-in assert statement of Python to check whether the result was as expected. There is no need to learn various specialized assert-methods as we had in the unittest module.
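For comparison with the unittest assert-methods, here is a sketch of the same kinds of checks using plain assert, plus pytest.raises for exceptions, which does require importing pytest (these lines are not part of the demo files):
import pytest
import mymath

def test_assert_variants():
    assert mymath.add(0, 0) == 0    # equality
    assert "y" in "python"          # membership
    with pytest.raises(TypeError):  # expected exception
        mymath.add(2)               # missing argument raises TypeError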
We run the tests using the pytest command.
We'll get some output. Here too, the single dot after the name of the test file indicates that there was one successful test-function.
The exit-code of this execution is 0, as was the case with unittest.
pip install pytest
import mymath

class TestMath():
    def test_math(self):
        assert mymath.add(2, 2) == 4
============================= test session starts ==============================
platform linux -- Python 3.8.2, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/gabor/work/slides/python/examples/testing-demo
plugins: flake8-1.0.6
collected 1 item
test_with_pytest_class.py . [100%]
============================== 1 passed in 0.01s ===============================
$ pytest test_with_pytest_class.py
$ echo $?
0
> pytest test_with_pytest_class.py
> echo %ERRORLEVEL%
0
Testing demo: pytest using classes - failure
Here too we can add additional test-functions to the same test-class.
Executing pytest will print .F, indicating one passing test-function and one failing test-function.
We'll get a detailed explanation of where the failure happened.
The exit-code will be different from 0 helping the CI systems and any other external system to know that the tests have failed.
import mymath

class TestMath():
    def test_math(self):
        assert mymath.add(2, 2) == 4

    def test_more_math(self):
        assert mymath.add(3, 3) == 6
============================= test session starts ==============================
platform linux -- Python 3.8.2, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /home/gabor/work/slides/python/examples/testing-demo
plugins: flake8-1.0.6
collected 2 items
test_with_pytest_class_failure.py .F [100%]
=================================== FAILURES ===================================
___________________________ TestMath.test_more_math ____________________________
self = <test_with_pytest_class_failure.TestMath object at 0x7fcddf82b7c0>

    def test_more_math(self):
>       assert mymath.add(3, 3) == 6
E       assert 9 == 6
E        +  where 9 = <function add at 0x7fcddf82a0d0>(3, 3)
E        +  where <function add at 0x7fcddf82a0d0> = mymath.add

test_with_pytest_class_failure.py:8: AssertionError
=========================== short test summary info ============================
FAILED test_with_pytest_class_failure.py::TestMath::test_more_math - assert 9...
========================= 1 failed, 1 passed in 0.03s ==========================
$ pytest test_with_pytest_class_failure.py
$ echo $?
1
> pytest test_with_pytest_class_failure.py
> echo %ERRORLEVEL%
1
Testing demo: pytest without classes
In the previous example we used a test-class to write our tests, but in reality, in many cases, we don't need the classes. We could just as well write plain test-functions, as in this example.
Test-functions without a class around them are easier to write and understand, and they are a lot simpler to grasp. So unless you really need the features a class can provide, I'd recommend you use plain functions. After all, our test code should be a lot simpler than our application code.
pip install pytest
import mymath

def test_math():
    assert mymath.add(2, 2) == 4
=========================== test session starts ============================
platform linux -- Python 3.11.2, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/gabor/work/slides/python/examples/testing-demo
plugins: anyio-3.6.2
collected 1 item
test_with_pytest.py . [100%]
============================ 1 passed in 0.00s =============================
$ pytest test_with_pytest.py
$ echo $?
0
> pytest test_with_pytest.py
> echo %ERRORLEVEL%
0
Testing demo: pytest without classes failure
import mymath

def test_math():
    assert mymath.add(2, 2) == 4

def test_more_math():
    assert mymath.add(3, 3) == 6
=========================== test session starts ============================
platform linux -- Python 3.11.2, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/gabor/work/slides/python/examples/testing-demo
plugins: anyio-3.6.2
collected 2 items
test_with_pytest_failure.py .F [100%]
================================= FAILURES =================================
______________________________ test_more_math ______________________________
    def test_more_math():
>       assert mymath.add(3, 3) == 6
E       assert 9 == 6
E        +  where 9 = <function add at 0x7f764062cea0>(3, 3)
E        +  where <function add at 0x7f764062cea0> = mymath.add

test_with_pytest_failure.py:7: AssertionError
========================= short test summary info ==========================
FAILED test_with_pytest_failure.py::test_more_math - assert 9 == 6
======================= 1 failed, 1 passed in 0.08s ========================
$ pytest test_with_pytest.py
$ echo $?
1
> pytest test_with_pytest.py
> echo %ERRORLEVEL%
1
Testing demo: Failure in one test-function
import mymath

def test_math():
    assert mymath.add(3, 3) == 6
    assert mymath.add(2, 2) == 4
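Note that when several assert statements live in the same test-function, the first failing assert raises AssertionError and ends that function, so the later assertions are never executed. Here the first assert fails (add(3, 3) returns 9), which means we never learn whether add(2, 2) would have passed. This is one reason to prefer several small test-functions over one big one.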
Testing demo: pytest run doctests
The nice thing about pytest is that it can also run all the doctests in your module. So you can start your testing journey with doctest and later switch to pytest, and you can still easily test the examples in your documentation.
$ pytest --doctest-modules mymath.py
Testing demo: pytest run unittest
Pytest can also run unittest-based tests. You don't even need to tell it anything special: it will introspect the test code, and if it notices test-classes that are based on unittest.TestCase, it will run them too.
$ pytest test_one_with_unittest.py
$ pytest test_with_unittest.py
Testing demo: test coverage
Install the pytest-cov plugin so that pytest can measure which lines of the code were executed by the tests:
pip install pytest-cov
$ pytest test_with_pytest.py --cov mymath --cov-report html --cov-report term
========================= test session starts ==========================
platform linux -- Python 3.11.2, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/gabor/work/slides/python/examples/testing-demo
plugins: anyio-3.6.2, cov-4.0.0
collected 1 item
test_with_pytest.py . [100%]
---------- coverage: platform linux, python 3.11.2-final-0 -----------
Name        Stmts   Miss  Cover
-------------------------------
mymath.py       4      1    75%
-------------------------------
TOTAL           4      1    75%
Coverage HTML written to dir htmlcov
========================== 1 passed in 0.03s ===========================
Open htmlcov/index.html in a browser to see the detailed, per-line coverage report.
Exercise: Testing demo - anagrams
- An anagram is a pair of words that are created from exactly the same set of characters, but in a different order.
- For example listen and silent
- Or bad credit and debit card
- Given the following module with the is_anagram function, write tests for it (in a file called test_anagram.py).
- Write a failing test as well.
- Try doctest, unittest, and pytest as well.
def is_anagram(a_word, b_word):
    return sorted(a_word) == sorted(b_word)
Sample code to use the anagram module:
from anagram import is_anagram
import sys

if len(sys.argv) != 3:
    exit(f"Usage {sys.argv[0]} WORD WORD")

if is_anagram(sys.argv[1], sys.argv[2]):
    print("Anagram")
else:
    print("NOT")
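Usage (assuming the sample code is saved as use_anagram.py; the file name is just an example):
python use_anagram.py silent listen
Anagram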
Exercise: Test previous solutions
- Go back to your solutions to the previous exercises
- Write tests
- If you feel it is hard, maybe you need to change the code to make it more testable (see the sketch below).
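For example, code that prints its result is hard to test, while code that returns the result can be tested with a plain assert. A generic sketch of this idea (not one of the earlier exercises):
def show_sum(x, y):
    # hard to test: computing and printing are mixed,
    # a test would have to capture what was printed
    print(x + y)

def calculate_sum(x, y):
    # easy to test: a pure function that returns its result
    return x + y

def test_calculate_sum():
    assert calculate_sum(2, 3) == 5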
Solution: Testing demo
from anagram import is_anagram

def test_anagram():
    assert is_anagram("silent", "listen")
    assert is_anagram("bad credit", "debit card")

def test_not_anagram():
    assert not is_anagram("abc", "def")
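# The two tests below describe desired behaviour that the simple
# implementation does not have: it does not ignore spaces or letter case,
# so both tests are expected to fail until is_anagram is improved.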
def test_should_be_anagram_spaces():
    assert is_anagram("anagram", "nag a ram")

def test_should_be_anagram_case():
    assert is_anagram("Silent", "Listen")