Python, Behave and Mockito-Python

This article was originally published in the November 2012 issue of Open Source for You. It is republished here under the Creative Commons Attribution-Share Alike 3.0 Unported license.

Here's an article that explores behaviour-driven development with Python, aimed at software developers interested in automated testing. Behaviour-driven development approaches the design and writing of software through executable specifications that developers, business analysts and quality assurance can write together, creating a common language that helps different people communicate with each other.

This article has been written using Python 2.6.1, but any sufficiently new version will do. Behave is a tool written by Benno Rice and Richard Jones, and it allows users to write executable specifications. Behave has been released under a BSD licence and can be downloaded from http://pypi.python.org/pypi/behave. This article assumes that version 1.2.2 is being used. Mockito-Python is a framework for creating test doubles for Python. It has been released under an MIT licence, and can be downloaded from: https://bitbucket.org/szczepiq/mockito-python/. This article assumes the use of version 0.5.0, but almost any version will do.

Our project
For our project, we are going to write a simple program that simulates an automated breakfast machine. The machine knows how to prepare different types of breakfast, and has a safety mechanism to stop it when something goes wrong. Our customer has sent us the following requirements for it:

“The automated breakfast machine needs to know how to make breakfast. It can boil eggs, both hard and soft. It can fry eggs and bacon. It also knows how to toast bread to two different specifications (light and dark). The breakfast machine also knows how to squeeze juice from oranges. If something goes wrong, the machine will stop and announce an error. You don’t have to write a user interface; we will create that. Just make easy-to-understand commands and queries that can be used to control the machine.”

Layout of the project
Let’s start by creating a directory structure for our project:

mkdir -p breakfast/features/steps

The breakfast directory will contain our code for the breakfast machine. The features directory is used for storing specifications for features that the customer listed. The steps directory is where we specify how different steps in the specifications are done. If this sounds confusing, don’t worry, it will soon become clear.
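By the end of the article, the project will contain roughly the following files (shown here only as a preview; we will create them one by one):

breakfast/
    breakfast.py                  the breakfast machine itself
    ui.py                         user interface stub from the customer
    features/
        boiling_eggs.feature      natural-language specification
        steps/
            eggs.py               step implementations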

The first specification
In the case of Behave, the specification is a file containing structured natural language text that specifies how a given feature should behave. A feature is an aspect of a program—for example, boiling eggs. Each feature can have one or more scenarios that can be thought of as use cases for the given feature. In our example, the scenario is hard-boiling a single egg. First, we need to tackle boiling the eggs. That sounds easy enough. Let’s start by creating a new file in the features directory, called boiling_eggs.feature, and enter the following specification:

Feature: Boiling eggs
  as a user
  in order to have breakfast
  I want the machine to boil eggs

Background:
  Given machine is standing by

Scenario: boil hard egg
  Given machine has 10 eggs
  And egg setting is hard
  And amount of eggs to boil is set to 1
  When machine boils eggs
  Then there should be 1 boiled egg
  And eggs should be hard

It does not look like much, but we are just getting started. Notice how the specification reads almost like a story. Instead of talking about integers, objects and method calls, the specification talks about machines, eggs and boiling. It is much easier for non-coders to read, understand and comment on this kind of text. At this point, we can already try running our tests, by moving into the breakfast directory and issuing the command behave. Behave will try to run our first specification, and will fail, because we have neither written the steps nor implemented the actual machine. Because Behave is really helpful, it will tell us how to proceed:

You can implement step definitions for undefined steps with these snippets:

@given(u'machine is standing by')
def impl(context):
    assert False

Now we need to define what is to be done when the breakfast machine is standing by and has 10 eggs ready to be boiled. Create the file eggs.py in the steps directory and add the following code in it:

from breakfast import BreakfastMachine
from behave import *

@given(u'machine is standing by')
def impl(context):
    context.machine = BreakfastMachine()

@given(u'machine has {egg_amount} eggs')
def impl(context, egg_amount):
    context.machine.eggs = int(egg_amount)

Steps are the bridge between the specification and the system being tested. They map the natural-language sentences into function calls that the computer can understand.

The first function defines what happens when a breakfast machine should be standing by: we create a new instance of BreakfastMachine and store it in context, a special object that Behave keeps track of. It is passed from step to step, and can be used to relay information between them. Eventually, we will use it to assert that the specification has been executed correctly.

The second function defines the code that is executed for a step of the form ‘machine has x eggs’, where x can be anything (in our example, it is 10). {egg_amount} is automatically parsed and passed as a parameter to the function, which therefore has to have an identically named parameter. Note that the parameters are Unicode strings, and thus need to be converted to integers in our example.

If we were to run Behave at this point, we would get an error message that BreakfastMachine cannot be imported. This, of course, is because we have not yet written it. It might feel strange to start coding from this end of the problem (specifications and tests), instead of diving headlong into coding BreakfastMachine. The advantage of approaching the task from this direction is that we can first think about how we would like our new object to behave and interact with other objects, and write tests or specifications that capture this. Only after we know how we would like to use the new object do we start writing it.

In order to continue, let’s create BreakfastMachine in the file breakfast.py and save it in the breakfast directory. This is the class our client asked us to write and which we want to test:

class BreakfastMachine(object):

    def __init__(self):
        super(BreakfastMachine, self).__init__()
        self.eggs = 0
        self.egg_hardness = 'soft'
        self.eggs_to_boil = 0
        self.boiled_eggs = 0
        self.boiled_egg_hardness = None

    def boil_eggs(self):
        pass

The steps that set the hardness of the eggs and the amount of eggs to boil are quite similar to the one that sets the total amount of eggs available. The difference is that we are not creating a new BreakfastMachine, but using the one that was stored in context earlier. This way, we configure the machine step by step, according to the specification. You can keep running Behave after each addition to eggs.py to see what kind of reports it outputs. This is a good way of working, because Behave guides you regarding what needs to be done next in order to fulfil the specification. In eggs.py, add the following:

@given(u'egg setting is {egg_hardness}')
def impl(context, egg_hardness):
    context.machine.egg_hardness = egg_hardness

@given(u'amount of eggs to boil is set to {amount}')
def impl(context, amount):
    context.machine.eggs_to_boil = int(amount)

Up to this point, we have been configuring the breakfast machine. Now it is time to get serious and actually instruct the machine to boil our eggs, and verify afterwards that we got what we wanted. Add the following piece of code to eggs.py:

@when(u'machine boils eggs')
def impl(context):
    context.machine.boil_eggs()

@then(u'there should be {amount} boiled egg')
def impl(context, amount):
    assert context.machine.boiled_eggs == int(amount)

@then(u'eggs should be {hardness}')
def impl(context, hardness):
    assert context.machine.boiled_egg_hardness == hardness

The first step is to instruct our breakfast machine to boil eggs. The next two are to ascertain that the results of boiling are what we wanted. If you run Behave at this point, an error will be displayed, because the machine does not yet know how to boil eggs. Change breakfast.py to the following final version, and the tests should pass:

class BreakfastMachine(object):

    def __init__(self):
        super(BreakfastMachine, self).__init__()
        self.eggs = 0
        self.egg_hardness = 'soft'
        self.eggs_to_boil = 0
        self.boiled_eggs = 0
        self.boiled_egg_hardness = None

    def boil_eggs(self):
        if self.eggs_to_boil <= self.eggs:
            self.boiled_eggs = self.eggs_to_boil
            self.boiled_egg_hardness = self.egg_hardness
            self.eggs = self.eggs - self.eggs_to_boil
        else:
            self.boiled_eggs = 0
            self.boiled_egg_hardness = None

Now we have a passing scenario and some code; so what’s next? You could experiment with what we have now, and see if you can write another scenario to soft-boil an egg, or to boil multiple eggs. Just add a new scenario in boiling_eggs.feature after the first one, leaving an empty line in between; do not repeat the feature or background sections, as those are needed only once. One possible scenario is sketched below. After experimenting for a bit, continue to the second specification.
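For example, a soft-boiling scenario that reuses the existing steps might look like the following (one possible wording; note that the text has to match the existing step patterns exactly, which is why it still reads ‘boiled egg’ even for two eggs, unless you also generalise the step text):

Scenario: boil soft eggs
  Given machine has 10 eggs
  And egg setting is soft
  And amount of eggs to boil is set to 2
  When machine boils eggs
  Then there should be 2 boiled egg
  And eggs should be soft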

The second specification
Let’s start by reviewing the customer’s requirements listed at the beginning of this article.

Boiling varying numbers of eggs to different specifications, i.e., hard or soft, is taken care of. Frying and toasting should be easy to implement, since it is similar to boiling eggs. “Just make easy-to-understand commands and queries that can be used to control the machine,” catches our attention, and we ask for a clarification from our customer, who sends us an API specification that tells us how their user interface is going to interact with our machine. Since they are still working on it, we’ve got only the part that deals with eggs. The following is for ui.py in the breakfast directory:

class BreakfastUI(object):

    def __init__(self):
        super(BreakfastUI, self).__init__()

    def eggs_boiled(self, amount, hardness):
        pass

    def error(self, message):
        pass

Their user interface is not ready yet, and their API specification is not completely ready either. But we got some of it, and it is good enough for us to start working with.

We can first tackle sending error messages to the user interface. Let’s add the following code at the end of boiling_eggs.feature:

Scenario: boiling too many eggs should give an error
  Given machine has 1 eggs
  And amount of eggs to boil is set to 5
  When machine boils eggs
  Then there should be error message "not enough eggs"

The next step, like before, is to implement new steps in eggs.py:

from breakfast import BreakfastMachine
from ui import BreakfastUI
from mockito import verify, mock
from behave import *

@given(u'machine is standing by')
def impl(context):
    context.ui = mock(BreakfastUI)
    context.machine = BreakfastMachine(context.ui)
    ...

@then(u'there should be error message "{message}"')
def impl(context, message):
    verify(context.machine.ui).error(message)

We also need to modify our BreakfastMachine to connect to the user interface when it starts up. We do this by modifying the __init__ method in breakfast.py as follows:

def __init__(self, ui):
    super(BreakfastMachine, self).__init__()
    self.ui = ui
    self.eggs = 0
    self.egg_hardness = 'soft'
    self.eggs_to_boil = 0
    self.boiled_eggs = 0
    self.boiled_egg_hardness = None

There are two interesting bits in eggs.py. The first is in the step where we set the machine to stand by. Instead of using a real BreakfastUI object (which wouldn’t have any implementation anyway), we create a test double that looks like BreakfastUI, but does nothing when called. It does, however, record all calls, their parameters and their order.

The second interesting part is the function where we verify that an error message has been delivered to the UI. We call verify, pass the UI object as a parameter to it, and then specify which method and parameters should be checked. Both verify and mock are part of Mockito, and offer us tools to check the interactions or the behaviour of objects.
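If you have not used Mockito-Python before, the basic idiom looks roughly like this (a standalone sketch; the mocked method names are made up for illustration):

from mockito import mock, when, verify

ui = mock()                                        # a double that accepts and records any call

when(ui).ask_user('continue?').thenReturn(True)    # optionally stub a return value

ui.ask_user('continue?')    # returns True
ui.error('out of bacon')    # returns None, but the call is recorded

verify(ui).error('out of bacon')    # passes; raises an error if the call was never made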

If we run Behave after these modifications, we are going to get a new error message, as shown below:

Then there should be error message "not enough eggs"  # features\steps\eggs.py:35
Assertion Failed:
Wanted but not invoked: error(u'not enough eggs')

This tells us that the specification expected a call to the method error, with the parameter ‘not enough eggs’. However, our code does not currently do that, so the specification fails. Let’s fix that and modify how the machine boils eggs (breakfast.py):

def boil_eggs(self):
    if self.eggs_to_boil <= self.eggs:
        self.boiled_eggs = self.eggs_to_boil
        self.boiled_egg_hardness = self.egg_hardness
        self.eggs = self.eggs - self.eggs_to_boil
    else:
        self.boiled_eggs = 0
        self.boiled_egg_hardness = None
        self.ui.error('not enough eggs')

Here we add a call to the UI object’s error method, in order to let the user interface know that there was an error and that the user should be notified about it. After this modification, Behave should run again without errors and give us a summary:

1 feature passed, 0 failed, 0 skipped
2 scenarios passed, 0 failed, 0 skipped
12 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.0s

There is a distinction between the two approaches used in these specifications. In the first one, we wanted to verify the state of the system after it boiled the eggs. We checked that the amount of eggs boiled matched our specification, and that they were of the correct hardness.

In the second case, we did not even have a real user interface to work with. Instead of writing one from scratch in order to be able to run tests, we created a test double: an object that looks like a real user interface, but isn’t one. With this double, we could verify that the breakfast machine calls the correct methods with the correct parameters.

Now we have the basics of boiling eggs. If you want, you can continue from here and add more features to the breakfast machine, like frying eggs and bacon, and squeezing juice out of oranges. Try to add a call to the user interface after the eggs have been boiled, and verify that you are sending the correct number of eggs and the specified hardness; one possible solution is sketched below.
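This sketch uses a made-up step wording (it is not the only way to phrase it): a new step is added to eggs.py, and the happy path of boil_eggs notifies the UI through the eggs_boiled method from the customer’s API specification.

# added to the scenario in boiling_eggs.feature:
#   And UI should report 1 hard eggs boiled

# added to features/steps/eggs.py:
@then(u'UI should report {amount} {hardness} eggs boiled')
def impl(context, amount, hardness):
    verify(context.ui).eggs_boiled(int(amount), hardness)

# added to the successful branch of boil_eggs in breakfast.py:
#     self.ui.eggs_boiled(self.boiled_eggs, self.boiled_egg_hardness)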

After getting comfortable with the approaches and techniques shown here, you can start learning more about Behave and Mockito-Python. Good places to start are their download pages, as both offer short tutorials.

In this brief example, we had a glimpse of several different ways of writing executable specifications. We started our project from a natural-language file that specified how we wanted our machine to boil eggs. After this specification, we wrote an implementation for each step, and used that to help us to write our actual breakfast machine code. After learning how to verify the state of a system, we switched our focus to verifying how objects behave and communicate with each other. We finished with a breakfast machine that can boil a given number of soft or hard-boiled eggs, and that will issue a notification to the user interface in case there are not enough eggs in the machine.

Note: Example code in this article is available as a zip archive at https://github.com/tuturto/breakfast/zipball/master.

Refactoring tests

Work on the magic system continues. Last time I added the ability to create effects for spells. This time I work on triggering those effects. Since the test for this step is really close to the previous one, and the code required to make it pass is just a couple of lines, I’ll concentrate on writing about something else: refactoring tests.

The test I wrote is as follows:

def test_triggering_effect(self):
    """
    Casting a spell should trigger the effect
    """
    effects_factory = mock(EffectsFactory)
    effect = mock(Effect)
    dying_rules = mock(Dying)
    when(effects_factory).create_effect(key = 'healing wind',
                                        target = self.character).thenReturn(effect)

    effect_handle = EffectHandle(trigger = 'on spell hit',
                                 effect = 'healing wind',
                                 parameters = {},
                                 charges = 1)

    spell = (SpellBuilder()
                .with_effect_handle(effect_handle)
                .with_target(self.character)
                .build())

    spell.cast(effects_factory, 
               dying_rules)
    
    verify(effect).trigger(dying_rules)

You probably noticed how similar it is to the test I wrote in the previous step. The duplication didn’t bother me while I was working on making the test pass, but as soon as I was done with that, it was time to clean things up.

The first step was to move these two similar tests into a separate test class called TestSpellEffects. Maintenance is easier when similar tests, or tests for a certain piece of functionality, are close to each other.

Next I extracted the duplicate code from each test and moved it into a setup function:

def setup(self):
    """
    Setup test cases
    """
    self.character = (CharacterBuilder()
                            .build())

    self.effects_factory = mock(EffectsFactory)
    self.effect = mock(Effect)
    self.dying_rules = mock(Dying)
    when(self.effects_factory).create_effect(key = 'healing wind',
                                        target = self.character).thenReturn(self.effect)

    self.effect_handle = EffectHandle(trigger = 'on spell hit',
                                      effect = 'healing wind',
                                      parameters = {},
                                      charges = 1)

    self.spell = (SpellBuilder()
                    .with_effect_handle(self.effect_handle)
                    .with_target(self.character)
                    .build())

The setup function is run once for each test, and it is the perfect place for code that would otherwise be duplicated. It’s a good idea to pay close attention to what actually ends up in setup, and to keep it as small as possible: the more code there is in the setup function, the harder the tests seem to read, at least to me.

After this change the actual tests were much smaller:

def test_creating_effect(self):
    """
    Casting a spell should create effects it has
    """
    self.spell.cast(self.effects_factory,
                    self.dying_rules)

    verify(self.effects_factory).create_effect(key = 'healing wind',
                                               target = self.character)

def test_triggering_effect(self):
    """
    Casting a spell should trigger the effect
    """
    self.spell.cast(self.effects_factory, 
                    self.dying_rules)
    
    verify(self.effect).trigger(self.dying_rules)

Much nicer than the first version. This highlights the fact that test code is just as important as production code, and deserves equal attention and care.

Changes for this step are in two commits: ad2bdda513bdbd170ac7927fabcf0632b7e8658a and adfa2540bc90ae28187bb09caeba62a4f16adc50.

Effects of spells

The next step with spell casting takes me to effects. There is already an effect subsystem in place that is used for healing potions and spider poisonings, for example. Effects can be one-off, or they can have a duration (finite or even infinite). So it seemed like a good idea to reuse those effects for spells: it makes it possible to have immediate spell effects (like magic missiles and fireballs) as well as longer-lasting ones (blessings and curses).

The first step is to create some effects, so I will start with a test:

def test_creating_effect(self):
    """
    Casting a spell should create effects it has
    """
    effects_factory = mock(EffectsFactory)
    effect = mock(Effect)
    dying_rules = mock(Dying)
    when(effects_factory).create_effect(key = 'healing wind',
                                        target = self.character).thenReturn(effect)

    effect_handle = EffectHandle(trigger = 'on spell hit',
                                 effect = 'healing wind',
                                 parameters = {},
                                 charges = 1)

    spell = (SpellBuilder()
                .with_effect_handle(effect_handle)
                .with_target(self.character)
                .build())

    spell.cast(effects_factory, 
               dying_rules)

    verify(effects_factory).create_effect(key = 'healing wind',
                                          target = self.character)

The essential part is where we tell the effects_factory to return a mocked effect when it is called with the correct parameters. EffectHandles are a sort of specification that tells when to create what kind of effect; in this case, when the ‘on spell hit’ event happens, we want to create a healing wind. The EffectsFactory, on the other hand, can be used to create an effect by name, which in this case is ‘healing wind’.

I’m using a SpellBuilder to create an instance of the Spell object. The reason for this is to decouple the tests from the code they are testing, at least to a degree. Obviously there has to be some sort of connection, but I try to limit it a bit. If the internal representation of spells changes, it is enough to change the builder, and all the tests are hopefully up and running again (at least if the verification part of the test still works). A rough sketch of such a builder is shown below.
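The actual SpellBuilder lives in the pyherc test code (it is included in the commit linked at the end of this step); as an illustration of the idea, such a builder might look roughly like this (the add_effect_handle call and the argument-less Spell constructor are assumptions made for this sketch):

class SpellBuilder(object):
    """Builds Spell instances for tests (illustrative sketch only)."""

    def __init__(self):
        self.effect_handles = []
        self.targets = []

    def with_effect_handle(self, handle):
        self.effect_handles.append(handle)
        return self

    def with_target(self, target):
        self.targets.append(target)
        return self

    def build(self):
        # Spell and its EffectsCollection come from the pyherc code base
        spell = Spell()
        for handle in self.effect_handles:
            spell.effects.add_effect_handle(handle)
        spell.target = self.targets
        return spell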

Executing tests at this point gives an error:

Traceback (most recent call last):
  File "C:\Python32\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\programming\pyHack\src\pyherc\test\unit\test_spells.py", line 85, in test_triggering_effect
    target = self.character)
  File "C:\Python32\lib\site-packages\mockito\invocation.py", line 98, in __call__
    verification.verify(self, len(matched_invocations))
  File "C:\Python32\lib\site-packages\mockito\verification.py", line 51, in verify
    raise VerificationError("\nWanted but not invoked: %s" % (invocation))
mockito.verification.VerificationError:
Wanted but not invoked: create_effect()

There is one noteworthy thing about how mockito-python verifies interactions between objects. If I have a verification like:

verify(effects_factory).create_effect(key = 'healing wind',
                                      target = character)

It does not match to a call like this:

effects_factory.create_effect('healing wind',
                              character)

If the verification is specified with keyword arguments, then the call must be made with keyword arguments:

effects_factory.create_effect(key = 'healing wind',
                              target = character)

This is a limitation in mockito-python and so far I have not found a workaround for it. It is something that needs to be remembered when working with keyword arguments.

Another advantage of reusing the effects subsystem is that I already have an EffectsCollection class ready that is used to store and query Effects and EffectHandles. Instead of writing all that logic again, I can just reuse it in the Spell class:

@logged
def get_effect_handles(self, trigger = None):
    """
    Get effect handles

    :param trigger: optional trigger type
    :type trigger: string

    :returns: effect handles
    :rtype: [EffectHandle]
    """
    return self.effects.get_effect_handles(trigger)

In order to make the test pass, I made the following modifications to the Spell class:

@logged
def cast(self, effects_factory, dying_rules):
    """
    Cast the spell

    :param effects_factory: factory for creating effects
    :type effects_factory: EffectsFactory
    :param dying_rules: rules for dying (not used yet; needed when effects are triggered)
    :type dying_rules: Dying
    """
    handles = self.effects.get_effect_handles('on spell hit')
    effects = []

    for target in self.target:
        for handle in handles:
            effects.append(effects_factory.create_effect(
                                            key = handle.effect,
                                            target = target))

At this point the test passes. The test verifies only that the EffectsFactory was called with parameters that it should be called with. It does not verify what is being done with the results of the call. That is left for the next step, when I want to write the code to actually trigger the created effects.

Code for this change (including the SpellBuilder) can be found in commit 62512fb694a93e81e792ec2cf41c1d10901f4c32.

Does it make sense to take these tiny steps? Couldn’t I have just written a test that checks that the created effects are triggered, and hit two birds with a single stone? I could have done that, and I think it would have been just as valid an approach as dividing the work into two separate tests. I prefer to work in very small steps, with small tests. While they help me pinpoint the exact spot where the system is misbehaving, the drawback is the amount of code I have to write.

Travis CI environment

I recently set up a build in the Travis CI environment. It is a really cool hosted continuous integration service for the open source community, and well integrated with GitHub to boot. Every time I push code to my public repository on GitHub, Travis is notified and runs three test builds for me. Each build is done with a different version of Python, so I can now easily cover 2.6, 2.7 and 3.2. Having 3.2 covered is especially good, since I don’t have it set up on my own system. If everything goes fine, I don’t even notice the whole process; only when the build breaks do I get an email stating the problem, so I can start hunting the error.

Setting up Travis was really easy. I just had to enable the hook for it in GitHub and write a simple configuration file:

language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"
# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install: 
 - "easy_install -U mockito"
 - "easy_install -U pyhamcrest"
 - "easy_install -U decorator"
# command to run tests, e.g. python setup.py test
script: nosetests -w ./src/pyherc

The configuration runs the build against three versions of Python (2.6, 2.7 and 3.2). At the start of each build, the system installs three libraries (mockito, pyhamcrest and decorator). Builds are always done in a clean environment, and all changes are rolled back after the build finishes.

After that, the system was more or less working, and the only thing left to do was fix the codebase to work with Python 3.2.

The system has been set up to run only the unit tests from the pyherc project (more or less the engine of my game). I cannot run the UI tests without PyQt being installed, and currently that is not available as an easy_install package. CPU time and bandwidth are considerations too, since I don’t want to take too many resources from a free service.

For a single developer working on a hobby project, this is probably a little bit of overkill. But I like to tinker with things and try them out in a small software project, where it is easier than on a huge one.

Mocking and aspects

For the development of pyherc, I have been using pyDoubles (originally I was using Mock, but it was a bit of a pain to work with properties). However, I recently started using aspects (with aspyct), and that completely broke call verification with pyDoubles. After a bit of trial and error, I found mockito-python, which plays nicely with aspyct, works with properties and has a friendly syntax.

So, why do I want to use aspects with my rogue-clone? I got tired of writing logging calls over and over again. With aspects I can add logging to a method call with a single line of code, and everything else is taken care of for me (of course, I had to write the aspect first, but that didn’t take long).

Now I can write the following (from LevelGeneratorFactory):

@logged
def get_generator(self, level_type):
    ...
    return generator

When this is executed and debug logging has been turned on, I’ll get the following in pyherc.log:

DEBUG:pyherc.generators.level.generator.LevelGeneratorFactory:get_generator :call: ('upper crypt',) : {}
DEBUG:pyherc.generators.level.generator.LevelGeneratorFactory:get_sub_component :call: ('upper crypt', [<pyherc.generators.level.partitioners.grid.GridPartitioner object at 0x01AC9330>, <pyherc.generators.level.partitioners.grid.GridPartitioner object at 0x01AC9570>], 'partitioner') : {}
DEBUG:pyherc.generators.level.generator.LevelGeneratorFactory:get_sub_component :return: <pyherc.generators.level.partitioners.grid.GridPartitioner object at 0x01AC9570>

DEBUG:pyherc.generators.level.portals.PortalAdderFactory:create_portal_adders :call: ('upper crypt',) : {}
DEBUG:pyherc.generators.level.portals.PortalAdderFactory:create_portal_adders :return: []

DEBUG:pyherc.generators.level.generator.LevelGeneratorFactory:get_generator :return: <pyherc.generators.level.generator.LevelGenerator instance at 0x01B86738>

This tells me pretty plainly that somebody wanted an instance of LevelGenerator that can be used to generate upper crypts. I also notice that this branch of the dungeon seems to be a dead end, since no PortalAdders are created by the PortalAdderFactory.
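The aspect itself is written with aspyct, but the basic idea of logging a call and its return value can be sketched as a plain decorator built on the standard logging module (a simplified illustration, not pyherc’s actual implementation):

import logging
from functools import wraps

def logged(func):
    """Log calls to and returns from the decorated method."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        # logger named after the module and class, as in the output above
        logger = logging.getLogger('{0}.{1}'.format(type(self).__module__,
                                                     type(self).__name__))
        logger.debug('%s :call: %s : %s', func.__name__, args, kwargs)
        result = func(self, *args, **kwargs)
        logger.debug('%s :return: %s', func.__name__, result)
        return result
    return wrapper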

After trying out my third mocking framework, I’m pretty happy with the features mockito-python offers. Should I have done some research in the beginning and chosen it in the first place? It probably wouldn’t have changed anything, since I didn’t know back then that I would end up using aspects (and this particular AOP library).