Tinkering with testing in Hy

I have been playing around with Hy more recently and started liking it a lot. I’m nowhere productive in coding with it, but I’m sure learning all kinds of new things.

Recently I decided to write some tests for Sandvalley project and came up with the following:

(import [sandvalley.tests.helpers [get-in-memory-connection]])
(import [sandvalley.person [save-person load-person]])
(import [sandvalley.database.schema [create-schema]])
(import [hamcrest [assert-that is- equal-to]])

(defn test-save-person []
  (let [[person {"id" None "name" "Pete"}]
        [connection (create-schema (get-in-memory-connection))]
        [saved-person (save-person person connection)]
        [loaded-person (load-person (get saved-person "id") connection)]]
    (assert-that (get loaded-person "name") (is- (equal-to "Pete")))))

(defn test-update-person []
  (let [[person {"id" None "name" "Pete"}]
        [connection (create-schema (get-in-memory-connection))]
        [saved-person (save-person person connection)]
        [loaded-person (load-person (get saved-person "id") connection)]]
    (assoc loaded-person "name" "Uglak")
    (save-person loaded-person connection)
    (let [[updated-person (load-person (get saved-person "id") connection)]]
      (assert-that (get updated-person "name") (is - (equal-to "Uglak"))))))

(if (= __name__ "__main__")
  (test-save-person)
  (test-update-person))

Pretty basic things, we create an empty in-memory database and do some saving, loading and updating on it. Nice thing is that I can use Hamcrest with asserts and the API does not stand out from rest of the code. The part I don’t like is

(assoc loaded-person "name" "Uglak")

, since there I’m changing loaded-person. I’ll probably end up writing a helper function that can combine two dictionaries and use it instead.

However, I haven’t been able to figure out how to run tests with Nose. If I could compile the code to bytecode, it should be possible. Hy project has tests written in Hy and apparently I can even run them with nose. However, when I tried to imitate what they are doing, I ended up getting funny errors deep inside hy2py. So in the meantime, I have settled on running tests from __main__. The drawback of course is that as soon as one of them fails, the execution halts.

Summer of Testing

Yesterday I attended at GeekCollision event about testing to talk about my roguelike project and how I have been testing that. There were other presentations too, unfortunately I could not stay to watch them all.

One presentation particularly caught my attention. Asko Soukka was demoing a system for writing tests for Plone. The main idea was to allow tests used as part of user documentation. The first demo showed how Robot framework could take screen shots as tests are running in order them to be included as a part of documentation. The second demo showed how it is possible to capture video of Robot doing testing and having a voice synthesizer to explain what is going on. Essentially, you end up with a test that verifies a particular feature, screen shots for documentation and a tutorial video. How cool is that?

Testing screencast

As part of YIIP1400 course I created a short screencast about how testing is done in pyherc/Herculeum. It is available at YouTube. Making it was rather interesting even when the end result is quite rough and unpolished. I probably will make more of these later.

When the magic system has something substantial to show I’m going to make a short demo where I play the game and talk about the new features.

Satin and Qt event loop

Qt comes with a really good testing library called QTest. It can be used to simulate keyboard and mouse input when testing the system. PyQt naturally includes the same library, but using it can be a bit tricky sometimes. Mouse input works quite fine, but keyboard input does not work well without event loop running in the background. It is possible to get around this, by instantiating QApplication and calling exec_ – method on it. At this point QApplication takes charge of things and execution leaves the test method.

There are several ways around this. One is to construct two threads, using one to send keyboard and mouse events to the system under test. One should note though, that the QApplication should be running in the main thread (it will helpfully notify you if this is not the case). Another thing is shared resources. Constructing QPixmap objects outside of the thread running QApplication is not advisable (you’ll be notified about this too).

Second option is to construct a QTimer, that will start execution of your test code. In this model you do not need to worry about multiple threads or shared resources.

Doing either one  can get tedious, especially if the amount of tests is more than two. Satin now has a class decorator satin_suite, that will take care of the basic steps, leaving the developer free to write more expressive tests. Essentially, the decorator will perform following steps:

  1. Replace setup-function with a version that creates instance of QApplication and calls the original setup-function.
  2. Replace teardown-function with a version that deallocates QApplication and calls the original teardown-function.
  3. Replace each test_* function with a version that will install a QTimer, start QApplication, execute test code and exit QApplication

Details are still rough and the system has some faults (like any exception or failure halting the execution of tests).

Travis CI environment

I recently set up a build in Travis CI environment. It is really cool hosted continuous integration service for the open source community and well integrated with GitHub to boot. Every time I push code into my public repository in GitHub, Travis is notified and will run 3 test builds for me. Each build is done with different version of Python, so I can now easily cover 2.6, 2.7 and 3.2. Especially 3.2 is very good, since I don’t have that set up on my system. If everything goes fine, I don’t even notice the whole process. Only when build breaks, I get an email stating the problem and I can start hunting the error.

Setting up Travis was really easy. I just had to enable hook for it in GitHub and write a simple configuration file:

language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"
# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install: 
 - "easy_install -U mockito"
 - "easy_install -U pyhamcrest"
 - "easy_install -U decorator"
# command to run tests, e.g. python setup.py test
script: nosetests -w ./src/pyherc

The configuration is for running for three versions of Python (2.6, 2.7 and 3.2). At the start of the build, the system will install three libraries (mockito, pyhamcrest and decorator). Builds are always done in a clean environment and all changes are rolled back after finishing the build.

After that the system was more or less working and the only thing I had to do was to fix the codebase to work with Python 3.2.

The system has been set up to run only unit tests from pyherc project (more or less the engine of my game). UI tests I can not run without PyQt being installed and currently that is not provided as a easy install package. CPU time and bandwidth are considerations too, since I don’t want to take too much resources from a free service.

For a single developer working on a hobby project, this is probably a little bit overkill. But I like to tinker with things and try out them in a small software project, where it is easier than on a huge one.

Testing and feedback

Recently I had a chance to watch my colleaque to play Herculeum (preview version of 0.8). It was quickly apparent that what I had considered as a good set of controls, weren’t that good for him. As a result, I opened a new ticket and started modifying the program to support several different kinds of control schemes. Ultimately the game is designed to be played with XBox controller + XPadder combination, but it should be playable with keyboard only of course.

This was second time I was able to watch someone else playing the game and both times I got very valuable feedback. Especially with the user interface it is important to be able actually watch how the game is being played, because you can see a lot of little things that would be left out from forum posting.

This kind of testing complements automated testing very well. All those hundreds of tests are in place to ensure that the game works from technical point of view. They provide instant feedback for me when I’m creating something new or changing existing code. When an actual human starts testing the game, the basics of the system are in working condition and he can concentrate on more complex tests and explore the system without it malfunctioning all the time.

Finding sub-widget with Satin

Satin has now ability to find a specific sub widget in widget hierarchy. Following is an example from herculeum:

def slot_with_item(name):
    """
    Create function to determine if given QWidget has an item with
    specified name

    :param name: name of item to detect
    :type name: string
    :returns: function to check if item is found or not
    :rtype: function
    """
    def matcher(widget):
        """
        Check if widget contains item with given name

        :param widget: widget to check
        :type widget: ItemGlyph
        :returns: True if name matches, otherwise false
        :rtype: boolean
        """
        if (widget != None
                and hasattr(widget, 'item')
                and widget.item != None
                and widget.item.name == name):
                    return True
        else:
            return False

    return matcher

    ...
    
    def test_dropping_item(self):
        """
        Test that item can be dropped
        """
        item = (ItemBuilder()
                    .with_name('dagger')
                    .build())

        self.character.inventory.append(item)

        dialog = InventoryDialog(surface_manager = self.surface_manager,
                                 character = self.character,
                                 action_factory = self.action_factory,
                                 parent = None,
                                 flags = Qt.Dialog)

        QTest.mouseClick(find_widget(dialog,
                                     slot_with_item('dagger')),
                         Qt.RightButton)

        assert_that(self.level, does_have_item(item.name))

New function find_widget will iterate through widget hierarchy and return the first widget that satisfied given function (slot_with_item in this case). After the widget has been found, we can use QTest.mouseClick to click it and then assert that desired action has been carried out.

Advantage of this system is that I no longer have to specify exact location of the widget I’m interested clicking or otherwise manipulating. It is enough to write a function that can detect when correct widget has been found and then call find_widget. If I move a widget in another location in widget hierarchy, I don’t necessarily have to update my tests anymore.

Test driven development and user interfaces

I have been using test driven development for my game and have been extremely happy with the results. One section that is lacking with tests is the user interface though. Since I’m currently working on some new controls, I decided to give it a better try.

Qt has good support for testing and PyQt exposes some of the needed classes. Relying only on those however, would probably create rather brittle tests. I rather not hard code names of controls and their hierarchy, if I can avoid it.

This is where Satin comes into play. Currently it is just a readme and license file, but the plan is to write little helpers that can be used to test UI without hardcoding everything:

    dialog = CharacterDialog(character)
    assert_that(dialog, has_label(character.name))

As long as there is a QLabel with text set to character’s name, this assert will pass. It does not matter what the QLabel is named or where it is located.

Behave

Behave is a tool for behaviour driven development for Python. So far I have been using hand rolled DSL for some of the behaviour tests, but decided to give behave a go today.

I had earlier written a test with Cutesy and decided to write same tests with behave in order to test the differences:

    def test_dropping_item(self):
        """
        Items dropped by character should end on the floor
        """
        dagger = Dagger()
        Uglak = Goblin(carrying(dagger))
        place(Uglak, middle_of(Level()))

        make(Uglak, drop(dagger))

        assert_that(Uglak, has_dropped(dagger))

Following is my feature specification:

Feature: Dropping items
  as an character
  in order to manage my inventory
  I want to drop items

  Scenario: drop item
     Given Pete is Adventurer
       And Pete is standing in room
       And Pete has dagger
      When Pete drops dagger
      Then dagger should be in room
       And dagger should be at same place as Pete
       And dagger should not be in inventory of Pete
       And time should pass for Pete

Nothign fancy, just a description of the feature and single scenario. It is quite a bit more verbose than the Cutesy-version, but I would think it is also quite a bit more readable for non-programmers. Of course behind the scenes there is some code to implement various steps:

@given(u'{character_name} is Adventurer')
def impl(context, character_name):
    context.characters = []
    new_character = Adventurer()
    new_character.name = character_name
    context.characters.append(new_character)

Not that many lines to implement single step (ok, it’s quite quick and dirty implementation, but still). Interesting point here is that I could reuse function Adventurer, which was originally defined for Cutesy.

Same trend continues as I define more and more steps. Cutesy makes writing tests with behave really nice and easy. Fowler mentioned something along lines that after defining an internal DSL, you can use the same implementation as a basis of external DSL. Basically you just need to add an parser. And that’s exactly what I’m doing here.

Behave has possibility to define parameters for tests. This allows reusing same steps in different tests. In the example here, I have two parameters, character name and item name. They match to character and item created in earlier tests.

@when(u'{character_name} drops {item_name}')
def impl(context, character_name, item_name):

    characters = [x for x in context.characters
                  if x.name == character_name]
    character = characters[0]
    
    items = [x for x in context.items
             if x.name == item_name]
    item = items[0]

    make(character, drop(item))
    assert True

When I run behave, I get following output:

Feature: Dropping items # features\dropping.feature:1
  as an character
  in order to manage my inventory
  I want to drop items

  Scenario: drop item                             # features\dropping.feature:6
    Given Pete is Adventurer                      # features\steps\items.py:8
    And Pete is standing in room                  # features\steps\items.py:16
    And Pete has dagger                           # features\steps\items.py:29
    When Pete drops dagger                        # features\steps\items.py:42
    Then dagger should be in room                 # features\steps\items.py:56
    And dagger should be at same place as Pete    # features\steps\items.py:67
    And dagger should not be in inventory of Pete # features\steps\items.py:79
    And time should pass for Pete                 # features\steps\items.py:91


1 feature passed, 0 failed, 0 skipped
1 scenario passed, 0 failed, 0 skipped
8 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.0s

Behave is pretty strong tool and really easy to use. It took me around half an hour to install it, read tutorial and write first pyherc test (which was ugly, but it worked). I don’t know if I want to use behave more in testing pyherc, or if I continue with DSL-route. Tests written with behave are easier to read for others, but pyherc is currently one man project.

Rewriting code to drop items

Long ago (like, last year), I wrote code that allowed characters to drop items they are carrying. That was before I came up with the new action system though and I never got around to actually update that corner of the code. Recently I started working with a new user interface written with PyQt4 and decided to upgrade drop code as a part of writing new inventory system.

I started writing a simple BDD-test with Cutesy. Nothing too fancy, but enough to test that the goblin can drop item it is carrying:

    def test_dropping_item(self):
        """
        Items dropped by character should end on the floor
        """
        dagger = Dagger()
        Uglak = Goblin(carrying(dagger))
        place(Uglak, middle_of(Level()))

        make(Uglak, drop(dagger))

        assert_that(Uglak, has_dropped(dagger))

Few new words had to be defined for this to work: carrying, drop and has_dropped. Obviously running the test failed and I had to implement some code to get things working. However, instead of using BDD-test to guide me through writing the code, I used it only as a definition of goal. Now I had place to start (character with item, who can not drop it) and goal (character, who has dropped an item). Small steps that would take me there, I defined as regular unit tests (some are shown below).

class TestDropAction(object):
    """
    Tests for dropping item
    """
    def __init__(self):
        """
        Default constructor
        """
        super(TestDropAction, self).__init__()

        self.item = None
        self.level = None
        self.character = None
        self.action_factory = None

    def setup(self):
        """
        Setup test case
        """
        self.level = LevelBuilder().build()
        self.item = ItemBuilder().build()

        self.character = (CharacterBuilder()
                                .with_item(self.item)
                                .with_level(self.level)
                                .build())

        self.action_factory = (ActionFactoryBuilder()
                                    .with_inventory_factory()
                                    .build())

    def test_dropped_item_is_removed_from_inventory(self):
        """
        Test that dropped item is removed from inventory
        """
        self.character.drop_item(self.item,
                                 self.action_factory)

        assert_that(self.item,
                    is_not(is_in(self.character.inventory)))

    def test_dropped_item_is_added_on_level(self):
        """
        Test that dropped item ends up on level
        """
        self.character.drop_item(self.item,
                                 self.action_factory)

        assert_that(self.item.level,
                    is_(equal_to(self.level)))

This is my preferred way of doing test driven development. Have one high level test to define your goal, that can be used to communicate with business owners. Have many low level tests that define technical implementation and guide you from start to finish in small steps. When something fails in the system later, we probably should end up with two errors: first one tells that dropping items is not possible anymore, second one tells that dingusX is broken and returns 1 instead of 2. Again, first one can be used to show general idea about the problem (it’s easier to talk with domain terms than with technical terms), while second one is tool for developers to pinpoint faul point.