Unit testing database code

These are my notes on how I've gone about unit testing database functionality. The examples are in Java, but I think the ideas are applicable to a variety of programming environments. I'm looking for better solutions.

The problem is this: you have a SQL database, some stored procedures, and a layer of code sitting between your application and the database. How can you put tests in place to make sure your code really is reading and writing the right data from the database?

 

Original: April/May 2001. This version: $Revision: 1.10 $ $Date: 2006/08/14 03:37:08 $

Updated:


Why bother?

I'm guessing some, if not a lot, of database development goes like this: set up database, write code to access database, run code, do a SELECT to see if the records showed up in the database. They did? Good, then we're done.

The problem with visual inspection is this: you don't do it often, and you don't check everything every time. It's possible that when you make changes to a system, maybe months later, you break something and some data will go missing. As a coder you may not spend much time checking the data itself, so it may take a while for this mistake to surface. I've worked on a web project where a mandatory field on a registration form was not being inserted into a database for the best part of a year. Although marketing had protested that they needed this information, the problem wasn't spotted because the data was never ever looked at it (but don't get me started on that).

Automated tests — painless tests that run often and test lots — reduce the chances of your data is going missing. I find they make it easier for me to sleep at night. (Tests have other positive features: they're good examples of how to use code, they act as documentation, they make other people's code less scary when you need to change it, they reduce debugging time).

What kinds of tests are we talking about?

Consider a simple user database, with a email address and a flag indicating if mail to the address has bounced or not. Your database layer might consist of methods for insert, update, delete, and find.

The insert method would call a stored procedure to write the address and field to the database. With much simplification and omission the code might look like this:

public class UserDatabase
{
  ...
  public void insert(User user)
  {
    PreparedStatement ps = connection.prepareCall("{ call User_insert(?,?) }");
    ps.setString(1, user.getEmail());
    ps.setString(2, user.isBad());  // In real life, this would be a boolean.
    ps.executeUpdate();
    ps.close();
  }
  ...
}

The kind of testing code I'm thinking of would look something like this:

public class TestUserDatabase extends TestCase
{
  ...
  public void testInsert()
  {
    // Insert a test user:
    User user = new User("some@email.address");
    UserDatabase database = new UserDatabase();
    database.insert(user);

    // Make sure the data really got there:
    User db_user = database.find("some@email.address");
    assertTrue("Expected non-null result", db_user != null);
    assertEquals("Wrong email", "some@email.address", db_user.getEmail());
    assertEquals("Wrong bad flag", false, db_user.isBad());
  }
  ...
}
... only you'd have more tests, probably. (And take care with some tests, like tests on dates).

The assertTrue and assertEquals methods test that a condition is true, and if not the test fails in some way giving a diagnostic message. The idea is that the test is automatically run via a test framework, and a clear indication of success or failure is flagged. This is based on JUnit (see resources, below), a testing framework for Java. The framework is available for other languages, including C, C++, Perl, Python, .NET (all languages), PL/SQL, Eiffel, Delphi, VB... (see resources, below).

The next question becomes: we have tests, but how do we manage the testing data in the database so that it doesn't "mess up" live data?

Approaches that don't work

Before I start, I should point out that I expect you to have a development database. You wouldn't want to do anything I've noted in here on a production database.

The first approach I tried was to manually insert some testing data in a copy of the production database. These would be records with known values, such as "testuser01@test.testing". If you were testing some searching functionality, you'd know that there were, say, five users in the database "@test.testing".

For inserted test records, as in the example above, the test itself would have to maintain the state of the database. I.e., the test would have to be sure to clean up after itself, be careful not to deleted required records, so the database was in a good state once the test had finished.

This approach troubles me for the following reasons:

Some fixes I've tried: add a "is_test" field to records, as a way to flag test records. This avoids the "magic values" problem. The down side is that your test code needs to operate only on records where the is_test field is set, whereas your production code needs to work with records where is_test is false. If you have differences at that level, you're not really testing the same code.

You need four databases

Some thoughts: A good test set is self sufficient and creates all the data it needs. Testing can be simplified if you can get the database in a known state before a test is run. One way to do this is to have a separate unit test database which is under the control of the test cases: the test cases clean out the database before starting any tests.

In code, you can do this by having a dbSetUp method which might look like this:

  ...
  public void dbSetUp()
  {
    // Put the database in a known state:
    // (stored procedures would probably be better here)
    helper.exec("DELETE FROM SomeSideTable");
    helper.exec("DELETE FROM User");

    // Insert some commonly-used test cases:
    ...
  }

Any database test would call dbSetUp() before anything else, which would put the database in a known state (mostly empty). This gives you the following advantages:

The down-side is that you need more than one database — but remember, they can all run on one server if necessary. The way I'm testing now needs four databases (well, two at a pinch):

  1. The production database. Live data. No testing on this database.
  2. Your local development database, which is where most of the testing is carried out.
  3. A populated development database, possibly shared by all developers so you can run your application and see it work with realistic amounts of data, rather than the hand full of records you have in your test database. You may not strictly need this, but it's reassuring to see your app work with lots of data (i.e., a copy of the production database's data).
  4. A deployment database, or integration database, where the tests are run prior to deployment to make sure any local database changes have been applied. If you're working alone, you may be able to live without this one, but you'll have to be sure any database structure or stored procedure changes have been made to the production database before you go live with your code.

With multiple database you have to make sure you keep the structure of the databases in sync: if you change a table definition or a stored procedure on your test machine, you'll have to remember to make those changes on the live server. The deployment database should act as a reminder to make those changes. Also I find source control systems help here if the commit comments are emailed to all developers automatically. CVS (see resources, below) does this, and I expect others do too.

Test against the right database

In this environment you have to be sure you're connecting to the right database. Running the test set against a production database will delete all your data. This scares the hell out of me.

There are ways to protect against this. For example, it's not uncommon to have database connection settings stored in a initialization file and you can use this to state which database is the test database. You might use one initialization file for testing (pointing to a local database), and another for production work (pointing to a live database).

In Java, an initialization file might look like this:

  myapp.db.url=jdbc:mysql://127.0.0.1/mydatabase
This is the connection string for connecting to a database. You can also add a second connection string to identify the test database:
  myapp.db.url=jdbc:mysql://127.0.0.1/mydatabase
  myapp.db.testurl=jdbc:mysql://127.0.0.1/my_test_database
In test code you can add a check to make sure you'll only run when you're connecting to a test database:
  public void dbSetUp()
  {
     String test_db = InitProperties.get("myapp.db.testurl");
     String db = InitProperties.get("myapp.db.url");

     if (test_db == null)
       abort("No test database configured");

     if (test_db.equals(db))
     {
        // All is well: the database we're connecting to is the
        // same as the database identified as "for testing"
     }
     else
     {
        abort("Will not run tests against a non-test database");
     }
  }
Another trick: if you have a local testing database, the tests can check the IP or hostname you've been asked to run against. If it's not localhost / 127.0.0.1 there's a risk you're running against a live database.

Conclusions

In these notes I've tried to say:

There are other ways to approach this problem. I'm not yet confident enough to trust myself with mock objects (see resources, below). As I understand mock objects, you simulate a layer of a system (in this case, the RDBMS), so that your mock database always returns just what you expect. I like the sounds of that: it encourages you to layer your testing, perhaps by having a SQL-level set of tests, and then having a Java-level set which work on mock ResultSet objects.

My concern is only that some actions can lead to changes in a two or more tables, and at this point the mock objects/simulation may become a pain to maintain and implement. And of course, I'll need to find a good way to test the SQL-level of the database: remember, I want to be sure data really is making it into the database properly.

A view from 2006

The above article was written in 2001. Five years later I'm surprised that I still find the spirit of the article useful today. Of course, many of the details are different.

The main thing that has changed for me is the availability of strong persistence frameworks. These remove a good chunk of risk with respect to the SQL that is executed against a database. As a consequence, testing becomes easier.

The unit tests I write today do still do things such as confirm the right set of results are returned in the right order, but the management of these tests has become easier. In particular as persistence frameworks are abstracted form any particular RDBMS, it becomes feasible to substitute alternatives for unit testing. \ There's a great article by Alex Vollmer (see resources, below) that shows how to use an in-memory database with Hibernate.

I still feel it's important to test that data goes through persistence as it is expected it to. I recently had a situation where a test was failing because a persistence annotation was accidently removed from a class. The persistence layer was doing exactly what it was told to do, but the result was an error in the application. Catching those situations is what makes unit testing addictive.

Having said that, there's not a great benefit in writing tests that effectively just test the framework you're using: you have to trust the underlying technologies to do what they say they do. There's a fine line there, and the rule of thumb I follow is that if I've written or touched the code, it needs to be tested.

I can report that, five years on, having tests that go through the persistence layer can give you enormous confidence in the code you're writing.

Comments

I'm always looking for better ways to test database code. If you have any ideas, or any comments, please do email me.

A note on testing dates

If you're storing date information you probably want to make sure that you're storing the right dates. Be aware of some issues.

First, ask yourself who is creating the date? If it's your code, that's fine because you should be able to compare the date you created to the date you get back when you go looking into the database. If the database is creating the date, perhaps as a default column value, then you may have some problems. For example, are you sure the timezone for your code is the same as the timezone for the database? It's not unheard of to have databases running in GMT/UTC. Are you sure the clock on the machine you're running on will be set to the same time as the the clock on the database? If not, you're going to have some margin of error when comparing times.

If you run into these situations there are a few things you can do:

Resources

last modified Saturday, 06-Apr-2013 17:26:33 BST | Contact