DB Testing

These are my notes on how I've gone about unit testing database functionality. The examples are in Java, but I think the ideas are applicable to a variety of programming environments. I'm looking for better solutions.

The problem is this: you have a SQL database, some stored procedures, and a layer of code sitting between your application and the database. How can you put tests in place to make sure your code really is reading and writing the right data from the database?

Original: April/May 2001. This version: $Revision: 1.10 $ $Date: 2006/08/14 03:37:08 $

Why bother?

I'm guessing some, if not a lot, of database development goes like this: set up database, write code to access database, run code, do a SELECT to see if the records showed up in the database. They did? Good, then we're done.

The problem with visual inspection is this: you don't do it often, and you don't check everything every time. It's possible that when you make changes to a system, maybe months later, you break something and some data will go missing. As a coder you may not spend much time checking the data itself, so it may take a while for this mistake to surface. I've worked on a web project where a mandatory field on a registration form was not being inserted into a database for the best part of a year. Although marketing had protested that they needed this information, the problem wasn't spotted because the data was never ever looked at it (but don't get me started on that).

Automated tests — painless tests that run often and test lots — reduce the chances of your data is going missing. I find they make it easier for me to sleep at night. (Tests have other positive features: they're good examples of how to use code, they act as documentation, they make other people's code less scary when you need to change it, they reduce debugging time).

What kinds of tests are we talking about?

Consider a simple user database, with a email address and a flag indicating if mail to the address has bounced or not. Your database layer might consist of methods for insert, update, delete, and find.

The insert method would call a stored procedure to write the address and field to the database. With much simplification and omission the code might look like this:

The assertTrue and assertEquals methods test that a condition is true, and if not the test fails in some way giving a diagnostic message. The idea is that the test is automatically run via a test framework, and a clear indication of success or failure is flagged. This is based on JUnit (see resources, below), a testing framework for Java. The framework is available for other languages, including C, C++, Perl, Python, .NET (all languages), PL/SQL, Eiffel, Delphi, VB... (see resources, below).

The next question becomes: we have tests, but how do we manage the testing data in the database so that it doesn't "mess up" live data?

Approaches that don't work

Before I start, I should point out that I expect you to have a development database. You wouldn't want to do anything I've noted in here on a production database.

The first approach I tried was to manually insert some testing data in a copy of the production database. These would be records with known values, such as "testuser01@test.testing". If you were testing some searching functionality, you'd know that there were, say, five users in the database "@test.testing".

For inserted test records, as in the example above, the test itself would have to maintain the state of the database. I.e., the test would have to be sure to clean up after itself, be careful not to deleted required records, so the database was in a good state once the test had finished.

Some fixes I've tried: add a "is_test" field to records, as a way to flag test records. This avoids the "magic values" problem. The down side is that your test code needs to operate only on records where the is_test field is set, whereas your production code needs to work with records where is_test is false. If you have differences at that level, you're not really testing the same code.

You need four databases

Some thoughts: A good test set is self sufficient and creates all the data it needs. Testing can be simplified if you can get the database in a known state before a test is run. One way to do this is to have a separate unit test database which is under the control of the test cases: the test cases clean out the database before starting any tests.

In code, you can do this by having a dbSetUp method which might look like this:

Any database test would call dbSetUp() before anything else, which would put the database in a known state (mostly empty). This gives you the following advantages:

The down-side is that you need more than one database — but remember, they can all run on one server if necessary. The way I'm testing now needs four databases (well, two at a pinch):

With multiple database you have to make sure you keep the structure of the databases in sync: if you change a table definition or a stored procedure on your test machine, you'll have to remember to make those changes on the live server. The deployment database should act as a reminder to make those changes. Also I find source control systems help here if the commit comments are emailed to all developers automatically. CVS (see resources, below) does this, and I expect others do too.

Test against the right database

In this environment you have to be sure you're connecting to the right database. Running the test set against a production database will delete all your data. This scares the hell out of me.

There are ways to protect against this. For example, it's not uncommon to have database connection settings stored in a initialization file and you can use this to state which database is the test database. You might use one initialization file for testing (pointing to a local database), and another for production work (pointing to a live database).

Conclusions

There are other ways to approach this problem. I'm not yet confident enough to trust myself with mock objects (see resources, below). As I understand mock objects, you simulate a layer of a system (in this case, the RDBMS), so that your mock database always returns just what you expect. I like the sounds of that: it encourages you to layer your testing, perhaps by having a SQL-level set of tests, and then having a Java-level set which work on mock ResultSet objects.

My concern is only that some actions can lead to changes in a two or more tables, and at this point the mock objects/simulation may become a pain to maintain and implement. And of course, I'll need to find a good way to test the SQL-level of the database: remember, I want to be sure data really is making it into the database properly.

A view from 2006

The above article was written in 2001. Five years later I'm surprised that I still find the spirit of the article useful today. Of course, many of the details are different.

The main thing that has changed for me is the availability of strong persistence frameworks. These remove a good chunk of risk with respect to the SQL that is executed against a database. As a consequence, testing becomes easier.

The unit tests I write today do still do things such as confirm the right set of results are returned in the right order, but the management of these tests has become easier. In particular as persistence frameworks are abstracted form any particular RDBMS, it becomes feasible to substitute alternatives for unit testing. \ There's a great article by Alex Vollmer (see resources, below) that shows how to use an in-memory database with Hibernate.

I still feel it's important to test that data goes through persistence as it is expected it to. I recently had a situation where a test was failing because a persistence annotation was accidently removed from a class. The persistence layer was doing exactly what it was told to do, but the result was an error in the application. Catching those situations is what makes unit testing addictive.

Having said that, there's not a great benefit in writing tests that effectively just test the framework you're using: you have to trust the underlying technologies to do what they say they do. There's a fine line there, and the rule of thumb I follow is that if I've written or touched the code, it needs to be tested.

I can report that, five years on, having tests that go through the persistence layer can give you enormous confidence in the code you're writing.

Comments

I'm always looking for better ways to test database code. If you have any ideas, or any comments, please do email me.

A note on testing dates

If you're storing date information you probably want to make sure that you're storing the right dates. Be aware of some issues.

First, ask yourself who is creating the date? If it's your code, that's fine because you should be able to compare the date you created to the date you get back when you go looking into the database. If the database is creating the date, perhaps as a default column value, then you may have some problems. For example, are you sure the timezone for your code is the same as the timezone for the database? It's not unheard of to have databases running in GMT/UTC. Are you sure the clock on the machine you're running on will be set to the same time as the the clock on the database? If not, you're going to have some margin of error when comparing times.

Unit testing database code