Friday, July 14, 2006

Some business database design rules

A short disclaimer: These rules are coming from my own experience. The list is not complete, and, actually, it's not a list - rather a reminder for myself and an advice for all others. Most of the rules might seem obvious, but I have spent several nightmarish days recently fixing the consequences of not following those rules.

1. Every item of business data which enters the database should have the information about the source of the data and the time when the data was inserted. The source should be as atomic as possible (not "one of mail servers", but "server"; not "a user input", but "Entered by user Joe Schmoe"). Of course, this does not have to be a text field - a source ID will do.

2. All changes to business-impacting data items should be recorded in corresponding history tables, together with the information about the time and the source of the changes (ID of the user who performed editing, name of the process that changed the data etc.). A note - this rule doesn't state that changes of all data items should be archived - just the ones that might be important for the business.

3. All changes to the database that are not done according to standard operation procedures should be documented. If there is some problem with the data, and there is no other solution but to fix it manually, the script used to perform the fix (even if it is a single SQL statement!), together with the description of the change and the time the change was performed, should be stored in some archive.

4. For each and every table in the database there should be a documentation describing how the data in this table can be changed, and what are the procedures for fixing the incorrect data. And, I will add, the designer of the database should take into consideration the fact that sometimes some data in each table might need to be fixed. For some tables, the documentation would be very simple: "Use UI to modify data". Some tables might require more complex procedures -for example, an error in financial transaction requires creation of a reverse transaction (and transaction types should include reverse transaction in this case!). Another example would be a table which somehow aggregates data from other tables - in this case incorrect data in this table might require changing data in some other table and, for example, running some update script.

Following these rules is not always easy - but in the long run it will save you from some extremely unpleasant experiences. Trust me - I've been through that!

Technorati tags: ,


Dmitriy Kropivnitskiy said...

Yes, in the ideal world these rules would be the essence of wisdom. Unfortunately the world is not ideal. Here is an alternative piece of lore about business databases (the guy is actually talking about data warehousing, but I think most of what he is saying is relevant to your case).

Aleksey Linetskiy said...

Thanks for the link. But I really don't see why these rules require an ideal world? It's tough to refactor existing system after these rules - that's true. But at the design time rules #1 and #2 are quite easy to implement. I don't see what's so difficult about rule #3 - in fact, I'm following it very diligently all the time. And rule #4 is more difficult - just because it deals with documentation, and, as we know, programmers hate writing documents.

Dmitriy Kropivnitskiy said...

well, I don't know what sort of definition of "business database" are you operating with, but in my experience number one is in a lot of cases useless, unless certain data entry standards are adhered to in the first place (like we actually enter the data into the database when the data arrives) otherwise it might actually cause more confusion than if it wasn't there in the first place. Number two leads to problems if you run into scalability issues and have to redesign (you start messing with history tables trying to sort uniform history entries into the new set of tables). Number three is fine, but now you have separated documentation and the database and unless you are very rigorous about your docs (and not many people are) at some point the docs and the database start going out of sync. But I agree that this is a sound advice. Number four is good, but again it relies on documentation and actual database to be in sync with each other. If the database allows for it, it might be better to create stored procedures for changing and fixing the data. And in case you need to do something non-standard, number three is your friend.

Aleksey Linetskiy said...

By "business database" I basically mean the db that stores business data :)

I didn't quite get your objections on #1. Of course, in many situations it might be useless - but in some it can just save the day. Example from my own experience: a bug on one of our servers was causing certain orders to be stored incorrectly. When we discovered it, we had to find the affected orders and fix them. If we had the information specified in #1, we could do it in a couple of hours. Without it, we spent almost a week. And I really can't imagine the situation where it can confuse things (Maybe you can give me an example?).

With #2 - well, any redesign of the database is a huge pain, with or without history tables. There are different possible solutions for the problem you mention (keeping old history data in a separate table can be one of them), but I really don't see why history tables are different in this scenarion from any other related tables? And the history tables are useful, oh yes! Another example from my experience: one of account managers changed certain mapping in the middle of the month. It was dicovered much later. Fortunately, we had history tables, so we knew exactly when the changed took place and were able to quickly fix the problem.

With #3, I think you've got me wrong. What I meant: some non-standard operation on DB. For example: (It's not a real example, but close enough to a real life) Iamgine that, all of a sudden, a new regulation is issued by government, which says that a credit card data cannot be stored in a DB longer than 36 hours.So, you need to go and clean CC fields in all orders that are older tnah 36 hours. You quickly write a SQL statement that will do the job. Here comes #3: before runing the statement, you should write a document, describing what the statement will do and why and save the statement together with the document in some archiving system (actually, we had a much more complex process for this...).

With #4 - I agree with what you are saying. Keeping docs in sync with the system is hard, but it has to be done. Stored procedures are great - but they also have to be kept in sync with the rest of the database.

Anonymous said...

With #4 - I agree with what you are saying. Keeping docs in sync with the system is hard, but it has to be done. Stored procedures are great - but they also have to be kept in sync with the rest of the database.

What do you mean by "Stored procedures are great - but they also have to be kept in sync with the rest of the database" ?
if you changed something in your database that reflects in your Stored procedures

Anonymous said...

Thanks your bloghelped me alot for my research ;)

Aaron said...

These are some great tips, but what do you think about the importance of a good comment system on the business's' site? I think when of the most important things is to make sure you can easily contact your clients !