Archive

Posts Tagged ‘Database’

Make Your Relational Database Self-Documenting

August 24, 2013 Leave a comment

Have you ever had to work with a legacy relational database and found that there was little, if any documentation about the existing tables and columns?  In my career, this has been a common problem for me, especially when coming up-to-speed on an existing application’s database.  It’s bad enough when there is no ER diagram to give you an overview of the database, but many database developers do everything in their power, it seems, to make the database itself opaque.

What are some best practices for making your database easier to understand?

Make Your Table and Column Names Long Enough to be Descriptive

Instead of something like CSTACCT, use a name like CUSTOMER_ACCOUNT.  From a self-documenting perspective, there is very little point to using short, cryptic names, when you may have 30 characters (Oracle) or as many as 63 characters (PostgreSQL) to work with.

Establish Abbreviation Standards

When you must abbreviate names, use consistent abbreviations, and maintain a list of approved abbreviations that all database designers and developers may use.  Enforce the standard abbreviations through code reviews.

Be Consistent with Table and Column Naming

Another practice that makes for difficult-to-understand databases is the use of inconsistent naming conventions.  Will your “flag” columns be named “completed_yn”, “completed_flag”, or “completed_ind”? It probably doesn’t matter which convention you adopt, as long as you use it consistently throughout your database.

Use Table and Column Comments

Most RDBMs today allow you to store comments right in the database itself.  This makes it easy to generate a data dictionary.  Adopt a coding standard for your data definition SQL that requires table and column comments (sometimes called “remarks”).

Create a Data Dictionary

Whenever you’re responsible for database design or maintenance, publish a data dictionary.  There are expensive tools like ErWin that make this easy, but I like to use the free and open-source SchemaSpy.  It’s critical that you keep the data dictionary up-to-date, so be sure to publish a revised copy whenever there are revisions to your database schema.

What Do You Gain?

By adopting the practices I’ve described, you’ll gain the powerful ability to help others understand your databases and the data itself.  In summary:

  • New developers can more easily understand where data is stored in your databases.
  • Other teams, outside the software engineering department, can find the data they’re interested in, whether for ad hoc queries or for regularly-scheduled reports.
  • The ETL logic for migrating data, when moving to a new system, is much easier to develop, if all existing data is clearly described.
  • Along the same lines of data integration, it will be straightforward for ETL developers to identify source and target database elements.
Advertisements

Fundamental Constructs

September 13, 2011 Leave a comment

There are a number of very fundamental concepts that crop up again and again in computer science.  This is particularly true when it comes to different database products.  Among these constructs are:

  • B-trees (and variations)
  • Caches
  • Locking and scheduling
  • Hashing
  • Memory management

All “big-time” relational databases, and many non-relational ones, deal with these issues.  Vendors or projects may implement locking in slightly different ways, but you can be pretty sure that a relational database deals with locking.  Having a strong understanding, for example, of locking techniques and how they interact will help you go a long way toward working on almost any relational database.

I’ll be adding posts on these topics in the future, but in the meantime, if you were to do nothing more than search Wikipedia for these topics, and read the articles and references contained in them, you would learn a lot about the internals of databases.