I spoke about the experience at my workplace while moving from a 2+ TB relational database to a MongoDB cluster (twelve shards). My hope was to convey some of the challenges we encountered and the lessons we learned while working on this project.
Have you ever had to work with a legacy relational database and found that there was little, if any documentation about the existing tables and columns? In my career, this has been a common problem for me, especially when coming up-to-speed on an existing application’s database. It’s bad enough when there is no ER diagram to give you an overview of the database, but many database developers do everything in their power, it seems, to make the database itself opaque.
What are some best practices for making your database easier to understand?
Make Your Table and Column Names Long Enough to be Descriptive
Instead of something like CSTACCT, use a name like CUSTOMER_ACCOUNT. From a self-documenting perspective, there is very little point to using short, cryptic names, when you may have 30 characters (Oracle) or as many as 63 characters (PostgreSQL) to work with.
Establish Abbreviation Standards
When you must abbreviate names, use consistent abbreviations, and maintain a list of approved abbreviations that all database designers and developers may use. Enforce the standard abbreviations through code reviews.
Be Consistent with Table and Column Naming
Another practice that makes for difficult-to-understand databases is the use of inconsistent naming conventions. Will your “flag” columns be named “completed_yn”, “completed_flag”, or “completed_ind”? It probably doesn’t matter which convention you adopt, as long as you use it consistently throughout your database.
Use Table and Column Comments
Most RDBMs today allow you to store comments right in the database itself. This makes it easy to generate a data dictionary. Adopt a coding standard for your data definition SQL that requires table and column comments (sometimes called “remarks”).
Create a Data Dictionary
Whenever you’re responsible for database design or maintenance, publish a data dictionary. There are expensive tools like ErWin that make this easy, but I like to use the free and open-source SchemaSpy. It’s critical that you keep the data dictionary up-to-date, so be sure to publish a revised copy whenever there are revisions to your database schema.
What Do You Gain?
By adopting the practices I’ve described, you’ll gain the powerful ability to help others understand your databases and the data itself. In summary:
- New developers can more easily understand where data is stored in your databases.
- Other teams, outside the software engineering department, can find the data they’re interested in, whether for ad hoc queries or for regularly-scheduled reports.
- The ETL logic for migrating data, when moving to a new system, is much easier to develop, if all existing data is clearly described.
- Along the same lines of data integration, it will be straightforward for ETL developers to identify source and target database elements.