Archive for the ‘Database’ Category

Slides from my MongoDB Boston 2013 Talk “No More SQL”

October 29, 2013

MongoDB was kind enough to ask me to present at MongoDB Boston 2013 last Friday (2013-10-25).  Below are the slides from my talk, entitled “No More SQL”.

I spoke about the experience at my workplace while moving from a 2+ TB relational database to a MongoDB cluster (twelve shards).  My hope was to convey some of the challenges we encountered and the lessons we learned while working on this project.

Make Your Relational Database Self-Documenting

August 24, 2013

Have you ever had to work with a legacy relational database and found that there was little, if any, documentation of the existing tables and columns?  In my career, this has been a common problem, especially when coming up to speed on an existing application’s database.  It’s bad enough when there is no ER diagram to give you an overview of the database, but many database developers seem to do everything in their power to make the database itself opaque.

What are some best practices for making your database easier to understand?

Make Your Table and Column Names Long Enough to be Descriptive

Instead of something like CSTACCT, use a name like CUSTOMER_ACCOUNT.  From a self-documenting perspective, there is very little point to using short, cryptic names, when you may have 30 characters (Oracle) or as many as 63 characters (PostgreSQL) to work with.
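As a hypothetical illustration (these table and column definitions are invented for the example), compare a cryptic definition with a descriptive one:

```sql
-- Opaque: the reader must guess what CSTACCT holds.
CREATE TABLE CSTACCT (
    ACCTID  NUMERIC(10),
    CSTNM   VARCHAR(100),
    OPNDT   DATE
);

-- Self-documenting: the names explain themselves.
CREATE TABLE CUSTOMER_ACCOUNT (
    ACCOUNT_ID     NUMERIC(10),
    CUSTOMER_NAME  VARCHAR(100),
    OPENED_DATE    DATE
);
```

Both tables hold exactly the same data; only the second one tells a new developer what that data means.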

Establish Abbreviation Standards

When you must abbreviate names, use consistent abbreviations, and maintain a list of approved abbreviations that all database designers and developers may use.  Enforce the standard abbreviations through code reviews.

Be Consistent with Table and Column Naming

Another practice that makes for difficult-to-understand databases is the use of inconsistent naming conventions.  Will your “flag” columns be named “completed_yn”, “completed_flag”, or “completed_ind”? It probably doesn’t matter which convention you adopt, as long as you use it consistently throughout your database.

Use Table and Column Comments

Most RDBMSs today allow you to store comments right in the database itself.  This makes it easy to generate a data dictionary.  Adopt a coding standard for your data definition SQL that requires table and column comments (sometimes called “remarks”).
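In PostgreSQL and Oracle, for instance, the COMMENT ON statement stores these remarks in the database itself (the object names below are hypothetical):

```sql
-- Attach a remark to a table and to one of its columns.
COMMENT ON TABLE customer_account IS
  'One row per account that a customer holds with us.';

COMMENT ON COLUMN customer_account.opened_date IS
  'Date on which the account was opened.';
```

Data-dictionary generators can then read these remarks back through the driver’s metadata interface, so the documentation travels with the schema.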

Create a Data Dictionary

Whenever you’re responsible for database design or maintenance, publish a data dictionary.  There are expensive tools like ERwin that make this easy, but I like to use the free and open-source SchemaSpy.  It’s critical that you keep the data dictionary up-to-date, so be sure to publish a revised copy whenever your database schema changes.

What Do You Gain?

By adopting the practices I’ve described, you’ll make it much easier for others to understand your databases and the data itself.  In summary:

  • New developers can more easily understand where data is stored in your databases.
  • Other teams, outside the software engineering department, can find the data they’re interested in, whether for ad hoc queries or for regularly-scheduled reports.
  • When moving to a new system, the ETL logic for migrating data is much easier to develop if all existing data is clearly described.
  • Along the same lines, for data integration it will be straightforward for ETL developers to identify source and target database elements.

Compiling and Installing PostgreSQL 9.1 from Source on Fedora 15 (64-bit)

September 23, 2011

Prerequisites

Last week I discussed various reasons why you would want to compile your open-source database from the project’s source code.  I wanted to give an example here of the process for compiling a particular database, PostgreSQL.  Version 9.1 of PostgreSQL was released on 2011-09-11, so the example below works with that code base.

I’m assuming in the following instructions that you’re starting with a sparkling, new Fedora installation.  In other words, you haven’t installed any of the dependencies (e.g., gcc, make, bison) that you’ll need for building the PostgreSQL source code.  I used Fedora 15, 64-bit.

To grab the most up-to-date Postgres code, you’ll want to check it out from the git repository.  If git is not installed on your system, install it.
$ sudo yum install git
There are a number of libraries and software programs that you’ll need if you’re going to build PostgreSQL.  The easiest way to get all of them is to use the yum-builddep command.
$ sudo yum-builddep postgresql
Among other software and libraries, yum should offer to install bison, flex, and tcl.  However, yum-builddep for postgresql does not install the gcc compiler, which you will also need.  To get gcc installed, run the command:
$ sudo yum install gcc

Check out the Source Code using Git

Grabbing the code via git is a very simple one-line command.

$ git clone git://git.postgresql.org/git/postgresql.git
This checks the source code out into a sub-directory of your current location, called postgresql.

Configure the Build

A full list of the configuration options available at the command line is given in the “PostgreSQL 9.1.0 Documentation”.  For this example, I ran a fairly vanilla configuration, but added SQL/XML features to the database.
$ cd postgresql
$ ./configure --with-libxml --with-libxslt

Make and Install

Now that you’ve fine-tuned exactly how you want Postgres built, you’ll run the fairly standard make and install steps.
$ gmake
Note that you must run the install step as a privileged user, since access to the /usr/local directory (it’s there, at /usr/local/pgsql/bin, that the binary files will live) is restricted by default.
$ sudo gmake install

Post-Installation and Database Configuration

You’ll want to run PostgreSQL as a specific, non-root user, postgres.  So, create this user.

$ sudo adduser postgres
$ sudo passwd postgres

Now, change user to the newborn postgres user to run the database initialization.

$ su - postgres

First, set the environment variables for locating needed libraries and executables.

$ LD_LIBRARY_PATH=/usr/local/pgsql/lib; export LD_LIBRARY_PATH
$ PATH=$PATH:/usr/local/pgsql/bin; export PATH

Now, create a directory to contain the server’s data files.

$ mkdir <directory_for_database_data_files>

Run the initdb command to initialize the PostgreSQL server.

$ initdb -D <directory_for_database_data_files>

I recommend creating a specific directory for the server log files.

$ mkdir <directory_for_server_log_files>

Starting the Server

Start the server as the postgres user, indicating the location of data files and the location of the server log files.

$ postgres -D <directory_for_database_data_files> > <directory_for_server_log_files>/server.log 2>&1 &

The server should start without errors; if you inspect the log file, you should see the following messages.

LOG: database system is ready to accept connections
LOG: autovacuum launcher started

Your server should be up and running, ready to accept your client connections!

Fundamental Constructs

September 13, 2011

There are a number of fundamental concepts that crop up again and again in computer science.  This is particularly true across different database products.  Among these constructs are:

  • B-trees (and variations)
  • Caches
  • Locking and scheduling
  • Hashing
  • Memory management

All “big-time” relational databases, and many non-relational ones, deal with these issues.  Vendors and projects may implement locking in slightly different ways, but you can be pretty sure that any relational database deals with locking.  A strong understanding of locking techniques, for example, and of how they interact, will go a long way toward preparing you to work on almost any relational database.
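To make the locking point concrete, here is a sketch (PostgreSQL syntax, with a hypothetical table) of an explicit row lock taken with SELECT … FOR UPDATE:

```sql
BEGIN;

-- Lock the selected row: no concurrent transaction can update or
-- delete it until this transaction commits or rolls back.
SELECT balance
FROM customer_account
WHERE account_id = 42
FOR UPDATE;

-- Safe to modify the row now that we hold the lock.
UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 42;

COMMIT;
```

Other databases expose the same underlying construct with slightly different syntax or defaults, which is exactly why understanding the concept transfers from one product to another.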

I’ll be adding posts on these topics in the future, but in the meantime, if you did nothing more than search Wikipedia for these topics and read the articles and the references they contain, you would learn a lot about the internals of databases.