I spoke about our experience at my workplace moving a 2+ TB relational database to a MongoDB cluster (twelve shards). My hope was to convey some of the challenges we encountered and the lessons we learned along the way.
You’ve probably been involved in the following kind of situation. I know I have.
It’s late in the day and two people are arguing in the CIO’s office.
Developer: “I’m just trying to do my job! I need access to the production database so I can troubleshoot this bug.”
DBA: “You shouldn’t need access to production databases to do your job. You can troubleshoot this just as well in our non-prod database.”
Developer: “No, the data condition that is causing the problem isn’t available in any database outside of production.”
There are two viewpoints at play here: application building versus data guardianship.
To a good DBA, data itself is a “feature” that needs integrity, consistency, and cleanliness. DBAs know that many groups within an organization may want to access the data that has been written, or to extract reports from it. They also know that a given application may fade away into the sunset, but that the data itself will live on in some form, probably to be used by the application that replaces the Visual Basic dinosaur. The DBA in our scenario is most likely worried that the developer will somehow corrupt what would otherwise be sound data.
Most DBAs want to be proactive in protecting data, but they are often put in the position of being reactive.
To a good application developer, data is what the application works with to provide functionality. It’s more of a by-product than an end in itself. Capturing, presenting, and manipulating data are all done by applications, and applications are the soul of what a good developer creates.
I wonder how many DBAs or data administrators take the time to explain the “stewardship” mentality they have around databases.
One of the biggest obstacles to communication between developers and DBAs is the “object-relational impedance mismatch”. By far the most popular approach to software development is object orientation, while relational databases are much friendlier to the procedural world.
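To make the mismatch concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The Order class, the table names, and the helper functions are hypothetical and chosen only for illustration. Storing one nested object means splitting it across two tables. Reading it back means rebuilding that object from flat join rows. An ORM exists to automate exactly this translation.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical domain object: an order that owns a list of products."""
    order_id: int
    customer: str
    items: list[str] = field(default_factory=list)


def make_connection() -> sqlite3.Connection:
    """In-memory database with one table per 'level' of the object graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id),
            product TEXT NOT NULL
        );
    """)
    return conn


def save_order(conn: sqlite3.Connection, order: Order) -> None:
    # One object becomes one parent row plus one child row per list element.
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order.order_id, order.customer))
    conn.executemany("INSERT INTO order_items (order_id, product) VALUES (?, ?)",
                     [(order.order_id, p) for p in order.items])


def load_order(conn: sqlite3.Connection, order_id: int) -> Order:
    # The join returns flat rows that repeat the parent's columns;
    # the nested object has to be rebuilt by hand.
    rows = conn.execute(
        "SELECT o.customer, i.product FROM orders o "
        "LEFT JOIN order_items i ON i.order_id = o.id "
        "WHERE o.id = ? ORDER BY i.id", (order_id,)).fetchall()
    if not rows:
        raise KeyError(order_id)
    items = [product for _, product in rows if product is not None]
    return Order(order_id, rows[0][0], items)


if __name__ == "__main__":
    conn = make_connection()
    save_order(conn, Order(1, "Ada", ["widget", "gadget"]))
    print(load_order(conn, 1))
    # → Order(order_id=1, customer='Ada', items=['widget', 'gadget'])
```

The developer reads this code as “an order has items.” The DBA reads it as “two relations joined on a key.” Both views are correct, and the code in between is where they meet.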
How can we resolve this clash? Each side needs to understand where the other is coming from. For developers, this means understanding that DBAs view data as an asset to be protected. For DBAs, this means understanding that developers view data as a resource for their applications.
Since I’m coming at this from a database-centric view, I have one specific suggestion for developers and three for DBAs.
First, developers should learn more about relational theory. It’s not enough to understand how various database engines implement the theory, or to know how to write SQL. Developers should take some time to understand that relational databases are built on a mathematical foundation dealing with issues of data consistency and completeness.
Second, DBAs should learn an object-oriented language. I’m not suggesting that the typical DBA should become an application developer. Writing programs in an OO language is simply a good way to learn the object-oriented mindset. Pick one, any one: Python, Java, Ruby. It doesn’t matter for this purpose.
Third, DBAs should understand object-relational mapping and its tools. A DBA should at least understand the concepts behind a framework like Hibernate. This furthers the goal of acquiring the OO mindset. It also helps DBAs understand the kind of code these frameworks generate. A DBA can then become a collaborator with developers, helping to tweak the generated SQL so that it performs well.
Fourth, and most important, DBAs should educate as well as administer. Get out of the “ivory tower” computer-room mindset. Teach developers some basic relational theory. Help them understand why you view the relationships between data elements as so important.
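A small demonstration can sometimes make that case better than a lecture. The sketch below again uses Python with the standard-library sqlite3 module. The customer and invoice tables and the try_sql helper are hypothetical. It shows declared constraints turning away three kinds of bad change: an invoice with no matching customer, a negative amount, and the deletion of a customer who still has invoices. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is set on the connection.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)
        );
        INSERT INTO customer VALUES (1, 'Ada');
        INSERT INTO invoice VALUES (10, 1, 2500);
    """)
    return conn


def try_sql(conn: sqlite3.Connection, sql: str) -> str:
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    conn = make_db()
    for sql in [
        "INSERT INTO invoice VALUES (11, 1, 100)",   # valid row
        "INSERT INTO invoice VALUES (12, 99, 100)",  # no customer 99
        "INSERT INTO invoice VALUES (13, 1, -5)",    # violates the CHECK
        "DELETE FROM customer WHERE id = 1",         # would orphan invoices
    ]:
        print(f"{sql:45} {try_sql(conn, sql)}")
```

The rules live in the schema rather than in any one application. That is precisely why they outlast the Visual Basic dinosaur.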