Archive

Posts Tagged ‘NoSQL’

MongoDB for DBAs Course Impressions

April 21, 2013

I recently received my final score from 10gen Education’s MongoDB for DBAs course.  I’m pleased to report that I scored in the top tier.  10gen even said that I am “awesome”.

I wanted to give my impressions of the course and to encourage anyone who is interested to take the free training that 10gen is providing.  The next DBA course begins on April 29, and the MongoDB for Developers course (for Java developers) starts on May 13.

Course Format

There are six weeks of lectures, and a seventh week for a final exam, which is really a hands-on project.

Each week is divided up into six to twelve video “lectures” of varying length.  Some of the lectures are only three or four minutes long, though some run fifteen to twenty minutes.  Most of the video lectures are followed by one or two quizzes. These quizzes are almost all multiple-choice questions.  You’re given up to three chances to get a quiz answer correct, and can even peek at the answer before submitting your solution.  While this may seem like a way to cheat, you’re really cheating yourself if you make no attempt to answer the quiz questions honestly. If you don’t understand the material in the lectures, you will not be able to complete the homework. Think of the quizzes as a way of checking your understanding of the video lectures, and re-watch any lectures where you found the quiz difficult.

There are typically four or five homework problems per week.  These are primarily not multiple-choice, but worked problems that require you to actually perform various operations with MongoDB.  If you haven’t mastered the material in the lectures, you will not complete the homework successfully, as the problems are not trivial.

The Less Good

Like most MOOCs, this is no substitute for an instructor-led class.  You can’t ask questions of the instructor in real-time, but only through the course message board.

The quizzes are somewhat simplistic.  This may be partly attributable to the test engine that 10gen used (edX).  It would be helpful to students if the questions weren’t multiple choice and required a bit more understanding.

I found myself referring to the excellent on-line documentation for MongoDB to fill in gaps that were left by the lectures themselves.  Many weeks I found myself wanting more information than the lectures provided.

The Good

It’s free!  10gen is very wise to offer this training free to the community.  The more folks who know how to use MongoDB, the better it is for MongoDB and 10gen.

The DBA course is taught by one of the founders of 10gen, Dwight Merriman.  To have someone at that level spending precious time on instruction tells me that 10gen clearly values building up its user base and community.

The homework assignments really test your understanding of the material.  10gen was very ingenious in making it difficult to cheat on the homework questions.  I think the homework is the best part of the course, actually. I’ve referred back to homework questions several times to help me solve a problem at work. The course would be even stronger if similar effort had been put into the quizzes.

I’m really grateful to 10gen for making this training freely available.  I was so impressed by my experience with MongoDB for DBAs that I’ve registered for the MongoDB for Java Developers course (M101J), which begins on May 13.  Can’t wait!

Categories: MongoDB, NoSQL

Brief Impressions of Mongo Boston 2011

October 6, 2011

Monday I attended Mongo Boston 2011 at the Microsoft NERD Center in Cambridge.

The opening keynote by 10gen’s CTO and co-founder Eliot Horowitz struck a couple of very interesting notes.

  • 10gen wants MongoDB to be a general-purpose database.
  • One of their key principles in building MongoDB is to reduce the number of “knobs” an administrator needs to turn.

Overall I would say the conference was valuable, but could really do with a second day.  For one thing, none of the presentations were more than forty-five minutes long.  While that length does allow for decent overviews, it’s impossible to get into any real depth with such a limited time.

A second day could also reduce some of the “drinking from a fire hose” effect.  I attended eight different presentations, which contained a lot of concepts to absorb.

I wouldn’t recommend these conferences for those who have no experience at all using MongoDB.  I’ve worked with it for a little over a year now, so the material was at a good level for my current understanding.

The price was right at $20 or $30 depending on whether you met the early bird deadline or not.  In my mind, this pricing is a shrewd strategy by 10gen, as it enables interested students to attend.  Building interest and enthusiasm among the up-and-coming developers of tomorrow is a great way to build a community.  However, it was gratifying to see that the attendees represented a wide range of ages.

If you get an opportunity to attend one of the upcoming conferences, I think you’ll find the day worth your time.

Categories: MongoDB, NoSQL

Book Review: “HBase: The Definitive Guide” by Lars George (O’Reilly Media)

October 5, 2011

Summary

(Disclosure: O’Reilly Media provided me with a free ebook copy of this book for the purposes of this review. I have done my best not to let that influence my opinions here.)

When a book bills itself as “The Definitive Guide,” well, that’s a tall order to fill. But, except for updates as new releases of HBase roll out, I can’t imagine another book surpassing this one by Lars George.

Lars George has been working with HBase since 2007 and is a full committer to the project as of 2009.  He now works for Cloudera (a company providing a commercial flavor of Hadoop, as well as Hadoop support).  After reading this book, there’s no question in my mind that George has a deep understanding, not only of HBase as a data solution, but of the internal workings of HBase.

My Reactions

George gives the background and history of HBase in the larger context of relational databases and NoSQL, which I found to be very helpful. The many diagrams throughout the book are extremely useful in explaining concepts, especially for those of us coming from a relational database background.

George has an excellent and clear writing style. Take, for example, the section where he discusses The Problem with Relational Database Systems, giving a quick rundown of the typical steps for getting an RDBMS to scale up.  The flow of his summary reads like the increasing levels of panic that many of us have gone through when dealing with a database-backed application that will not scale.

As an example of how thorough and comprehensive the book is, look at chapter 2, where there is an extensive discussion of the type and class (not desktop PCs!) of machines suitable for running HBase. George gives a truly helpful set of configuration practices, even down to a recommendation for having redundant power supply units.

Another example of his thoroughness comes where George discusses delete methods (Chapter 3). He shows how you can use custom versioning, while admitting that the example is somewhat contrived. Indeed, right after elaborating the example, there is a distinct “Warning” box that admits that custom versioning is not actually recommended.  So, even though you may not implement custom versioning, you do understand it as a feature that HBase provides.

Many of the programming examples come with excellent remarks or discussions of the tradeoffs implicit in the techniques, including performance and scaling concerns.  Java developers will be most comfortable with the majority of examples, but they can be followed by anyone with some object-oriented programming experience.

I really appreciated the thorough discussion in chapter 8 (“Architecture”) of subjects like B+ trees vs. Log-Structured Merge Trees (LSMs), the Write-Ahead Log, and seeks vs. transfers, topics which are relevant not only to HBase but to many database systems of varying architectures.

The level of thoroughness is also the book’s only weakness.  I’m not sure who the target audience for this book is, because it serves both developers and system or database administrators.  While nearly every imaginable HBase topic is touched upon, some would have been better off merely listed, with appropriate references given to sources of more information (for example, all those hardware recommendations). The print edition of the book is 552 pages.

Still, a complaint that a book is too detailed shouldn’t be interpreted as much of a complaint.  Anyone with an interest in NoSQL databases in general, and HBase in particular, should read and study this book.  It’s not likely to be superseded in the future.

The catalog page for “HBase: The Definitive Guide”.

Categories: Book Reviews, Books, NoSQL

Book Review: “Big Data Glossary” by Pete Warden (O’Reilly Media)

October 2, 2011

“Big Data Glossary” could probably have been titled something like “Big Data Cheat Sheets” because it’s both more and less than a glossary.  The book is an excellent summary of tools in the “big data” space rather than a list of terms with definitions.

Warden tackles eleven topics:

  1. Some background on fundamental techniques (e.g., key-value stores)
  2. NoSQL databases
  3. MapReduce
  4. Storage techniques
  5. “Cloud” servers
  6. Data processing technologies (e.g., R and Lucene)
  7. Natural Language Processing
  8. Machine Learning
  9. Visualization
  10. Acquisition
  11. Serialization

He covers none of these topics in great detail, which will no doubt cause carping among some folks.  However, I really like his approach of sketching broad themes, identifying key projects (or products) in each space, and pointing the reader to further research.  Because the field of “big data” is so large, this short book (it’s only 50 pages) serves the extremely useful purpose of tying together the field by providing an overview.

Highly recommended for folks looking to get their feet wet in the great lake of big data.