
Archive for the ‘Book Reviews’ Category

Book Review: “MongoDB Applied Design Patterns”

“MongoDB Applied Design Patterns” is a book that I will read again. I generally don’t say that about technical books, but this one is strong enough that many parts merit a second reading.

This book is for folks with some experience using MongoDB.  If you’ve never worked with MongoDB before, you should start with another book.  Python developers, in particular, will benefit from studying this book, as most of the code examples are in that language.  As long as you have some object-oriented programming experience and have worked with the MongoDB shell, though, you’ll have little difficulty following the code examples.

Readers whose only database experience is relational will also benefit strongly from this book. The author does a thorough job, particularly in the early chapters, of comparing MongoDB with traditional relational database management systems.

I particularly liked the author’s discussion of transactions in chapter 3. The example is complex, not a simple debit-credit scenario. Through it, you come to understand that when you give up a relational database system, you must write your own transaction management. To me, this is an important point, and I’m glad that the author spends so much time on this example.

Some of the use cases, in particular those in chapters four, five, and six, are similar to those in the MongoDB manual. The remaining use cases go beyond what that manual describes. The discussion of every use case is thorough: each typically explains the data model (schema design) and then the standard CRUD operations. The author also covers less typical operations, such as aggregation. I was particularly pleased that each use case addresses sharding concerns.

In summary, I highly recommend this book.  It’s great to see MongoDB being adopted for so many different uses.

Categories: Book Reviews, MongoDB, NoSQL

Book Review: “HBase: The Definitive Guide” by Lars George (O’Reilly Media)

October 5, 2011

Summary

(Disclosure: O’Reilly Media provided me with a free ebook copy of this book for the purposes of this review. I have done my best not to let that influence my opinions here.)

When a book bills itself as “The Definitive Guide,” well, that’s a tall order to fill. But, except for updates as new releases of HBase roll out, I can’t imagine another book surpassing this one by Lars George.

Lars George has been working with HBase since 2007 and has been a full committer to the project since 2009. He now works for Cloudera, a company that provides a commercial distribution of Hadoop as well as Hadoop support. After reading this book, I have no doubt that George has a deep understanding not only of HBase as a data solution but also of its internal workings.

My Reactions

George gives the background and history of HBase in the larger context of relational databases and NoSQL, which I found very helpful. The many diagrams throughout the book are extremely useful in explaining concepts, especially for those of us coming from a relational database background.

George has an excellent, clear writing style. Take, for example, the section “The Problem with Relational Database Systems,” which gives a quick rundown of the typical steps for getting an RDBMS to scale up. His summary reads like the escalating panic that many of us have felt when dealing with a database-backed application that will not scale.

As an example of how thorough and comprehensive the book is, look at chapter 2, which has an extensive discussion of the type and class (not desktop PCs!) of machines suitable for running HBase. George gives a truly helpful set of configuration practices, going as far as recommending redundant power supply units.

Another example of his thoroughness comes where George discusses delete methods (chapter 3). He shows how you can use custom versioning, while admitting that the example is somewhat contrived. Indeed, right after working through the example, a distinct “Warning” box notes that custom versioning is not actually recommended. So even if you never implement custom versioning, you come away understanding it as a feature that HBase provides.

Many of the programming examples come with excellent discussions of the tradeoffs implicit in the techniques, including performance and scaling concerns. Java developers will be most comfortable with the majority of the examples, but anyone with some object-oriented programming experience can follow them.

I really appreciated the thorough discussion in chapter 8 (“Architecture”) of subjects like B+ trees vs. Log-Structured Merge Trees (LSMs), the Write-Ahead Log, and seeks vs. transfers, topics that are relevant not only to HBase but to many database systems of varying architectures.

The level of thoroughness is also the book’s only weakness. I’m not sure who the target audience is, because the book serves both developers and system or database administrators. While nearly every imaginable HBase topic is touched upon, some (for example, all those hardware recommendations) would have been better served by a brief mention and references to other sources. The print edition of the book is 552 pages.

Still, a complaint that a book is too detailed shouldn’t be taken as much of a complaint. Anyone with an interest in NoSQL databases in general, and HBase in particular, should read and study this book. It’s unlikely to be superseded anytime soon.

The catalog page for “HBase: The Definitive Guide”.

Categories: Book Reviews, Books, NoSQL

Book Review: “Big Data Glossary” by Pete Warden (O’Reilly Media)

October 2, 2011

“Big Data Glossary” could probably have been titled something like “Big Data Cheat Sheets,” because it’s both more and less than a glossary. Rather than a list of terms with definitions, the book is an excellent summary of tools in the “big data” space.

Warden tackles eleven topics:

  1. Some background on fundamental techniques (e.g., key-value stores)
  2. NoSQL databases
  3. MapReduce
  4. Storage techniques
  5. “Cloud” servers
  6. Data processing technologies (e.g., R and Lucene)
  7. Natural Language Processing
  8. Machine Learning
  9. Visualization
  10. Acquisition
  11. Serialization

He covers none of these topics in great detail, which will no doubt draw carping from some folks. However, I really like his approach of sketching broad themes, identifying key projects (or products) in each space, and pointing the reader to further research. Because the field of “big data” is so large, this short book (it’s only 50 pages) serves the extremely useful purpose of tying the field together with an overview.

Highly recommended for folks looking to get their feet wet in the great lake of big data.