I recently wrote a post for the Engineering blog at my employer. “Using Talend Big Data to Move Data from MongoDB to PostgreSQL” describes ETL from a NoSQL data store, in this case, MongoDB, to a traditional relational data store, PostgreSQL.
Happy New Year!
With a new year here, I’ve been thinking about ways to expand my skill set and about the technologies I want to learn more about in 2012. As I’m still a data guy at heart, it should be no surprise that the technologies that interest me are related to data and databases.
I’m hoping to do a lot of work with MongoDB this year. In the past, I’ve played around with it some, but it looks as if there will be at least one project at work that will let me get some deep experience using MongoDB. I’m currently reading “MongoDB: The Definitive Guide” by Kristina Chodorow and Michael Dirolf and am really enjoying it. Based on what I’ve read so far, I would say this is one of those classic O’Reilly books: specialized, but so well-written that you almost forget that you’re reading something highly technical.
I attended a number of talks at QCon SF back in November and one topic that recurred frequently was how companies are using Hadoop. It seemed as if every presenter described how their company is finding a way to use at least part of the Hadoop ecosystem. And Hadoop is truly that: an entire ecosystem, encompassing not only the core project, but also Pig, ZooKeeper, Mahout, HBase, and still others. You can find more information at the Apache Hadoop project page. I’m hoping to get a proof-of-concept cluster up and running by the middle of the year.
Full-text search technology falls into an area that, like MongoDB, I have had some exposure to in the past, but would like very much to learn more about. I’m planning to spend some time this year on Lucene first, and then move on to the search server, Solr. I’m fortunate to have some colleagues with a lot of experience in this area, and I intend to mine their knowledge whenever possible.
I believe that those of us steeped in the relational database world can take a great step toward a data architect’s mindset by gaining a deep understanding of XML and its related technologies. My plan this year is to get a firmer understanding of XML, XPath, XSLT, XML Schema, etc. I’ve done quite a bit of work with XML in the past, but this has typically been in relation to Oracle’s handling of XML documents within the database. I want to gain a broader understanding.
Of course, JSON is coming on strong and supplanting XML in some of its previous strongholds. For example, MongoDB’s data model relies on JSON-style documents, which it stores in a binary form called BSON.
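To make the JSON and XML comparison a little more concrete, here is a minimal sketch in Python that writes the same record both ways using only the standard library. The record and the helper names (to_json, to_xml, author_names_from_xml) are my own invention for illustration, and the element layout is just one of many reasonable ways to shape the XML. It is not tied to how MongoDB or Oracle actually store anything.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, made up for illustration only.
book = {
    "title": "MongoDB: The Definitive Guide",
    "authors": ["Kristina Chodorow", "Michael Dirolf"],
    "publisher": "O'Reilly",
}


def to_json(record):
    """Serialize the record as a JSON document."""
    return json.dumps(record, indent=2)


def to_xml(record):
    """Serialize the same record as XML (one possible layout)."""
    root = ET.Element("book")
    ET.SubElement(root, "title").text = record["title"]
    authors = ET.SubElement(root, "authors")
    for name in record["authors"]:
        ET.SubElement(authors, "author").text = name
    ET.SubElement(root, "publisher").text = record["publisher"]
    return ET.tostring(root, encoding="unicode")


def author_names_from_xml(xml_text):
    """Pull the author names back out with a simple XPath-style query."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./authors/author")]


if __name__ == "__main__":
    print(to_json(book))
    print(to_xml(book))
    print(author_names_from_xml(to_xml(book)))
```

The JSON version maps directly onto the dictionary and list, while the XML version needs a decision about element names and nesting, which is part of why JSON feels lighter for this kind of document.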
This seems like enough to keep me busy outside of my day job for one year!