Earlier this week I attended Hadoop World in New York City. Hosted by Cloudera, the one-day event was by almost all accounts a smashing success. Attendance was approximately double that of last year. There were five tracks filled mostly with user presentations. According to Mike Olson, CEO of Cloudera, the conference’s tweet stream (#hw2010) was one of the top 10 trending topics of that morning. Cloudera did an admirable job of organizing the event for the Hadoop community rather than co-opting it for its own purposes. Certainly, this was not done out of altruism, but it was done well and in a way that respected the time and interests of those attending.
If you are not familiar with Hadoop, it is an open source software framework for processing “big data” in parallel across a cluster of industry-standard servers. Hadoop is often treated as synonymous with MapReduce, its parallel programming model, but the framework has a variety of components, including a distributed file system (HDFS), a scripting language (Pig), a limited set of SQL operations (Hive) and other data management tools.
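For readers new to MapReduce, the classic illustration is a word count. The Python sketch below shows only the shape of the programming model. A map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step combines each group. It runs in a single process, it is not Hadoop's Java API, and the function names are hypothetical. In a real cluster the framework would run many mappers and reducers on different nodes and handle the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, the step Hadoop performs between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each mapper would process a different split of the input in
    # parallel; in this sketch everything runs sequentially in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the yellow elephant", "the elephant in the room"]
    print(word_count(docs))
    # prints {'elephant': 2, 'in': 1, 'room': 1, 'the': 3, 'yellow': 1}
```

Because the mapper and reducer are pure functions of their inputs, the framework is free to spread them across as many servers as the cluster has. That property is what lets the same job scale from a handful of nodes to thousands.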
By the way, the name Hadoop comes from a stuffed toy – a yellow elephant – belonging to Doug Cutting’s son, and the toy itself made an appearance at the event. Doug created Hadoop and is now part of Cloudera’s management team.
How big is “big data”? In his opening remarks, Mike shared some statistics from a survey of attendees. The average Hadoop cluster among respondents was 66 nodes and 114 terabytes of data. However, there is quite a range. The largest cluster in the survey responses had 1,300 nodes and more than 2 petabytes of data. (Presenters from eBay blew this away, describing their production cluster of 8,500 nodes and 16 petabytes of storage.) Over 60 percent of respondents had 10 terabytes or less, and half were running 10 or fewer nodes.
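One rough way to compare these clusters is to normalize each figure to data per node. The sketch below does that arithmetic under stated assumptions. It uses decimal units (1 PB = 1,000 TB). It treats “more than 2 petabytes” as exactly 2 PB, so that row is a lower bound. It divides the survey's average data by its average node count, which is not the same as averaging each respondent's own ratio. Finally, eBay's 16 PB is a storage figure rather than a data figure, so the comparison is approximate at best.

```python
# Figures quoted in the paragraph above, as (nodes, terabytes).
# Assumptions: decimal units (1 PB = 1,000 TB); "more than 2 PB" taken as 2 PB;
# eBay's number is storage capacity, not necessarily data stored.
clusters = {
    "survey average": (66, 114),
    "largest survey response": (1_300, 2_000),
    "eBay production": (8_500, 16_000),
}


def tb_per_node(nodes: int, terabytes: float) -> float:
    """Average terabytes held per node for one cluster."""
    return terabytes / nodes


for name, (nodes, tb) in clusters.items():
    print(f"{name}: {tb_per_node(nodes, tb):.2f} TB per node")
# survey average: 1.73 TB per node
# largest survey response: 1.54 TB per node
# eBay production: 1.88 TB per node
```

On these rough numbers, each cluster works out to roughly 1.5 to 1.9 TB per node.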
The one criticism of the event I heard repeatedly was that the sessions were too short for the presenters to get into the meat of their applications. John Kreisa, VP of Marketing at Cloudera, told me he agreed and indicated that the sessions likely will be longer next year.
What is it that makes Hadoop an elephant in the room? Over the past 12 to 18 months Hadoop has gone mainstream. A year ago, you could still say it was a fringe technology, but this week’s event and the development of a strong ecosystem around Hadoop make it clear that it is a force to be reckoned with. Many of the analytic database vendors have announced some type of support for Hadoop. Aster Data, Greenplum, Netezza and Vertica were sponsors of the event. Data integration and business intelligence vendors also have announced support for Hadoop, including event sponsors Pentaho and Talend. An ecosystem of development, administration and management tools is emerging as well, as shown by announcements from Cloudera and Karmasphere.
My colleague wrote about Cloudera Version 3 when it was announced back in June. You can expect to see new releases of Cloudera’s Distribution for Hadoop (CDH) annually. Cloudera Enterprise, which bundles CDH with Cloudera’s management tools, will be released semi-annually. Version 3.0 is in beta now. Version 3.5 is planned for the first quarter of 2011 and includes real-time activity monitoring and an expanded file browser, among other things.
If you work with big data but don’t know about Hadoop, you should spend some time learning about it. Our research is already finding a need for simpler and more cost-effective methods to manage and use big data for analytics, business intelligence and information applications. If you want to understand some of the ways in which Hadoop is being used, I have another blog post coming that will discuss its value for your business.
Let me know your thoughts or come and collaborate with me on Facebook, LinkedIn and Twitter.
Regards,
David Menninger – VP & Research Director
David Menninger leads technology software research and advisory for Ventana Research, now part of ISG. Building on over three decades of enterprise software leadership experience, he guides the team responsible for a wide range of technology-focused data and analytics topics, including AI for IT and AI-infused software.
Ventana Research’s Analyst Perspectives are fact-based analysis and guidance on business,
Each is prepared and reviewed in accordance with Ventana Research’s strict standards for accuracy and objectivity to ensure it delivers reliable and actionable insights. It is reviewed and edited by research management and is approved by the Chief Research Officer; no individual or organization outside of Ventana Research reviews any Analyst Perspective before it is published. If you have any issue with an Analyst Perspective, please email it to ChiefResearchOfficer@ventanaresearch.com.