It’s hard to believe, but the first Hadoop cluster went into production at Yahoo 10 years ago today. What began as an experiment in distributed computing for an Internet search engine has turned into a global phenomenon and a focal point for a big data ecosystem driving billions in spending. Here are some thoughts on the big yellow elephant’s milestone from the people involved in Hadoop’s early days.
Hadoop’s story started before January 2006, of course. In the early 2000s, Doug Cutting, who created the Apache Lucene search engine, was working with Mike Cafarella to build a more scalable search engine called Nutch. They found inspiration in the Google File System white paper, which Cutting and Cafarella used as a model. Cutting and Cafarella built the Nutch Distributed File System in 2004, and then built a MapReduce framework to sit atop it a year later.
The software was promising, says Cutting, who is now the chief architect at Cloudera, but they needed some outside support. “I was worried that, if the two of us working on it then, Mike Cafarella & I, didn’t get substantial help, then the entire effort might fizzle and be forgotten,” Cutting tells Datanami via email. “We found help in Yahoo, who I started working for in early 2006. Yahoo dedicated a large team to Hadoop and, after a year or so of investment, we at last had a system that was broadly usable.”
Continue reading Alex Woodie’s article published on datanami.com
Apache, Cloudera, Doug Cutting, hadoop, hive, MapReduce, Pig