Sophisticated data instrumentation and collection technologies are leading to unprecedented data growth. Data-driven organizations need to store that data scalably and perform complex processing on it (i.e., not just "queries"). Given the unstructured nature of the source data and the need to stay agile, organizations also need to be able to change their schemas dynamically (at read time rather than at write time). Apache Hadoop is an open-source, distributed, fault-tolerant system that leverages commodity hardware to achieve large-scale, agile data storage and processing. In this presentation, Dr. Amr Awadallah will introduce the design principles behind Apache Hadoop and explain the architecture of its core subsystems (the Hadoop Distributed File System and MapReduce). Amr will also contrast Hadoop with relational database systems and illustrate how the two complement each other. Finally, Amr will cover the Hadoop ecosystem at large, which includes a number of projects that together form a cohesive Data Operating System for the modern data center.
Slides:
Download the slides for this presentation in PDF format.
About the speaker:
Dr. Amr Awadallah is Co-Founder and CTO of Cloudera, Inc., where he is responsible for all engineering efforts, from product development to release, for both the open source projects and Cloudera's proprietary management software. Prior to Cloudera, Amr served as Vice President of Engineering at Yahoo!, where he led a team that used Apache Hadoop extensively for data analysis and business intelligence across many of Yahoo!'s online services. Amr joined Yahoo! in June 2000 after it acquired VivaSmart (a startup he co-founded at Stanford in 1999). Amr holds Bachelor's and Master's degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.
Contact information:
Amr Awadallah
210 Portage Ave
Palo Alto, CA 94306