Apache Hadoop is an open-source framework designed to help solve some of the storage and analysis challenges of Big Data. This hands-on workshop continues from COMP 1630 and assumes prior knowledge of industry-standard data modeling, relational database design, and SQL programming. It is aimed at a broad audience, including administrators, data analysts, and managers. Participants build on their existing database skills to work with larger and more complex data sets and to gain an overview of Hadoop and Big Data. Starting with the basic concepts and components of Hadoop, participants use Hive to query data stored in Hadoop with an SQL-like query language. Lectures and labs introduce typical use of a Hadoop system on the Cloudera QuickStart virtual machine. Homework and exercises focus on loading data into the Hadoop Distributed File System (HDFS), performing basic file operations, and running queries on existing data. Upon successful completion of this course, participants will be able to define Big Data, identify the basic components of Hadoop, and run queries on Big Data using SQL on Hive.
COMP 3840 - Introduction to Big Data and Hadoop