Hadoop Training


Introduction to Big Data and Hadoop
• Limitations of existing solutions for Big Data problems
• How Hadoop solves the Big Data problem
• Hadoop Ecosystem Components
• Hadoop Architecture

Hadoop Distributed File System
• Concept of the Hadoop Distributed File System (HDFS)
• Design of HDFS
• Common challenges
• Best practices for scaling with your data
• Configuring HDFS
• Interacting with HDFS
• HDFS Permissions and Security
• Additional HDFS Tasks
• Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model)
• Hadoop Archives

MapReduce 1.x, Advanced MapReduce, YARN
• What is MapReduce?
• Data Types used in Hadoop
• Concept of Mappers
• Concept of Reducers
• The Execution Framework architecture
• Concept of Partitioners
• Concept of Combiners
• Hadoop Cluster Architecture
• MapReduce types
• Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs)
• Output Formats (Text Output, Binary Output, Multiple Outputs)
• Writing Programs for MapReduce

Hadoop Installation
• Installation of Hadoop

Getting Started
• Running a sample program

HDFS & Pseudo Cluster Environment
• HDFS Storage
• NameNode HA & NodeManager
• Cluster specification
• Hadoop Configuration (Environment Settings, Hadoop Daemon Properties, Addresses and Ports)
• Basic Linux and HDFS Commands
• Setup a Hadoop Cluster

Pig
• What is Pig?
• Installing and Running Pig
• Grunt
• Pig's Data Model
• Pig Latin
• Developing & Testing Pig Latin Scripts
• Writing Evaluation, Filter, Load & Store Functions

Hive
• What is Hive?
• Hive Architecture
• Running Hive
• Comparison with Traditional Database (Schema on Read versus Write, Updates, Transactions and Indexes)
• HiveQL (Data Types, Operators and Functions)
• Tables (Managed and External Tables, Partitions and Buckets, Storage Formats, Importing Data)
• Altering Tables, Dropping Tables
• Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins, Subqueries & Views)
• Map-side and reduce-side joins to optimize queries
• User Defined Functions
• Appending data to an existing Hive table
• Custom Map/Reduce in Hive
• Perform Data Analytics using Pig and Hive

HBase
• What is HBase?
• Client API: Basics
• Client API: Advanced Features
• Client API: Administrative Features
• Available Clients
• Architecture
• MapReduce Integration
• Advanced Usage
• Advanced Indexing
• Implementing HBase

Sqoop
• Database Imports
• Working with Imported Data
• Importing Large Objects
• Performing Exports
• Exports: A Deeper Look

ZooKeeper
• What is ZooKeeper?
• The ZooKeeper Service (Data Model, Operations, Implementation, Consistency, Sessions, States)
• Building Applications with ZooKeeper (ZooKeeper in Production)

Oozie
• What is Oozie?
• Oozie Installation
• Running an Oozie Example
• Expression Language Functions
• Control Flow Nodes
• Action Node Properties (MapReduce, Hive, Pig, Java)

Ambari
• What is Ambari?
• Why is Ambari needed?

Work Labs
• Hands-on with examples

