
Big Data and Hadoop Development


Duration – 3 Months
Overview:
Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or tool; rather, it has become a complete subject involving various tools, techniques and frameworks.
Requirements
  • Programming
  • Quantitative Skills
  • Multiple Technologies
  • Understanding of Business & Outcomes
  • Interpretation of Data
Big Data and Hadoop
  • Limitations of existing solutions for the Big Data problem
  • How Hadoop solves the Big Data problem
  • Hadoop Ecosystem Components
  • Hadoop Architecture
Hadoop Distributed File System
  • Concept of the Hadoop Distributed File System (HDFS)
  • Design of HDFS
  • Common challenges
  • Best practices for scaling with your data
  • Configuring HDFS
  • Interacting with HDFS
  • HDFS permission and Security
  • Additional HDFS Tasks
  • Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model)
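The HDFS topics above center on how files are split into fixed-size blocks and replicated across DataNodes. The arithmetic can be sketched in plain Python; the 128 MB block size and replication factor 3 are the common Hadoop defaults, assumed here for illustration:

```python
# Sketch: how HDFS splits a file into blocks and replicates them.
# Assumes the common defaults: 128 MB block size, replication factor 3.
import math

BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize default in Hadoop 2+
REPLICATION = 3                 # dfs.replication default

def hdfs_blocks(file_size_bytes):
    """Return (number of blocks, total raw bytes stored across the cluster)."""
    blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    return blocks, file_size_bytes * REPLICATION

# A 1 GB file occupies 8 blocks and 3 GB of raw cluster storage.
blocks, raw = hdfs_blocks(1024 * 1024 * 1024)
print(blocks, raw)
```

This is why capacity planning for HDFS multiplies logical data size by the replication factor.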
Advanced MapReduce and YARN
  • What is MapReduce?
  • Data Types used in Hadoop
  • Concept of Mappers
  • Concept of Reducers
  • The Execution Framework architecture
  • Concept of Partitioners
  • Concept of Combiners
  • Hadoop Cluster Architecture
  • MapReduce types
  • Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs)
  • Output Formats (Text Output, Binary Output, Multiple Outputs)
  • Writing Programs for MapReduce
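The mapper/reducer/combiner concepts above can be illustrated with the classic word-count job, written here as a stand-alone plain-Python simulation of the map → shuffle → reduce flow (no Hadoop cluster required):

```python
# Stand-alone sketch of MapReduce word count: map -> shuffle/sort -> reduce.
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts for each word.
    return {word: sum(values) for word, values in grouped.items()}

lines = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

A combiner in real Hadoop is simply this same reduce logic run early, on each mapper's local output, to cut shuffle traffic.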
Hadoop Installation
  • Installation of Hadoop
Getting Started
  • Running a sample program
HDFS & Pseudo Cluster Environment
  • HDFS Storage
  • NameNode HA & NodeManager
  • Cluster specification
  • Hadoop Configuration (Environment Settings, Hadoop Daemon Properties, Addresses and Ports)
  • Basic Linux and HDFS Commands
  • Setup a Hadoop Cluster
What is Pig?
  • Installing and Running Pig
  • Grunt
  • Pig’s Data Model
  • Pig Latin
  • Developing & Testing Pig Latin Scripts
  • Writing Evaluation & Filter Functions
  • Load & Store Functions
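Pig's data model (atoms, tuples, bags, maps) maps closely onto familiar container types. The analogy below uses plain Python purely for illustration; the `FILTER` line in the comment is real Pig Latin syntax, the rest is not Pig itself:

```python
# Pig's data model mapped onto plain Python types (illustrative analogy only):
atom = 19                              # Pig int field (an "atom")
row = ("alice", 19, 4.0)               # Pig tuple: an ordered set of fields
bag = [("alice", 19), ("bob", 21)]     # Pig bag: a collection of tuples
meta = {"name": "alice", "gpa": 4.0}   # Pig map: string keys to values

# Equivalent of the Pig Latin statement:
#   adults = FILTER students BY age >= 20;
adults = [t for t in bag if t[1] >= 20]
print(adults)  # [('bob', 21)]
```

In real Pig Latin the same filter runs as a MapReduce (or Tez/Spark) job over HDFS data rather than an in-memory list.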
What is Hive?
  • What is Hive?
  • Hive Architecture
  • Running Hive
  • Hive’s Data Model
  • Comparison with Traditional Database (Schema on Read versus Write, Updates, Transactions and Indexes)
  • HiveQL (Data Types, Operators and Functions)
  • Tables (Managed and External Tables, Partitions and Buckets, Storage Formats, Importing Data)
  • Altering Tables, Dropping Tables
  • Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins, Subqueries and Views)
  • Map-side and Reduce-side joins to optimize queries
  • User Defined Functions
  • Appending Data into existing Hive Table
  • Custom Map/Reduce in Hive
  • Perform Data Analytics using Pig and Hive
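A key idea from the Hive topics above is "schema on read": the raw file in the warehouse is never rewritten, and the table schema is applied only when a query reads it. A minimal sketch, with a made-up table layout and column names:

```python
# Sketch of Hive's "schema on read": raw text stays untouched in storage;
# the schema is applied only at query time. Table/columns are illustrative.
import csv
import io

raw = "1,alice,34\n2,bob,29\n"  # file already sitting in the warehouse dir

# CREATE TABLE users (id INT, name STRING, age INT)  -> metadata only,
# nothing is parsed or validated at table-creation time.
schema = [("id", int), ("name", str), ("age", int)]

def read_with_schema(raw_text, schema):
    # Parsing (and any type errors) happen now, at read time.
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(val) for (name, cast), val in zip(schema, record)})
    return rows

# SELECT * FROM users WHERE age > 30;
over_30 = [r for r in read_with_schema(raw, schema) if r["age"] > 30]
print(over_30)  # [{'id': 1, 'name': 'alice', 'age': 34}]
```

This is the contrast with a traditional RDBMS, which enforces the schema on write, before data is accepted.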
What is HBase?
  • What is HBase?
  • Client API- Basics
  • Client API- Advanced Features
  • Client API – Administrative Features
  • Available Clients
  • Architecture
  • MapReduce Integration
  • Advanced Usage
  • Advanced Indexing
  • Implement HBase
What is Sqoop?
  • What is Sqoop?
  • Database Imports
  • Importing Large Objects
  • Performing Exports
  • Exports- A Deeper look
What is ZooKeeper?
  • What is ZooKeeper?
  • The ZooKeeper Service (Data Model, Operations, Implementation, Consistency, Sessions, States)
  • Building Applications with ZooKeeper (ZooKeeper in Production)
What is Oozie?
  • What is Oozie?
  • Oozie Installation
  • Running an Oozie Example
  • Oozie Web Console
  • Expression Language Functions
  • Oozie Workflow Examples (Java Code, Pig, Hive)
  • Control Flow Nodes
  • Action Node Properties (MapReduce, Hive, Pig, Java)
What is Ambari?
  • What is Ambari?
  • Why is Ambari needed?
Hands-on Labs
  • Hands-on practice with examples
Introduction
  • Big Data
  • The 3 Vs of Big Data (Volume, Velocity, Variety)
  • Role of Hadoop in Big data
  • Hadoop and its ecosystem
  • Overview of other Big Data Systems
  • Requirements in Hadoop
  • Use Cases of Hadoop
Installing the Hadoop Distributed File System (HDFS)
  • Defining key design assumptions and architecture
  • Configuring and setting up the file system
  • Issuing commands from the console
  • Reading and writing files
Setting the stage for MapReduce
  • Introducing the computing daemons
  • Dissecting a MapReduce job
Defining Hadoop Cluster Requirements
  • Selecting appropriate hardware
  • Designing a scalable cluster
Working with Sqoop
  • Sqoop Installation and Basics
  • Importing Data from Oracle to HDFS
  • Advanced Imports
  • Real-Time Use Case
  • Exporting Data from HDFS to Oracle
  • Running Sqoop in Cloudera
Building the Cluster
  • Installing Hadoop daemons
  • Optimizing the network architecture
Configuring a Cluster–Pseudo node and multi-node
  • Setting basic configuration parameters
  • Configuring block allocation, redundancy and replication
Deploying MapReduce
  • Installing and setting up the MapReduce environment
  • Delivering redundant load balancing via Rack Awareness
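The Rack Awareness topic above can be sketched in plain Python. The topology and node names below are made up for illustration; the logic mirrors HDFS's default placement policy for replication factor 3 (first replica on the writer's node, second and third on two nodes of a different rack):

```python
# Sketch of HDFS's default rack-aware replica placement (replication = 3).
# Assumes at least two racks and two nodes in the remote rack.
def place_replicas(writer, topology):
    """topology: {rack_name: [node, ...]}; writer must appear in topology."""
    local_rack = next(r for r, nodes in topology.items() if writer in nodes)
    remote_rack = next(r for r in topology if r != local_rack)
    remote_nodes = topology[remote_rack]
    # Replica 1: local node; replicas 2 and 3: two nodes in a remote rack.
    return [writer, remote_nodes[0], remote_nodes[1]]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas("n1", topology))  # ['n1', 'n3', 'n4']
```

Placing two replicas off-rack is what lets the cluster survive the loss of an entire rack while keeping write traffic between racks low.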
Managing Resources and Cluster Health – Maintaining HDFS
  • Starting and stopping Hadoop daemons
  • Monitoring HDFS status
  • Adding and removing data nodes
Administering MapReduce
  • Managing MapReduce jobs
  • Tracking progress with monitoring tools
  • Commissioning and decommissioning compute nodes
Performing Hadoop status checks
  • Importing and exporting relational information with Sqoop
Planning for Backup, Recovery and Security
  • Coping with inevitable hardware failures
  • Securing your Hadoop cluster
Introduction to Spark
  • Limitations of MapReduce in Hadoop
  • Batch vs. real-time analytics
  • Applications of stream processing
  • How to install Spark
  • Spark vs. the Hadoop ecosystem
Programming in Scala
  • Features of Scala
  • Basic data types and literals used
  • List the operators and methods used in Scala
  • Concepts of Scala
Using RDD for Creating Applications in Spark
  • Features of RDDs
  • How to create RDDs
  • RDD operations and methods
  • How to run a Spark project with SBT
  • Explain RDD functions and describe how to write different codes in Scala
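A core RDD idea listed above is laziness: transformations such as `map` and `filter` only record a lineage of work, and nothing runs until an action is called. A plain-Python sketch of that behavior (not Spark's actual API, just the concept):

```python
# Plain-Python sketch of the RDD idea: transformations are lazy and only
# record a pipeline; an action (collect) triggers execution.
class MiniRDD:
    def __init__(self, data, ops=()):
        self.data = data
        self.ops = ops  # recorded transformations, not yet executed

    def map(self, fn):
        return MiniRDD(self.data, self.ops + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self.data, self.ops + (("filter", pred),))

    def collect(self):
        # Action: run the recorded pipeline now, in order.
        items = iter(self.data)
        for kind, fn in self.ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

rdd = MiniRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # [0, 4, 16, 36, 64]
```

In real Spark the recorded lineage is also what allows a lost partition to be recomputed rather than replicated.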
What is Spark SQL?
  • Explain the importance and features of SparkSQL
  • Describe methods to convert RDDs to DataFrames
  • Explain concepts of SparkSQL
  • Describe the concept of hive integration
What is Spark Streaming?
  • Concepts of Spark Streaming
  • Describe basic and advanced sources
  • Explain how stateful operations work
  • Explain window and join operations
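The window operations above slide a fixed-length window over a stream of micro-batches and aggregate each window. A stand-alone Python sketch of that pattern (window length 3 batches, slide interval 1 batch; the batch values are illustrative):

```python
# Sketch of a Spark Streaming-style sliding window over micro-batches.
from collections import deque

def windowed_sums(batches, window=3, slide=1):
    buf, out = deque(maxlen=window), []   # buf keeps only the last `window` batches
    for i, batch in enumerate(batches):
        buf.append(batch)
        if (i + 1) % slide == 0:
            out.append(sum(buf))          # aggregate over the current window
    return out

print(windowed_sums([1, 2, 3, 4, 5]))  # [1, 3, 6, 9, 12]
```

Stateful operations in Spark Streaming generalize this: instead of keeping the last N batches, they carry arbitrary per-key state forward across batches.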
What is Spark ML Programming?
  • Explain the use cases and techniques of Machine Learning (ML)
  • Describe the key concepts of Spark ML
  • Explain the concept of an ML Dataset, and ML algorithm, model selection via cross validation
What is Spark GraphX Programming?
  • Explain the key concepts of Spark GraphX programming
  • Limitations of the Graph Parallel system
  • Describe the operations with a graph
  • Graph system optimizations
What is Scala?
  • Scala – Environment Setup
  • Scala – Basic Syntax
  • Scala – Data Types
  • Scala – Variables
  • Scala – Classes & Objects
  • Scala – Access Modifiers
  • Scala – Operators
  • Scala – IF ELSE
  • Scala – Loop Statements
  • Scala – Functions
  • Scala – Closures
  • Scala – Strings
  • Scala – Arrays
  • Scala – Collections
  • Scala – Traits
  • Scala – Pattern Matching
  • Scala – Regular Expressions
  • Scala – Exception Handling
  • Scala – Extractors
  • Scala – Files I/O
