Big Data & Hadoop

Introduction to Big Data
  • Types of data and their significance
  • Need for Big Data analytics.
  • Why Big Data with Hadoop?
  • History of Hadoop.
  • Node, Rack, Cluster.
  • Architecture of Hadoop.
  • Characteristics of the NameNode.
  • Significance of the JobTracker and TaskTrackers.
  • HBase coordination with the JobTracker.
  • Secondary NameNode usage and workarounds.
  • Hadoop releases and their significance.
  • Working with DataNodes.
  • YARN architecture.
  • Significance of scalability of operation.
  • Use cases where not to use Hadoop.
  • Use cases where Hadoop is used.
  • Facebook, Twitter, Snapdeal, Flipkart.
Hadoop Java API
  • Hadoop Classes, What is MapReduceBase?
  • Mapper Class and its Methods
  • What is Partitioner and types
  • Hadoop specific data types
  • Working on unstructured data analytics
  • What is an iterator and its usage techniques
  • Types of mappers and reducers
  • What is output collector and its significance
  • Working with joins of datasets
  • Complications with MapReduce
  • MapReduce anatomy
  • Anagram example, Teragen Example, Treasury Example
  • Word Count Example
  • Working with multiple mappers
  • Working with weather data on multiple DataNodes in a fully distributed architecture
  • Use Cases where MapReduce anatomy fails
  • Interview questions based on Java MapReduce.
Working with Pig Latin - I
  • Introduction to Pig Latin
  • History and evolution of Pig Latin
  • Why Pig is used only with Big Data
  • Pig architecture and overview of Compiler and Execution Engine.
  • Pig releases and the significance of their bug fixes.
  • Pig Specific Data types
  • Complex Data types
  • Bags, Tuples, Fields
  • Pig Specific Methods.
  • Comparison between Yahoo Pig & Facebook Hive.
  • Working with Grunt Shell.
  • Grunt commands (17 in total)
  • Pig data input techniques for flat files (comma-separated, tab-delimited, and fixed-width).
  • Working with a schemaless approach
  • How to attach a schema to a file/table in Pig.
  • Schema referencing for similar tables and files.
  • Working with delimiters
Working with Pig Latin - II
  • Working with Binary Storage and Text Loader.
  • Big Data operations and the read/write analogy.
  • Filtering Datasets
  • Filtering rows with specific condition
  • Filtering rows with multiple conditions
  • Filtering rows with string based conditions
  • Sorting Datasets
  • Sorting rows with specific column or columns
  • Multilevel Sort
  • Analogy of a sort operation
  • Grouping datasets and Co-grouping data
  • Joining Datasets
  • Types of Joins supported by Pig Latin
  • Aggregate operations like average, sum, min, max, count
  • Flatten operator
  • Creating a UDF (user-defined function) using Java
  • Calling a UDF from a Pig script
  • Data validation scripts.
Working with Hive
  • Introduction
  • Installation and Configuration
  • Interacting with HDFS using Hive
  • MapReduce programs through Hive
  • Hive commands
  • Loading, Filtering, Grouping
  • Data types, Operators
  • Joins, Groups
  • Sample programs in Hive
  • Alter and Delete in Hive.
  • Partition in Hive.
  • Joins and unions in Hive.
  • Industry-specific configuration of Hive parameters.
  • Authentication & Authorization.
  • Statistics with Hive.
  • Archiving in Hive.
  • Hands-on exercise
HBase & ZooKeeper
  • HBase from an architectural point of view
  • Region servers and their implementation
  • Client APIs and their features
  • How the messaging system works
  • Columns and column families
  • Configuring hbase-site.xml
  • Available clients
  • Loading HBase with semi-structured data
  • Internal data storage in HBase
  • Timestamps
  • HBase architecture
  • Creating table with column families
  • MapReduce Integration.
  • HBase: advanced usage, schema design
  • Loading data from Pig into HBase
  • Sqoop architecture
  • Data import and export in Sqoop.
  • Deploying a quorum and its configuration throughout the cluster.
YARN Architecture
  • Introduction to YARN and MR2 daemons.
  • Active and Standby NameNodes
  • Resource Manager and Application Master
  • Node Manager
  • Container Objects and Container
  • NameNode Federation
  • Cloudera Manager and Impala
  • Load balancing in a cluster with NameNode federation
  • Architectural differences between Hadoop 1.0 and 2.0
Flume
  • Introduction to Flume data integration
  • Flume installation on single-node and multi-node clusters
  • Flume architecture and various components
  • Data source types and variants
  • Data target types and variants
  • Deploying an agent onto a single-node cluster.
  • Problems associated with Flume
  • Interview questions based on Flume

Big Data Course Pune