BIG DATA & HADOOP Short-Term Course Content at BICARD

Introduction to Big Data
  • Types of data and their significance
  • Need for Big Data Analytics.
  • Why Big Data with Hadoop?
  • History of Hadoop.
  • Node, Rack, Cluster.
  • Architecture of Hadoop.
  • Characteristics of Namenode.
  • Significance of JobTracker and Tasktrackers.
  • Hbase coordination with JobTracker.
  • Secondary Namenode usage and workaround.
  • Hadoop releases and their significance.
  • Workaround with datanodes.
  • YARN architecture.
  • Significance of scalability of operation.
  • Use cases where not to use Hadoop.
  • Use cases where Hadoop is used.
  • Case studies: Facebook, Twitter, Snapdeal, Flipkart.
Working with Pig Latin
  • Introduction to Pig Latin
  • History and evolution of Pig Latin
  • Why Pig is used only with Big Data
  • Pig architecture and overview of Compiler and Execution Engine.
  • Pig releases and their significance, including bug fixes.
  • Pig-specific data types
  • Complex Data types
  • Bags, Tuples, Fields
  • Pig-specific methods.
  • Comparison between Yahoo Pig & Facebook Hive.
  • Working with Grunt Shell.
  • Grunt commands (17 in total)
  • Pig data input techniques for flat files (comma-separated, tab-delimited and fixed-width).
  • Working with the schemaless approach
  • How to attach a schema to a file/table in Pig.
  • Schema referencing for similar tables and files.
  • Working with delimiters
Working with Hive
  • Introduction
  • Installation and Configuration
  • Interacting with HDFS using Hive
  • MapReduce programs through Hive
  • Hive commands
  • Loading, Filtering, Grouping
  • Data types, Operators
  • Joins, Groups
  • Sample programs in Hive
  • Alter and Delete in Hive.
  • Partition in Hive.
  • Joins in Hive. Unions in Hive.
  • Industry-specific configuration of Hive parameters.
  • Authentication & Authorization.
  • Statistics with Hive.
  • Archiving in Hive.
  • Hands-on exercise
Hbase
  • Hbase Architectural point of view
  • Region servers and their implementation
  • Client APIs and their features
  • How the messaging system works
  • Columns and column families
  • Configuring hbase-site.xml
  • Available clients
  • Loading Hbase with semi-structured data
  • Internal data storage in Hbase
  • Timestamps
  • Hbase Architecture
  • Creating table with column families
  • MapReduce Integration.
  • Hbase: Advanced Usage, Schema Design
  • Loading data from Pig into Hbase
  • Sqoop architecture
  • Data import and export in Sqoop.
  • Deploying quorum and configuration throughout the cluster.
YARN Architecture
  • Introduction to YARN and MR2 daemons.
  • Active and Standby Namenodes
  • Resource Manager and Application Master
  • Node Manager
  • Container Objects and Container
  • Namenode Federation
  • Cloudera Manager and Impala
  • Load balancing in cluster with Namenode federation
  • Architectural differences between Hadoop 1.0 and 2.0
Flume
  • Introduction to Flume data integration
  • Flume installation on single node and multinode cluster
  • Flume architecture and various components
  • Data sources types and variants
  • Data target types and variants
  • Deploying an agent onto a single node cluster.
  • Problems associated with Flume
  • Interview questions based on Flume

Big Data Course Pune

