PG Diploma in Big Data Analytics & Machine Learning

Program Benefits
  • Explain how a data warehouse combined with good business intelligence can increase a company’s bottom line
  • Describe the components of a data warehouse
  • Describe different forms of business intelligence that can be gleaned from a data warehouse and how that intelligence can be applied toward business decision-making
  • Develop dimensional models from which key data for critical decision-making can be extracted
  • Sketch out the process for extracting data from disparate databases and data sources, and then transforming the data for effective integration into a data warehouse
  • Load extracted and transformed data into the data warehouse
Data Warehousing & Business Intelligence
  • Introduction
    • Understanding the evolution of business intelligence
    • Benefits of business intelligence
  • Data Integration
    • Business intelligence lifecycle.
    • Different Sources of Data.
  • Data Management
    • Need of Data Management.
    • BI for Reporting and Querying
    • Knowledge management and master data management (MDM)
  • OLAP (Online Analytical Processing)
    • Features and functions of OLAP
    • MOLAP
    • Data Drilling.
    • ROLAP
  • Data Warehousing
    • Data design and dimensional modeling.
    • Metadata
    • ETL (Extract, Transform & Load).
    • Dimensions and Facts.
    • Types of Schema – Snowflake & Star Schema.
  • Design
    • Design and architecture.
    • Hardware and Software selection for BI.
    • DW/BI Metrics
Informatica
  • Course Introduction
    • Course Objectives
    • Introduction to Informatica PowerCenter
  • PowerCenter 9.x
    • Components & Architecture
    • Informatica PowerCenter Client Tools
    • Designer, Workflow Manager, Workflow Monitor
  • Designer Components
    • Types of Source & Targets
    • Working with Source & Targets.
    • Mapping, Mapplet, Transformation.
    • Object Navigator
    • Querying Tools
  • Basic Transformations
    • Types of Transformations.
    • Port Configurations.
    • Source Qualifier, Expression, Sorter.
    • Aggregator, Filter, Router Transformation.
    • Joiner, Rank, Sequence Generator.
    • Stored Procedure (SP), Union
  • Advanced Transformations
    • Lookups, Types of Lookups.
    • Lookup Cache Types
    • Normalizer, Update Strategy.
    • SQL, Transaction Control (TCT), Java Transformation.
    • Web Services Consumer.
    • XML
  • Add-on Concepts
    • Variables & Parameters
    • Parameter Files
    • Versioning, Concurrent Workflows
    • Tasks
    • Debugger, Using Wizards
    • Slowly Changing Dimension (SCD) Type 1 & Type 2 Design.
  • Administration Tasks
    • Scheduling
    • Monitoring, Code Migration.
    • Deployment Groups
    • Query, Labels
  • Advanced Topics
    • Partitioning
    • Pushdown Optimization.
    • Data Transformation Manager (DTM)
    • Bottlenecks
Introduction to Big Data
  • Types of data and their significance
  • Need for Big Data Analytics.
  • Why Big Data with Hadoop?
  • History of Hadoop.
  • Node, Rack, Cluster.
  • Architecture of Hadoop.
  • Characteristics of Namenode.
  • Significance of JobTracker and Tasktrackers.
  • HBase coordination with JobTracker.
  • Secondary Namenode usage and workaround.
  • Hadoop releases and their significance.
  • Workaround with datanodes.
  • YARN architecture.
  • Significance of scalability of operation.
  • Use cases where not to use Hadoop.
  • Use cases where Hadoop is used: Facebook, Twitter, Snapdeal, Flipkart.
Hadoop Java API
  • Hadoop Classes, What is MapReduceBase?
  • Mapper Class and its Methods
  • What is a Partitioner, and its types
  • Hadoop specific data types
  • Working on unstructured data analytics
  • What is an iterator and its usage techniques
  • Types of mappers and reducers
  • What is an output collector and its significance
  • Workaround with Joining of datasets
  • Complications with MapReduce
  • MapReduce anatomy
  • Anagram example, Teragen Example, Treasury Example
  • Word Count Example
  • Working with multiple mappers
  • Working with weather data on multiple datanodes in a Fully distributed architecture
  • Use Cases where MapReduce anatomy fails
  • Interview questions based on Java MapReduce.
Working with Pig Latin - I
  • Introduction to Pig Latin
  • History and evolution of Pig Latin
  • Why Pig is used only with Big Data
  • Pig architecture and overview of Compiler and Execution Engine.
  • Pig Release and significance with bugfixes.
  • Pig Specific Data types
  • Complex Data types
  • Bags, Tuples, Fields
  • Pig Specific Methods.
  • Comparison between Yahoo Pig & Facebook Hive.
  • Working with Grunt Shell.
  • Grunt commands (total 17)
  • Pig data input techniques for flat files (comma-separated, tab-delimited, and fixed-width)
  • Working with the schemaless approach
  • How to attach a schema to a file/table in Pig.
  • Schema referencing for similar tables and files.
  • Working with delimiters
Working with Pig Latin - II
  • Working with Binary Storage and Text Loader.
  • Big Data Operations and Read/Write analogy.
  • Filtering Datasets
  • Filtering rows with specific condition
  • Filtering rows with multiple conditions
  • Filtering rows with string based conditions
  • Sorting Datasets
  • Sorting rows with specific column or columns
  • Multilevel Sort
  • Analogy of a sort operation
  • Grouping datasets and Co-grouping data
  • Joining Datasets
  • Types of Joins supported by Pig Latin
  • Aggregate operations like average, sum, min, max, count
  • Flatten operator
  • Creating a UDF (User Defined Function) using Java
  • Calling a UDF from a Pig script
  • Data validation scripts.
Working with Hive
  • Introduction
  • Installation and Configuration
  • Interacting with HDFS using Hive
  • MapReduce Programs through Hive
  • Hive Commands
  • Loading, Filtering, Grouping
  • Data types, Operators
  • Joins, Groups
  • Sample programs in Hive
  • Alter and Delete in Hive.
  • Partition in Hive.
  • Joins and Unions in Hive.
  • Industry specific configuration of Hive parameters.
  • Authentication & Authorization.
  • Statistics with Hive.
  • Archiving in Hive.
  • Hands-on exercise
HBase & ZooKeeper
  • HBase from an architectural point of view
  • Region servers and their implementation
  • Client APIs and their features
  • How the messaging system works
  • Columns and column families
  • Configuring hbase-site.xml
  • Available Clients
  • Loading HBase with semi-structured data
  • Internal data storage in HBase
  • Timestamps
  • HBase Architecture
  • Creating tables with column families
  • MapReduce Integration.
  • HBase: Advanced Usage, Schema Design
  • Loading data from Pig to HBase
  • Sqoop architecture
  • Data import and export in Sqoop.
  • Deploying quorum and configuration throughout the cluster.
YARN Architecture
  • Introduction to YARN and MR2 daemons.
  • Active and Standby Namenodes
  • Resource Manager and Application Master
  • Node Manager
  • Container Objects and Container
  • Namenode Federation
  • Cloudera Manager and Impala
  • Load balancing in cluster with Namenode federation
  • Architectural differences between Hadoop 1.0 and 2.0
Flume
  • Introduction to Flume data integration
  • Flume installation on single node and multinode cluster
  • Flume architecture and various components
  • Data sources types and variants
  • Data target types and variants
  • Deploying an agent onto a single node cluster.
  • Problems associated with Flume
  • Interview questions based on Flume
Data Science
  • Introduction to data science
    • What is data science?
    • Introduction to the Analytics life cycle
    • Different types of analysis
  • R Programming Basics
    • Why R
    • Introduction to R and CRAN
    • Nuts and Bolts of R language
    • Advanced Features in R
  • Data Harmonization
    • ETL in Data Science world
    • Concepts of Tidy Data
    • Reading Tweets
    • Reshaping
    • Working with dates
  • Data Exploring
    • Exploratory Data Analysis
    • Plotting systems like Base & ggplot
  • Research Presentation
    • Literate Programming
    • R Markdown and RPubs
    • Publish document on GitHub
  • Inferential Statistics
    • Probability and expected values
    • Various Frequency Distributions
    • Confidence Intervals
    • Hypothesis testing
  • Regression Analysis
    • Regression definition
    • Residual variance
    • Automatic feature selection
  • Machine learning techniques
    • Supervised and Unsupervised learning methods
    • Classification and clustering
    • Time series forecasting
    • Model Ensemble
  • Interactive Graphics
    • Shiny
    • Slidify
    • googleVis
  • Natural Language Processing (NLP)
    • Basic building blocks of Python
    • NLTK
    • Stop words
    • Stemming
    • Chunking
    • N-grams
    • Performing a Classification
  • Machine Learning with Python
    • Python with Scikit-learn package
    • Implementing Regression, Decision Trees and Clustering in Python
  • Spark Basics
    • Apache Spark
    • RDDs
    • Spark Transformations & Actions
  • Machine Learning with PySpark
    • Introduction to PySpark
    • MLlib
    • Applying Machine Learning to Big Data
Data Science Foundation
  • Introduction to Course
    • Overview of course
    • Types of Analysis – Descriptive, Predictive, Prescriptive
  • R programming Basics
    • Introduction to R and CRAN
    • Introduction to interface, CLI, Data types
    • Vectors, Lists, Factors, Matrices, Data-Frames
    • File IO (Flat files, Excel), subsetting
    • Control Statements
    • Creating function
  • Data Harmonization
    • Raw and Tidy Data – Nature of Data
  • Data Exploration
    • Base: plot(), hist(), boxplot(), barplot(), par()
  • Inferential Statistics
    • Summary Measures: Central Tendency, Dispersion, Chebyshev’s Theorem
    • Probability: Addition and Multiplication rules, Independence, Definitions of PMF and PDF
    • Expected Values
  • Regression Analysis
    • Pearson’s Correlation Coefficient, simple linear regression, and least squares
  • Machine Learning Techniques
    • Types of ML algorithms, Prediction
    • Types of Errors, Sensitivity, Specificity, Receiver Operating Characteristic (ROC), caret package
    • Bayes Theorem, Naïve Bayes, KNN
    • Explanation of Classification trees, Regression trees, packages: rpart, party
    • Clustering – K-Means, Hierarchical, Dendrograms
