About the Data Scientist Master's program developed in collaboration with IBM

IBM is the second-largest Predictive Analytics and Machine Learning solutions provider globally (source: The Forrester Wave report, September 2018). A partnership between Simplilearn and IBM introduces students to integrated blended learning, preparing them for expert roles in Artificial Intelligence and Data Science. The Data Science course in collaboration with IBM is designed to make students industry-ready for Artificial Intelligence and Data Science job roles. IBM is a leading cognitive solutions and cloud platform company, headquartered in Armonk, New York, offering a wide range of technology and consulting services. Each year, IBM invests $6 billion in research and development, and it has earned five Nobel Prizes, nine US National Medals of Technology, five US National Medals of Science, six Turing Awards, and 10 inductions into the US Inventors Hall of Fame.

What can I expect from these Data Science courses developed in collaboration with IBM?

Upon completion of this Data Scientist online Master's program, you will receive certificates from IBM (for the IBM courses) and from Simplilearn for the courses in the learning path. These certificates will testify to your skills as an expert in Data Science. You will also receive the following:

Access to an IBM Cloud Lite account
An industry-recognized Data Scientist Master's certificate

What are the learning objectives?

Data Scientist is one of the hottest professions. IBM predicts the demand for Data Scientists will rise by 28% by 2020. The Data Scientist Master’s program co-developed with IBM helps you master skills including statistics, hypothesis testing, data mining, clustering, decision trees, linear and logistic regression, data wrangling, data visualization, regression models, Hadoop, Spark, PROC SQL, SAS Macros, recommendation engines, supervised and unsupervised learning, and more. The program provides extensive Data Science training, combining online instructor-led classes with self-paced learning co-developed with IBM. It concludes with a capstone project designed to reinforce the learning by building a real industry product that draws on all the key aspects covered throughout the program. The skills covered in this program will help prepare you for the role of a Data Scientist.

What Data Science projects are included in this Master's program?

This Data Scientist Master's program includes 15+ real-life, industry-based projects across different domains to help you master the concepts of Data Science and Big Data. A few of the projects you will work on are described below.

Capstone Project
Description: You will attend dedicated mentor classes in order to create a high-quality industry project that solves a real-world problem using the skills and technologies learned throughout the program. The capstone project covers all the key aspects, from data extraction, cleaning, and visualization to model building and tuning. You can also choose the domain/industry dataset you want to work on from the options available. After successful submission of the project, you will be awarded a capstone certificate that can be showcased to potential employers as a testament to your learning.

Project 1: Products rating prediction for Amazon
Domain: E-commerce
Description: Amazon, one of the leading US-based e-commerce companies, recommends products within the same category to customers based on their activity and reviews of other similar products. Amazon would like to improve this recommendation engine by predicting ratings for the non-rated products and adding them to recommendations accordingly.

Project 2: Improving customer experience for Comcast
Domain: Telecom
Description: Comcast, one of the leading US-based global telecommunication companies, wants to improve customer experience by identifying and acting on any problem areas that lower customer satisfaction. The company is also looking for key recommendations that can be implemented to deliver the best customer experience.

Project 3: Attrition Analysis for IBM
Domain: Workforce Analytics
Description: IBM, one of the leading US-based IT companies, would like to identify the factors that influence the attrition of employees. Based on the parameters identified, the company would also like to build a logistic regression model that can help predict whether an employee will churn or not.

Project 4: Predict accurate sales for 45 stores of Walmart, one of the leading US-based retail stores, considering the impact of promotional markdown events. Check if macroeconomic factors like CPI, unemployment rate, etc. have an impact on sales.
Domain: Retail
Description: Walmart runs several promotional markdown events throughout the year. The markdowns precede prominent holidays, such as the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. The business is facing a challenge due to unforeseen demand, with stocks running out at times because of inaccurate demand estimation. Macroeconomic factors like CPI, Unemployment Index, etc. also play an important role in predicting the demand, but the business hasn’t been able to leverage these factors yet. As a part of this project, create a model to highlight the effects of markdowns on holiday weeks.

Project 5: Learn how leaders in the Healthcare industry make use of Data Science to benefit their business.
Domain: HealthCare
Description: Predictive analytics can be used in healthcare to mitigate hospital readmissions. In healthcare and other industries, predictors are most useful when they can be brought into action. However, historical and real-time data alone are worthless without intervention. More importantly, to judge the efficiency and value of forecasting a trend and ultimately changing behavior, both the predictor and the intervention must be integrated back into the same system and workflow where the trend originally occurred.

Project 6: Understand how Insurance leaders like Berkshire Hathaway, AIG, AXA, etc. make use of Data Science by working on a real-life project based on Insurance.
Domain: Insurance
Description: The use of predictive analytics has increased greatly in insurance businesses, especially for the biggest companies, according to the 2013 Insurance Predictive Modeling Survey. While the survey showed an increase in predictive modeling throughout the industry, all the respondents from companies that write over $1 billion in personal insurance employ predictive modeling, compared to 69% of companies with less than that amount of premium.

Project 7: See how banks like Citigroup, Bank of America, ICICI, HDFC, etc. make use of Data Science to stay ahead of the competition.
Domain: Banking
Description: A Portuguese banking institution ran a marketing campaign to convince potential customers to invest in a bank term deposit. Its marketing campaigns were conducted through phone calls, and sometimes the same customer was contacted more than once. Your job is to analyze the data collected from the marketing campaign.

Project 8: Learn how Stock Markets, such as NASDAQ, NSE, and BSE, leverage Data Science and Analytics to arrive at consumable data from complex datasets.
Domain: Stock Market
Description: You need to import data for the following companies using the Yahoo data reader: Yahoo, Apple, Amazon, Microsoft, and Google. Perform fundamental analytics, including plotting the closing price, plotting stock trade by volume, performing daily return analysis, and using a pair plot to show the correlation between all the stocks.

Project 9: See how Data Science is used in the field of engineering by taking up this case study of MovieLens Dataset Analysis.
Domain: Engineering
Description: The GroupLens Research Project is a research group in the Department of Computer Science and Engineering at the University of Minnesota. The researchers of this group are involved in many research projects related to the fields of information filtering, collaborative filtering, and recommender systems.

Project 10: Understand how leading retail companies like Walmart, Amazon, Target, etc. make use of Data Science to analyze and optimize their product placements and inventory.
Domain: Retail
Description: Analytics is used to optimize product placements on shelves and the inventory kept in warehouses, illustrated with industry examples. Through this project, participants learn the daily cycle of product optimization from the shelves to the warehouse. This gives them insights into regular occurrences in the retail sector.

Who should take this Data Scientist Master’s program?

The Data Science role requires an amalgam of experience, data science knowledge, and the right tools and technologies. It is a solid career choice for both new and experienced professionals. Aspiring professionals of any educational background with an analytical frame of mind are well suited to pursue the Data Science course, including:

IT Professionals
Analytics Managers
Business Analysts
Banking and Finance Professionals
Marketing Managers
Supply Chain Network Managers
Beginners or recent graduates with a Bachelor's or Master’s degree

What are the prerequisites for this Data Science training?

Professionals wishing to succeed in this Data Science course should have:

Basic knowledge of statistics
Basic understanding of any programming language

What type of jobs will I be suited for after completing this Data Science course?

After completing this Data Science course, you will have the skills that will help you land your dream job in Data Science. Jobs that are ideal for data science-trained professionals include:

Data Analyst
Data Scientist
Analytics Manager/Lead
Machine Learning Engineer
Statistical Programming Specialist

What is a Data Science course?

Data Science is a broad field, and a beginner has many concepts to learn. A Data Science course is a training program of around six to twelve months, often taught by industry experts, that helps candidates build a strong foundation in the field. Apart from the theoretical material, an online data science course includes virtual labs, industry projects, interactive quizzes, and practice tests, which can enhance the learning experience.

Will this course help me to learn Data Science from scratch?

Yes. Professionals who do not have any prior knowledge of the field can begin with this Data Scientist Master’s program, because it also gives a thorough grounding in the basic concepts.

Data Science Certification | Online Data Science Certification

Date	Day	Timing
16-Mar-2021	Mon-Fri (Weekdays Regular)	08:00 AM & 10:00 AM Batches (Class 1Hr - 1:30Hrs) / Per Session
19-Mar-2021	Mon-Fri (Weekdays Regular)	08:00 PM & 10:00 PM Batches (Class 1Hr - 1:30Hrs) / Per Session
21-Mar-2021	Sat-Sun (Weekend Regular)	(10:00 AM - 12:30 PM) (Class 3hr - 3:30Hrs) / Per Session
22-Mar-2021	Sat-Sun (Weekend Regular)	(06:00 PM - 08:30 PM) (Class 4:30Hr - 5:00Hrs) / Per Session

About The Program

The DeepNeuron online master's program in Data Science lets you gain proficiency in Data Science. You will work on real-world projects in Data Science with R, Hadoop Dev, Test, and Analysis, Apache Spark, Scala, Deep Learning, Tableau, Data Science with SAS, SQL, MongoDB, and more. As part of this online and classroom training, you will receive three additional self-paced courses co-created with IBM, namely Excel, MongoDB, and MS-SQL. Enroll now and pursue your MS in Data Science online.

IBM is the second-largest Predictive Analytics and Machine Learning solutions provider globally (source: The Forrester Wave report, September 2018). A partnership between Data2business insights and IBM introduces students to integrated blended learning, preparing them for expert roles in Artificial Intelligence and Data Science. The Data Science course in collaboration with IBM is designed to make students industry-ready for Artificial Intelligence and Data Science job roles.

What can I expect from these Data Science courses developed in collaboration with IBM?

Upon completion of this Data Scientist online Master's program, you will receive certificates from IBM (for the IBM courses) and from DeepNeuron for the courses in the learning path. These certificates will testify to your skills as an expert in Data Science. You will also receive the following:

Access to an IBM Cloud Lite account
An industry-recognized Data Scientist Master's certificate

Data Scientist is one of the hottest professions. IBM predicts the demand for Data Scientists will rise by 28% by 2020. The Data Scientist Master’s program co-developed with IBM helps you master skills including statistics, hypothesis testing, data mining, clustering, decision trees, linear and logistic regression, data wrangling, data visualization, regression models, Hadoop, Spark, PROC SQL, SAS Macros, recommendation engines, supervised and unsupervised learning, and more.

A Data Scientist is the top-ranking professional in any analytics organization. Glassdoor ranks Data Scientists first in the 25 Best Jobs for 2019. In today’s market, Data Scientists are scarce and in demand. As a Data Scientist, you are required to understand the business problem, design a data analysis strategy, collect and format the required data, apply algorithms or techniques using the correct tools, and make recommendations backed by data.

Data Visualization makes patterns and anomalies in the data easier to spot, and this module covers the various graphical representations used for it. Understand the terms univariate and bivariate and the plots used for analysis in two dimensions. Understand how to derive conclusions on business problems using calculations performed on sample data. You will use the central limit theorem to deal with the variation that arises when analyzing different samples from the same population.

Gain an in-depth understanding of data structure and data manipulation

Understand and use linear and non-linear regression models and classification techniques for data analysis

Obtain an in-depth understanding of supervised and unsupervised learning models such as linear regression, logistic regression, clustering, dimensionality reduction, K-NN, and pipeline

Perform scientific and technical computing using the SciPy package and its sub-packages such as Integrate, Optimize, Statistics, IO, and Weave

Gain expertise in mathematical computing using the NumPy and Scikit-Learn packages

Understand the different components of the Hadoop ecosystem

Learn to work with HBase, its architecture, and data storage, learning the difference between HBase and RDBMS, and use Hive and Impala for partitioning

Understand MapReduce and its characteristics, plus learn how to ingest data using Sqoop and Flume

Master the concepts of recommendation engine and time series modeling and gain practical mastery over principles, algorithms, and applications of machine learning

Learn to analyze data using Tableau and become proficient in building interactive dashboards

Data Science & AI Course Modules

Many of these modules are in high demand in today's changing business environment. Black-box testing is a powerful technique for validating software against the external factors responsible for its issues. Supervised machine learning algorithms include Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Support Vector Machines, and many more. Deep learning is a branch of machine learning built on multi-layer neural networks. It is mainly used in computer vision, bioinformatics, audio recognition, and medical analysis systems. Deep learning models include Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks. Unsupervised learning in data mining includes Clustering, Neural Networks, Principal Component Analysis, Local Outlier Factor, and so on.

The Data Science project management methodology CRISP-DM is explained in finer detail in this module. Learn about Data Collection, Data Cleansing, Data Preparation, Data Munging, Data Wrangling, etc. Learn about the preliminary steps taken to explore the data, known as exploratory data analysis. In this module, you are also introduced to the statistical calculations used to derive information from data, and you begin to learn how to perform a descriptive analysis.

Machine Learning project management methodology
Data Collection - Surveys and Design of Experiments
Data Types, namely Continuous, Discrete, Categorical, Count, Qualitative, and Quantitative, and their identification and application
Further classification of data in terms of Nominal, Ordinal, Interval & Ratio types
Balanced versus Imbalanced datasets
Cross Sectional versus Time Series vs Panel / Longitudinal Data
Batch Processing vs Real Time Processing
Structured versus Unstructured vs Semi-Structured Data
Big vs Not-Big Data
Data Cleaning / Preparation - Outlier Analysis, Missing Values Imputation Techniques, Transformations, Normalization / Standardization, Discretization
Sampling techniques for handling Balanced vs. Imbalanced Datasets
What is the Sampling Funnel and its application and its components?
- Population
- Sampling frame
- Simple random sampling
- Sample
Measures of Central Tendency & Dispersion
- Population
- Mean/Average, Median, Mode
- Variance, Standard Deviation, Range
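
The measures of central tendency and dispersion listed above can be illustrated with a short sketch. It is not taken from the course material: it relies only on Python's standard `statistics` module, and the `orders` sample is made up for illustration.

```python
import statistics

# Hypothetical sample: daily order counts for one week.
orders = [12, 15, 15, 18, 20, 22, 31]

mean = statistics.mean(orders)          # arithmetic average
median = statistics.median(orders)      # middle value of the sorted data
mode = statistics.mode(orders)          # most frequent value
variance = statistics.variance(orders)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(orders)      # sample standard deviation
data_range = max(orders) - min(orders)  # gap between the extremes

print(mean, median, mode, round(variance, 2), round(std_dev, 2), data_range)
```

The single large value (31) pulls the mean above the median, which is a quick hint that the sample is right-skewed.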

Learn about the statistical calculations used to capture business moments so that decision makers can make data-driven decisions. You will learn about the distribution of the data and its shape using these calculations. Understand how to interpret information by representing data visually. Also learn about univariate, bivariate, and multivariate analysis.

Measure of Skewness
Measure of Kurtosis
Spread of the Data
Various graphical techniques to understand data
- Bar Plot
- Histogram
- Boxplot
- Scatter Plot
- Line Chart
- Pair Plot
Sample Statistics
Population Parameters
Inferential Statistics
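
As a rough illustration of the skewness and kurtosis measures named in this module, the sketch below computes both from central moments. It uses the plain population (uncorrected) formulas, which is an assumption; statistical packages often apply a small-sample correction. The two samples are invented.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third standardized moment (population form, no bias correction)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]           # hypothetical, evenly spread
right_tailed = [1, 1, 2, 2, 3, 12]    # hypothetical, one large outlier

print(skewness(symmetric), skewness(right_tailed))
print(excess_kurtosis(symmetric))
```

A skewness near zero indicates a symmetric sample, while a positive value indicates a longer right tail.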

In this tutorial you will learn in detail about continuous probability distributions. Understand the properties of a continuous random variable and of the normal distribution. To make these properties easier to study, statisticians define a standardized variable and work with its distribution, the standard normal. You will learn to check whether a continuous random variable follows a normal distribution using a normal Q-Q plot. Learn the science behind estimating a population value from sample data.

Random Variable and its definition
Probability & Probability Distribution
- Continuous Probability Distribution / Probability Density Function
- Discrete Probability Distribution / Probability Mass Function
Normal Distribution
Standard Normal Distribution / Z distribution
Z scores and the Z table
QQ Plot / Quantile - Quantile plot
Sampling Variation
Central Limit Theorem
Sample size calculator
Confidence interval - concept
Confidence interval with sigma
T-distribution / Student's-t distribution
Confidence interval
- Population parameter with Standard deviation known
- Population parameter with Standard deviation not known
A complete recap of Statistics
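
The "confidence interval with sigma" topic can be sketched as follows. This is a minimal example that assumes the population standard deviation is known, so the z distribution applies; the `weights` sample and `sigma=0.3` are hypothetical values.

```python
import math
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Interval for the population mean when the population sigma is known."""
    n = len(sample)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical package weights in grams.
weights = [49.8, 50.4, 50.1, 49.6, 50.3, 50.2, 49.9, 50.1, 50.0]
low, high = z_confidence_interval(weights, sigma=0.3)
print(round(low, 3), round(high, 3))
```

When sigma is not known, the same structure applies with the sample standard deviation and a critical value from the t-distribution, which the standard library does not provide.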

Learn to frame business statements by making assumptions. Understand how to test these assumptions in order to make decisions on business problems. Learn about the different types of hypothesis tests and their statistics. You will learn the different conditions of the hypothesis table, namely the null hypothesis, the alternative hypothesis, Type I error, and Type II error. The prerequisites for conducting a hypothesis test and the interpretation of the results are also discussed in this module.

Formulating a Hypothesis
Choosing Null and Alternative Hypothesis
Type I or Alpha Error and Type II or Beta Error
Confidence Level, Significance Level, Power of Test
Comparative study of sample proportions using Hypothesis testing
2 Sample t-test
ANOVA
2 Proportion test
Chi-Square test
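
One of the tests listed above, the 2 Proportion test, can be sketched with the standard library alone. The example uses the usual pooled z-test under a normal approximation; the conversion counts and the 0.05 significance level are made-up inputs, not course data.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: both groups share the same proportion."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical A/B test: 120 of 1000 versus 90 of 1000 conversions.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
alpha = 0.05                      # significance level (Type I error rate)
reject_null = p_value < alpha
print(round(z, 3), round(p_value, 4), reject_null)
```

Rejecting the null hypothesis here means the observed difference would be unlikely if both groups truly had the same conversion rate.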

Learn how data helps organizations make informed, data-driven decisions. Data is often described as the new oil: it keeps organizations in every industry and sector ahead of the competition. Learn how Big Data Analytics is applied in real time, and understand the need for analytics through a use case. Also learn, at a high level, about CRISP-DM, the best-known project management methodology for Data Mining.

All About Data2bussinessinsights.in
Dos and Don'ts as a participant
Introduction to Big Data Analytics
Data and its uses – a case study (Grocery store)
Interactive marketing using data & IoT – A case study
Course outline, road map, and takeaways from the course
Stages of Analytics - Descriptive, Predictive, Prescriptive, etc.
Cross-Industry Standard Process for Data Mining

Supervised learning in data mining is about predicting an unknown dependent variable using mathematical equations that describe its relationship with the independent variables. Revisit school math with the equation of a straight line. Learn about the components of Linear Regression through the equation of the regression line. Get introduced to Linear Regression analysis with a use case that predicts a continuous dependent variable. Understand the ordinary least squares technique.

Scatter diagram
- Correlation analysis
- Correlation coefficient
Ordinary least squares
Principles of regression
Simple Linear Regression
Exponential Regression, Logarithmic Regression, Quadratic or Polynomial Regression
Confidence Interval versus Prediction Interval
Heteroscedasticity versus Homoscedasticity (Equal Variance)
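
The ordinary least squares idea for a straight line can be sketched in a few lines of plain Python. The closed-form slope and intercept below are the standard textbook formulas; the `xs` and `ys` values are invented for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def correlation(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: advertising spend versus sales.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_line(xs, ys)
r = correlation(xs, ys)
print(round(intercept, 2), round(slope, 2), round(r, 4))
```

The fitted line is y = intercept + slope * x, and a correlation close to 1 indicates that a straight line describes this sample well.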

In this continuation of the regression analysis study, you will learn how to deal with multiple independent variables affecting the dependent variable. Learn about the conditions and assumptions required for linear regression analysis and the workarounds used when they are not met. Understand the steps required to evaluate the model and to improve its prediction accuracy. You will be introduced to the concepts of variance and bias.

LINE assumption
- Linearity
- Independence
- Normality
- Equal Variance / Homoscedasticity
Collinearity (Variance Inflation Factor)
Multiple Linear Regression
Model Quality metrics
Deletion Diagnostics
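
To make the collinearity check concrete, here is a small sketch of the Variance Inflation Factor for the special case of exactly two predictors, where VIF = 1 / (1 - r^2) and r is the correlation between them. With more predictors, each VIF comes from regressing one predictor on all the others, which this sketch does not cover. The data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors: the first pair is correlated, the second is not.
vif_correlated = vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
vif_independent = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
print(round(vif_correlated, 2), round(vif_independent, 2))
```

A VIF of 1 means no collinearity; larger values mean the coefficient's variance is inflated by overlap between predictors.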

Learn about the overfitting and underfitting conditions that can affect prediction models. A good model strikes the right balance between the two, and the L1 norm and L2 norm regularization techniques are used to reduce overfitting. The corresponding regression techniques, Lasso and Ridge, are discussed in this module.

Understanding Overfitting (Variance) vs. Underfitting (Bias)
Generalization error and Regularization techniques
Different Error functions or Loss functions or Cost functions
Lasso Regression
Ridge Regression
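
The difference between the L2 (Ridge) and L1 (Lasso) penalties is easiest to see with a single centred predictor and no intercept, where both have closed forms. The sketch below assumes the objectives sum((y - b*x)^2) + lam * b^2 and sum((y - b*x)^2) + lam * |b|; other texts scale the penalty differently. The data are invented.

```python
def ridge_slope(xs, ys, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 for one centred predictor."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    return s_xy / (s_xx + lam)

def lasso_slope(xs, ys, lam):
    """Minimise sum((y - b*x)^2) + lam * |b| via soft-thresholding."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    shrunk = max(abs(s_xy) - lam / 2, 0.0)  # pull towards zero, clip at zero
    return (shrunk if s_xy >= 0 else -shrunk) / s_xx

# Hypothetical centred data.
xs = [-2, -1, 0, 1, 2]
ys = [-4.1, -1.9, 0.2, 2.1, 3.7]

for lam in (0, 10, 40):
    print(lam, round(ridge_slope(xs, ys, lam), 3), round(lasso_slope(xs, ys, lam), 3))
```

Ridge shrinks the coefficient smoothly but never to exactly zero, while Lasso sets it to exactly zero once the penalty is large enough, which is why Lasso is also used for variable selection.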

You have learnt about predicting a continuous dependent variable. In this module, you continue with regression techniques, now applied to predicting attribute (categorical) data. Learn about the principles of the logistic regression model, understand the sigmoid curve, and see how a cutoff value is used to interpret the probable outcome of the model. Learn about the confusion matrix and its parameters for evaluating the outcome of the prediction model. Also learn about maximum likelihood estimation.

Principles of Logistic regression
Types of Logistic regression
Assumption & Steps in Logistic regression
Analysis of Simple logistic regression results
Multiple Logistic regression
Confusion matrix
- False Positive, False Negative
- True Positive, True Negative
- Sensitivity, Recall, Specificity, F1
Receiver operating characteristics curve (ROC curve)
Precision Recall (P-R) curve
Lift charts and Gain charts
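
A short sketch of how the sigmoid curve, a cutoff value, and the confusion matrix fit together. The `scores` list stands in for log-odds produced by some already fitted logistic regression model; those values, the labels, and the 0.5 cutoff are all hypothetical.

```python
import math

def sigmoid(z):
    """Map a log-odds score to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels coded as 1 and 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn

# Hypothetical log-odds from a fitted model, with the true labels.
scores = [2.2, 1.1, 0.4, -0.3, -1.5, 0.9, -0.8, -2.0]
actual = [1, 1, 0, 1, 0, 1, 0, 0]
cutoff = 0.5
predicted = [1 if sigmoid(s) >= cutoff else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)   # also called recall
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
print(tp, fp, fn, tn, sensitivity, specificity, precision, f1)
```

Lowering the cutoff trades specificity for sensitivity; sweeping it across all values is what traces out the ROC curve.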

As an extension to logistic regression, the multinomial regression technique is used to predict an outcome with multiple categories. Understand the concept of multi-logit equations, the baseline category, and how classifications are made from probability outcomes. Learn about handling multiple categories in output variables, including nominal as well as ordinal data.

Logit and Log-Likelihood
Category Baselining
Modeling Nominal categorical data
Handling Ordinal Categorical Data
Interpreting the results of coefficient values

In this module you learn further regression techniques used for predicting discrete data. These techniques are used to analyze numeric data known as count data. The regression models fit the data to discrete probability distributions, namely the Poisson and negative binomial distributions. When the dependent variable contains excessive zeros, zero-inflated models are preferred instead, and you will learn the types of zero-inflated models used to fit such data.

Poisson Regression
Poisson Regression with Offset
Negative Binomial Regression
Treatment of data with Excessive Zeros
- Zero-inflated Poisson
- Zero-inflated Negative Binomial
- Hurdle Model

The k Nearest Neighbor algorithm is a distance-based machine learning algorithm. Learn to classify the dependent variable using an appropriate k value. The k-NN classifier, also known as a lazy learner, is a very popular algorithm and one of the easiest to apply.

Deciding the K value
Thumb rule in choosing the K value
Building a KNN model by splitting the data
Checking for Underfitting and Overfitting in KNN
Generalization and Regularization Techniques to avoid overfitting in KNN
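
The distance-based voting behind k-NN can be sketched as follows. This is a minimal illustration using Euclidean distance and a simple majority vote; the training points, labels, and `k=3` are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data with two classes.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

There is no training step: the "lazy learner" simply stores the data and does all the work at prediction time. A very small k tends to overfit, and a very large k tends to underfit.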

Decision Tree and Random Forest are some of the most powerful classifier algorithms based on classification rules. In this tutorial, you will learn how to derive the rules for classifying the dependent variable by constructing the best tree, using statistical measures to capture the information in each attribute. Random Forest is an ensemble technique built from multiple Decision Trees, and the final outcome is drawn by aggregating the results obtained from these trees.

Elements of classification tree - Root node, Child Node, Leaf Node, etc.
Greedy algorithm
Measure of Entropy
Attribute selection using Information gain
Ensemble techniques - Stacking, Boosting and Bagging
Decision Tree C5.0 and understanding various arguments
Checking for Underfitting and Overfitting in Decision Tree
Generalization and Regularization Techniques to avoid overfitting in Decision Tree
Random Forest and understanding various arguments
Checking for Underfitting and Overfitting in Random Forest
Generalization and Regularization Techniques to avoid overfitting in Random Forest
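
Entropy and information gain, the measures used to choose the attribute to split on, can be sketched with the standard library. The tiny dataset and the attribute names `outlook` and `windy` are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, labels, attribute):
    """Entropy reduction achieved by splitting on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical records: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

print(information_gain(rows, labels, "outlook"))
print(information_gain(rows, labels, "windy"))
```

A greedy tree builder would pick the attribute with the highest gain at each node and repeat the process on each resulting group.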

Learn how to improve the reliability and accuracy of decision tree models using ensemble techniques. Bagging and Boosting are the go-to ensemble techniques. The parallel approach taken in Bagging and the sequential approach taken in Boosting are discussed in this module.

Overfitting
Underfitting
Pruning
Boosting
Bagging or Bootstrap aggregating

The Boosting algorithms AdaBoost and Extreme Gradient Boosting are discussed as part of this continuation module. You will also learn about stacking methods. These algorithms provide very high accuracy and have helped many aspiring data scientists win first place in competitions hosted on platforms such as Kaggle and CrowdAnalytix.

AdaBoost / Adaptive Boosting Algorithm
Checking for Underfitting and Overfitting in AdaBoost
Generalization and Regularization Techniques to avoid overfitting in AdaBoost
Gradient Boosting Algorithm
Checking for Underfitting and Overfitting in Gradient Boosting
Generalization and Regularization Techniques to avoid overfitting in Gradient Boosting
Extreme Gradient Boosting (XGB) Algorithm
Checking for Underfitting and Overfitting in XGB
Generalization and Regularization Techniques to avoid overfitting in XGB
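
The sequential nature of boosting shows up in how AdaBoost reweights the training rows after each weak learner. The sketch below performs one such reweighting step for the classic binary AdaBoost formulation; it assumes the weighted error lies strictly between 0 and 1, and the starting weights and the `correct` flags are hypothetical.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost reweighting step.

    `weights` are the current row weights (summing to 1) and `correct`
    marks which rows the weak learner classified correctly.
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)  # the learner's say in the vote
    updated = [
        w * math.exp(-alpha if ok else alpha)    # shrink hits, grow misses
        for w, ok in zip(weights, correct)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical round: five equally weighted rows, the last one misclassified.
weights = [0.2] * 5
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
print(round(alpha, 4), [round(w, 3) for w in new_weights])
```

After the update, the misclassified row carries half of the total weight, so the next weak learner is pushed to get it right.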

Learn to analyse unstructured textual data to derive meaningful insights. Understand language quirks in order to perform data cleansing, extract features using a bag of words, and construct the matrix of term counts per document, called the DTM. Learn to understand the sentiment of customers from their feedback so that appropriate actions can be taken. Advanced text mining concepts that help interpret the context of raw text data are also discussed. Topic models using the LDA algorithm and emotion mining using lexicons are covered as part of the NLP module.

Sources of data
Bag of words
Pre-processing, corpus Document Term Matrix (DTM) & TDM
Word Clouds
Corpus level word clouds
- Sentiment Analysis
- Positive Word clouds
- Negative word clouds
- Unigram, Bigram, Trigram
Semantic network
Clustering
Extract user reviews of products/services from Amazon, Snapdeal, and TripAdvisor
Install Libraries from Shell
Extraction and text analytics in Python
LDA / Latent Dirichlet Allocation
Topic Modelling
Sentiment Extraction
Lexicons & Emotion Mining
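
A bag of words and the resulting Document Term Matrix can be sketched with the standard library. The tokenizer below is a deliberately crude assumption (lowercase letters and apostrophes only, no stop-word removal or stemming), and the three reviews are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def document_term_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix

# Hypothetical product reviews.
reviews = ["Great phone, great battery", "Battery died fast", "Great value"]
vocabulary, dtm = document_term_matrix(reviews)

print(vocabulary)
for row in dtm:
    print(row)
```

Transposing this matrix gives the Term Document Matrix (TDM), and the column totals are the frequencies a word cloud would visualize.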

Revise Bayes theorem to develop a classification technique for Machine learning. In this tutorial you will learn about joint probability and its applications. Learn how to predict whether an incoming email is a spam or a ham email. Learn about Bayesian probability and the applications in solving complex business problems.

Probability – Recap
Bayes Rule
Naïve Bayes Classifier
Text Classification using Naive Bayes
Checking for Underfitting and Overfitting in Naive Bayes
Generalization and Regularization Techniques to avoid overfitting in Naive Bayes
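
The spam versus ham idea can be sketched as a tiny multinomial Naive Bayes text classifier. This is an illustrative implementation, not the course's code: it splits on whitespace, uses Laplace (add-one) smoothing, and is trained on four made-up messages.

```python
import math
from collections import Counter

def train_naive_bayes(messages, labels):
    """Count words per class and messages per class."""
    word_counts = {label: Counter() for label in set(labels)}
    for message, label in zip(messages, labels):
        word_counts[label].update(message.lower().split())
    class_counts = Counter(labels)
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary

def classify(message, model):
    """Pick the class with the highest log posterior for the message."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior plus summed log likelihoods with add-one smoothing.
        score = math.log(class_counts[label] / total_messages)
        denominator = sum(counts.values()) + len(vocabulary)
        for word in message.lower().split():
            score += math.log((counts[word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages.
messages = [
    "win cash prize now",
    "free prize claim now",
    "meeting at noon",
    "lunch at noon tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(messages, labels)

print(classify("free cash prize", model))
print(classify("noon meeting", model))
```

The "naive" part is the assumption that words are independent given the class, which is what lets the per-word log likelihoods simply be added.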

The perceptron algorithm is modeled on a biological brain. You will learn about the parameters used in the perceptron algorithm, which is the foundation for developing much more complex neural network models for AI applications. Understand how perceptron algorithms are applied to classify binary data in a linearly separable scenario.

Neurons of a Biological Brain
Artificial Neuron
Perceptron
Perceptron Algorithm
Use case to classify a linearly separable data
Multilayer Perceptron to handle non-linear data
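
The perceptron learning rule on linearly separable data can be sketched as below. The example trains on the logical AND function, a standard linearly separable toy problem chosen here for illustration; the learning rate and epoch count are arbitrary.

```python
def predict(weights, bias, features):
    """Step activation: fire (1) when the weighted sum exceeds zero."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Perceptron rule: nudge weights by learning_rate * error * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, targets):
            error = target - predict(weights, bias, features)
            weights = [
                w + learning_rate * error * x for w, x in zip(weights, features)
            ]
            bias += learning_rate * error
    return weights, bias

# Logical AND: only (1, 1) belongs to the positive class.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)

print(weights, bias)
print([predict(weights, bias, s) for s in samples])
```

A single perceptron can only draw one straight decision boundary, so it cannot learn a function such as XOR; that limitation is what motivates the multilayer perceptron.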

A neural network is a black-box technique used in deep learning models. Learn the logic of training and weight calculation using various parameters and their tuning. Understand the activation functions and integration functions used in developing a neural network.

Integration functions
Activation functions
Weights
Bias
Learning Rate (eta) - Shrinking Learning Rate, Decay Parameters
Error functions - Entropy, Binary Cross Entropy, Categorical Cross Entropy, KL Divergence, etc.

Artificial Neural Networks
ANN Structure
Error Surface
Gradient Descent Algorithm
Backward Propagation
Network Topology
Principles of Gradient Descent (Manual Calculation)
Learning Rate (eta)
Batch Gradient Descent
Stochastic Gradient Descent
Minibatch Stochastic Gradient Descent
Optimization Methods: Adagrad, Adadelta, RMSprop, Adam
Convolutional Neural Network (CNN)
- ImageNet Challenge – Winning Architectures
- Parameter Explosion with MLPs
- Convolution Networks
Recurrent Neural Network
- Language Models
- Traditional Language Model
- Disadvantages of MLP
- Back Propagation Through Time
- Long Short-Term Memory (LSTM)
- Gated Recurrent Unit (GRU)
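
The batch, stochastic and mini-batch variants of gradient descent listed above differ only in how many samples contribute to each update. The sketch below fits a straight line with mini-batch updates; the synthetic data (y = 2x + 1), the learning rate and the batch size are assumptions made for illustration.

```python
import random

def minibatch_sgd(xs, ys, learning_rate=0.5, batch_size=5, epochs=2000, seed=0):
    """Fit y = w * x + b by minimising half the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a fresh sample order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(round(w, 3), round(b, 3))
```

Setting `batch_size=len(xs)` gives batch gradient descent, and `batch_size=1` gives stochastic gradient descent.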

Support Vector Machines / Large-Margin / Max-Margin Classifier
Hyperplanes
Best Fit "boundary"
Linear Support Vector Machine using Maximum Margin
SVM for Noisy Data
Non-Linear Space Classification
Non-Linear Kernel Tricks
- Linear Kernel
- Polynomial
- Sigmoid
- Gaussian RBF
SVM for Multi-Class Classification
- One vs. All
- One vs. One
Directed Acyclic Graph (DAG) SVM
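
The four kernels named above can be written as short functions. This is a sketch using common textbook forms; the default values for `gamma`, `degree` and `coef0` are assumptions rather than settings prescribed by the course.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_kernel(a, b):
    return dot(a, b)

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    return (dot(a, b) + coef0) ** degree

def sigmoid_kernel(a, b, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(a, b) + coef0)

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF: similarity decays with squared Euclidean distance."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

p, q = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(p, q), polynomial_kernel(p, q), rbf_kernel(p, q))
```

Each kernel returns the inner product of the two points in some implicit feature space, which is what lets a linear SVM draw a non-linear boundary in the original space.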

Unsupervised data mining techniques are used as EDA techniques to derive insights from business data. In this first module of unsupervised learning, you are introduced to clustering algorithms. Learn about different approaches to segregating data into homogeneous groups. Hierarchical clustering and K-means clustering are the most commonly used clustering algorithms. Understand the different mathematical approaches to data segregation. Also learn about variations of K-means clustering such as K-medoids and K-modes, and learn to handle large data sets using the CLARA technique.

Supervised vs Unsupervised learning
Data Mining Process
Hierarchical Clustering / Agglomerative Clustering
Dendrogram
Measure of distance
- Numeric
  - Euclidean, Manhattan, Mahalanobis
- Categorical
  - Binary Euclidean
  - Simple Matching Coefficient
  - Jaccard's Coefficient
- Mixed
  - Gower's General Dissimilarity Coefficient
- Types of Linkages
  - Single Linkage / Nearest Neighbour
  - Complete Linkage / Farthest Neighbour
  - Average Linkage
  - Centroid Linkage
- K-Means Clustering
  - Measurement metrics of clustering
    - Within-Cluster Sum of Squares
    - Between-Cluster Sum of Squares
    - Total Sum of Squares
  - Choosing the ideal K value using Scree Plot / Elbow Curve
  - Other Clustering Techniques
    - K-Medians
    - K-Medoids
    - K-Modes
    - Clustering Large Application (CLARA)
    - Partitioning Around Medoids (PAM)
    - Density-based spatial clustering of applications with noise (DBSCAN)
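
A minimal sketch of K-means and the within-cluster sum of squares used in the elbow curve is shown below. The six sample points and the starting centroids are made up for illustration, and the loop runs a fixed number of iterations rather than testing for convergence.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def within_sum_of_squares(centroids, clusters):
    return sum(squared_distance(p, c)
               for c, cluster in zip(centroids, clusters) for p in cluster)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
wss = within_sum_of_squares(centroids, clusters)
print(centroids, wss)
```

Repeating this for K = 1, 2, 3, ... and plotting the within-cluster sum of squares against K produces the elbow curve.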

Dimension Reduction (PCA) / Factor Analysis: Learn to handle high-dimensional data. Performance suffers when the data has a high number of dimensions, and training machine learning techniques becomes very complex. As part of this module, you will learn to apply data reduction techniques without deleting any variables. Learn the advantages of dimension reduction techniques. Also learn about another technique called Factor Analysis.

Why Dimension Reduction
Advantages of PCA
Calculation of PCA weights
2D Visualization using Principal components
Basics of Matrix Algebra
Factor Analysis
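
The calculation of PCA weights can be sketched without any libraries: compute the covariance matrix of the centered data, then find its leading eigenvector. Power iteration is used here as one simple way to do that; it is an assumption of this example, and the four data points are made up.

```python
import math

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def first_principal_component(cov, iterations=100):
    """Power iteration: multiply by the matrix and normalise until the vector settles."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        v = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov, centered = covariance_matrix(data)
pc1 = first_principal_component(cov)  # the PCA weights (loadings)
scores = [sum(c * w for c, w in zip(row, pc1)) for row in centered]
print(pc1, scores)
```

Because these points lie exactly on the line y = 2x, one principal component captures all of the variance, so the two original variables reduce to one score per row without dropping either variable.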

Learn to measure the relationship between entities. Bundle offers are defined based on this measure of dependency between products. Understand the metrics Support, Confidence and Lift used to define the rules with the help of Apriori algorithm. Learn pros and cons of each of the metrics used in Association rules.

What is Market Basket / Affinity Analysis
Measure of Association
- Support
- Confidence
- Lift Ratio
Apriori Algorithm
Sequential Pattern Mining
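
Support, Confidence and Lift can be computed directly from a list of transactions. The five baskets below are hypothetical and exist only to show the arithmetic.

```python
# Hypothetical market baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how often the consequent occurs anyway."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
print(lift({"bread"}, {"milk"}))
```

A lift below 1, as in this toy data, means that buying bread makes milk slightly less likely than its baseline rate, so a rule can have high confidence and still be uninformative.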

Personalized recommendations in e-commerce are based on previous transactions. Learn the science of making these recommendations by measuring similarity between customers. This module discusses the various methods applied for collaborative filtering, their pros and cons, and the SVD method used by Netflix for movie recommendations.

User-based Collaborative Filtering
A measure of distance/similarity between users
Driver for Recommendation
Computation Reduction Techniques
Search based methods/Item to Item Collaborative Filtering
SVD in recommendation
The vulnerability of recommendation systems
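
User-based collaborative filtering can be sketched as a similarity-weighted average of other users' ratings. The ratings table, the user names and the choice of cosine similarity over co-rated items are assumptions made for this example.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of ratings from other users who rated the item."""
    numerator = denominator = 0.0
    for other in ratings:
        if other != user and item in ratings[other]:
            similarity = cosine_similarity(user, other)
            numerator += similarity * ratings[other][item]
            denominator += similarity
    return numerator / denominator if denominator else None

print(predict_rating("alice", "D"))
```

The prediction for alice leans toward bob's rating because their co-rated items match exactly, while carol's rating contributes with a smaller weight.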

Network analytics is the study of a network using quantifiable values. Vertices and edges are the nodes and connections of a network; learn about the statistics used to calculate the value of each node in the network. You will also learn about the Google PageRank algorithm as part of this module.

Definition of a network (the LinkedIn analogy)
The measure of Node strength in a Network
- Degree centrality
- Closeness centrality
- Eigenvector centrality
- Adjacency matrix
- Betweenness centrality
- Cluster coefficient
Introduction to Google page ranking
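
As a rough illustration of node strength, the sketch below counts incoming links and runs a simplified PageRank iteration on a tiny directed graph. The graph, the damping factor of 0.85 and the fixed iteration count are assumptions; every node in this example has at least one outgoing link, so dangling nodes are not handled.

```python
# Hypothetical directed graph: node -> nodes it links to
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

def in_degree(graph):
    """Number of incoming links per node."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts

def pagerank(graph, damping=0.85, iterations=50):
    """Each node repeatedly shares its rank equally among the nodes it links to."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in graph}
        for node, targets in graph.items():
            share = rank[node] / len(targets)
            for target in targets:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

ranks = pagerank(links)
print(in_degree(links))
print(max(ranks, key=ranks.get))
```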

AutoML Methods
AutoML Systems
AutoML on Cloud - AWS
- Amazon SageMaker
- SageMaker Notebook Instance for Model Development, Training and Deployment
- XG Boost Classification Model
- Hyperparameter tuning jobs
AutoML on Cloud - Azure
- Workspace
- Environment
- Compute Instance
- Automatic Featurization
- AutoML and ONNX
AutoML on Cloud - GCP
- AutoML Natural Language Performing Document Classification
- Performing Sentiment Analysis using AutoML Natural Language API
- Cloud ML Engine and Its Components
- Training and Deploying Applications on Cloud ML Engine
- Choosing Right Cloud ML Engine for Training Jobs

The Kaplan-Meier method and life tables are used to estimate the time before an event occurs. Survival analysis is about analyzing this duration, or time to event. Real-world applications of survival analysis in customer churn, medical sciences and other sectors are discussed as part of this module. Learn how survival analysis techniques can be used to understand the effect of the features on the event using the Kaplan-Meier survival plot.

Examples of Survival Analysis
Time to event
Censoring
Survival, Hazard, Cumulative Hazard Functions
Introduction to Parametric and non-parametric functions
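
The Kaplan-Meier estimate can be computed by hand: at each event time, multiply the running survival probability by the fraction of at-risk subjects who did not experience the event. The durations and censoring flags below are made up for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time an event occurs.

    observed[i] is True if the event happened at durations[i],
    False if the subject was censored at that time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical customer tenures in months; False marks a censored customer
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
print(curve)
```

Censored subjects never trigger a drop in the curve, but they still count in the at-risk group until the time they leave the study.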

Time series analysis is performed on data collected over time, where the response variable is affected by time. Understand the time series components (Level, Trend, Seasonality, Noise) and the methods to identify them in time series data. This module introduces the forecasting methods available for estimating the response variable, depending on whether or not the future is expected to resemble the past. In this first module of forecasting, you will learn the application of model-based forecasting techniques.

Introduction to time series data
Steps to forecasting
Components to time series data
Scatter plot and Time Plot
Lag Plot
ACF - Auto-Correlation Function / Correlogram
Visualization principles
Naïve forecast methods
Errors in the forecast and its metrics - ME, MAD, MSE, RMSE, MPE, MAPE
Model-Based approaches
- Linear Model
- Exponential Model
- Quadratic Model
- Additive Seasonality
- Multiplicative Seasonality
Model-Based approaches Continued
AR (Auto-Regressive) model for errors
Random walk
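
The forecast error metrics listed above can be illustrated with a naive forecast, where each prediction is simply the previous observation. The series values are made up, and the error is taken as actual minus forecast, which is one common sign convention.

```python
import math

def forecast_errors(actual, forecast):
    """Compute the common forecast error metrics (error = actual - forecast)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

series = [100, 110, 105, 120, 125]
# Naive forecast: the prediction for each period is the previous observation
actual, naive = series[1:], series[:-1]
metrics = forecast_errors(actual, naive)
print(metrics)
```

ME and MPE keep the sign of the errors and so reveal bias, while MAD, RMSE and MAPE measure the size of the errors regardless of direction.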

In this continuation module of forecasting, learn about data-driven forecasting techniques. Learn about ARMA and ARIMA models, which combine model-based and data-driven techniques. Understand the smoothing techniques and their variations. Get introduced to de-trending and de-seasonalizing the data to make it stationary. You will learn about seasonal index calculations, which are used to re-seasonalize the results obtained from smoothing models.

ARMA (Auto-Regressive Moving Average), Order p and q
ARIMA (Auto-Regressive Integrated Moving Average), Order p, d, and q
A data-driven approach to forecasting
Smoothing techniques
- Moving Average
- Exponential Smoothing
- Holt's / Double Exponential Smoothing
- Winters / Holt-Winters
De-seasoning and de-trending
Econometric Models
Forecasting using Python
Forecasting using R
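
The two simplest smoothing techniques from the list above can be sketched in a few lines. The window size, the smoothing constant `alpha` and the sample series are illustrative assumptions.

```python
def moving_average(series, window):
    """Average of the most recent `window` observations at each position."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Each smoothed value blends the new observation with the previous smoothed value."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 20, 30, 40, 50]
print(moving_average(sales, window=3))
print(exponential_smoothing(sales, alpha=0.5))
```

A larger `alpha` reacts faster to recent changes, while a smaller one gives a smoother curve. Holt's and Holt-Winters methods extend the same recursion with additional terms for trend and seasonality.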

This course will be the first stepping stone towards Artificial Intelligence and Deep Learning. In this module, you will be introduced to the analytics programming languages. R is a statistical programming language and Python is a general-purpose programming language. These are the most popular tools currently being employed to churn data for deriving meaningful insights.

All About Data2bussinessinsights.in
Dos and Don'ts as a Participant
Introduction to Artificial intelligence and Deep learning
Course Outline, Road Map and Takeaways from the Course
Cross-Industry Standard Process for Data Mining
Artificial Intelligence Applications

Different packages can be used to build Deep Learning and Artificial Intelligence models, such as Tensorflow, Keras, OpenCV, and PyTorch. You will learn more about these packages and their applications in detail.

Tensorflow and Keras libraries can be used to build Machine Learning and Deep Learning models. OpenCV is used for image processing and PyTorch is highly useful when you have no idea how much memory will be required for creating a Neural Network Model.

Introduction to Deep Learning libraries – Torch, Theano, Caffe, Tensorflow, Keras, OpenCV and PyTorch
Deep dive into Tensorflow, Keras, OpenCV and PyTorch
Introduction to Anaconda, R, RStudio, Jupyter and Spyder
Environment Setup and Installation Methods of Multiple Packages

Understand the types of Machine Learning Algorithms. Learn about the life cycle and the detailed understanding of each step involved in the project life cycle. The CRISP-DM process is applied in general for Data Analytics /AI projects. Learn about CRISP-DM and the stages of the project life cycle in-depth.

You will also learn about different types of data, Data Collection, Data Preparation, Data Cleansing, Feature Engineering, EDA, Data Mining and various Error Functions. Understand imbalanced data handling techniques and algorithms.

Introduction to Machine Learning
Machine Learning and its types - Supervised Learning, Unsupervised Learning, Reinforcement Learning, Semi-supervised Learning, Active Learning, Transfer Learning, Structured Prediction
Understand Business Problem – Business Objective & Business Constraints
Data Collection - Surveys and Design of Experiments
Data Types namely Continuous, Discrete, Categorical, Count, Qualitative, Quantitative and its identification and application
Further classification of data in terms of Nominal, Ordinal, Interval & Ratio types
Balanced vs Imbalanced datasets
Cross-Sectional vs Time Series versus Panel / Longitudinal Data
Batch Processing versus Real-Time Processing
Structured vs Unstructured vs Semi-Structured Data
Big vs Not-Big Data
Data Cleaning / Preparation - Outlier Analysis, Missing Values Imputation Techniques, Transformations, Normalization / Standardization, Discretization
Sampling Techniques for Handling Balanced vs Imbalanced Datasets
Measures of Central Tendency & Dispersion
- Mean/Average, Median, Mode
- Variance, Standard Deviation, Range
Various Graphical Techniques to Understand Data
- Bar Plot
- Histogram
- Boxplot
- Scatter Plot
Feature Engineering - Feature Extraction & Feature Selection
Error Functions - Y is Continuous - Mean Error, Mean Absolute Deviation, Mean Squared Error, Mean Percentage Error, Root Mean Squared Error, Mean Absolute Percentage Error
Error Functions - Y is Discrete - Cross Table, Confusion Matrix, Binary Cross Entropy & Categorical Cross Entropy
Machine Learning Projects Strategy

Maximize or minimize the error rate using Calculus. Learn to find the best fit line using the linear least-squares method. Understand the gradient method to find the minimum value of a function where a closed-form of the solution is not available or not easily obtained.

Under Linear Algebra, you will learn sets, functions, scalars, vectors, matrices, tensors, basic operations and different matrix operations. Under Probability, you will learn about Uniform Distribution, Normal Distribution, Binomial Distribution, Discrete Random Variable, Cumulative Distribution Function and Continuous Random Variables.

Optimizations - Applications
Foundations - Slope, Derivatives & Tangent
Derivatives in Optimization
Maxima & Minima - First Derivative Test, Second Derivative Test, Partial Derivatives, Cross Partial Derivatives, Saddle Point, Determinants, Minor and Cofactor
Linear Regression Ordinary Least Squares using Calculus
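
For simple linear regression, setting the partial derivatives of the squared error to zero gives a closed-form solution for the slope and intercept. The sketch below applies that result to made-up data; the function name is an assumption.

```python
def ols_fit(xs, ys):
    """Closed-form least squares for y = slope * x + intercept.

    Setting d(SSE)/d(slope) = 0 and d(SSE)/d(intercept) = 0 yields
    slope = S_xy / S_xx and intercept = mean_y - slope * mean_x.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

When no closed form is available, the same minimum is approached iteratively with the gradient method.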

You will gain a high-level understanding of the human brain, the importance of multiple layers in a neural network, layer-wise feature extraction, and the composition of data in Deep Learning using images, speech and text.

You will briefly cover feature extraction using SIFT/HOG for images, speech recognition and feature extraction using MFCC, and NLP feature extraction using syntactic parse trees.

Introduction to neurons, which are connected to weighted inputs, threshold values, and an output. You will understand the importance of weights, bias, summation and activation functions.

Human Brain – Introduction to Biological & Artificial Neuron
Compositionality in Data – Images, Speech & text
Mathematical Notations
Introduction to ANN
Neuron, Weights, Activation function, Integration function, Bias and Output

Learn about single-layered perceptrons and Rosenblatt’s perceptron for updating weights and bias. You will understand the importance of learning rate and error. Walk through a toy example to understand the perceptron algorithm. Learn about the quadratic and spherical summation functions. Weight update methods: the Widrow-Hoff Learning Rule & Rosenblatt’s Perceptron.

Introduction to Perceptron
Introduction to Multi-Layered Perceptron (MLP)
Activation functions – Identity Function, Step Function, Ramp Function, Sigmoid Function, Tanh Function, ReLU, ELU, Leaky ReLU & Maxout
Back Propagation Visual Demonstration
Network Topology – Key characteristics and Number of layers
Weights Calculation in Back Propagation
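
Several of the activation functions listed above are one-liners. The definitions below follow their usual textbook forms; the default `slope` and `alpha` values are assumptions.

```python
import math

def identity(z):
    return z

def step(z):
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # A small negative slope keeps a non-zero gradient for z < 0
    return z if z > 0 else slope * z

def elu(z, alpha=1.0):
    # Smoothly saturates to -alpha for large negative z
    return z if z > 0 else alpha * (math.exp(z) - 1)

for f in (identity, step, sigmoid, math.tanh, relu, leaky_relu, elu):
    print(f.__name__, f(-2.0), f(2.0))
```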

Understand the difference between a perceptron and an MLP or ANN. Learn about error surface, challenges related to gradient descent and the practical issues related to deep learning. You will learn to implement an MLP on the MNIST dataset (a multi-class problem), the IMDB dataset (a binary classification problem), the Reuters dataset (a single-label multi-class classification problem) and the Boston Housing dataset (a regression problem) using Python and Keras.

Error Surface – Learning Rate & Random Weight Initialization
Local Minima issues in Gradient Descent Learning
Is DL a Holy Grail? Pros and Cons
Practical Implementation of MLP/ANN in Python using Real Life Use Cases
Segregation of Dataset - Train, Test & Validation
Data Representation in Graphs using Matplotlib
Deep Learning Challenges – Gradient Primer, Activation Function, Error Function, Vanishing Gradient, Error Surface challenges, Learning Rate challenges, Decay Parameter, Gradient Descent Algorithmic Approaches, Momentum, Nesterov Momentum, Adam, Adagrad, Adadelta & RMSprop
Deep Learning Practical Issues – Avoid Overfitting, DropOut, DropConnect, Noise, Data Augmentation, Parameter Choices, Weights Initialization (Xavier, etc.)

Convolution Neural Networks are a class of Deep Learning networks that are mostly applied to images. You will learn about the ImageNet challenge, an overview of ImageNet winning architectures, applications of CNNs, and the problems MLPs have with huge datasets.

You will understand the convolution of filters on images, the basic structure of a ConvNet, details of the Convolution layer, Pooling layer and Fully Connected layer, a case study of AlexNet and a few of the practical issues of CNNs.

ImageNet Challenge – Winning Architectures, Difficult Vision Problems & Hierarchical Approach
Parameter Explosion with MLPs
Convolution Networks - 1D ConvNet, 2D ConvNet, Transposed Convolution
Convolution Layers with Filters and Visualizing Convolution Layers
Pooling Layer, Padding, Stride
Transfer Learning - VGG16, VGG19, Resnet, GoogleNet, LeNet, etc.
Practical Issues – Weight decay, Drop Connect, Data Manipulation Techniques & Batch Normalization
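
The effect of padding and stride on a convolution layer's output size, and the parameter explosion with MLPs, both come down to simple arithmetic. The helper names and the example sizes below are assumptions chosen for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def dense_parameters(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units

def conv_parameters(kernel_size, in_channels, out_channels):
    """Weights plus biases of one 2D convolution layer with square kernels."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

print(conv_output_size(224, kernel_size=3, stride=1, padding=1))
print(conv_output_size(227, kernel_size=11, stride=4))
# A 224 x 224 RGB image fed straight into 1000 dense units
print(dense_parameters(224 * 224 * 3, 1000))
# 64 filters of size 3 x 3 over the same RGB input
print(conv_parameters(3, in_channels=3, out_channels=64))
```

The convolution layer needs far fewer parameters because its filters are shared across every position in the image.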

You will learn image processing techniques, noise reduction using moving average methods, different types of filters - smoothing the image by averaging, Gaussian filter and the disadvantages of correlation filters. You will learn about different types of filters, boundary effects, template matching, rate of change in the intensity detection, different types of noise, image sampling and interpolation techniques.

You will also learn about colors and intensity, affine transformation, projective transformation, embossing, erosion & dilation, vignette, histogram equalization, HAAR cascade for object detection, SIFT, SURF, FAST, BRIEF and seam carving.

Introduction to Vision
Importance of Image Processing
Image Processing Challenges – Interclass Variation, ViewPoint Variation, Illumination, Background Clutter, Occlusion & Number of Large Categories
Introduction to Image – Image Transformation, Image Processing Operations & Simple Point Operations
Noise Reduction – Moving Average & 2D Moving Average
Image Filtering – Linear & Gaussian Filtering
Disadvantage of Correlation Filter
Introduction to Convolution
Boundary Effects – Zero, Wrap, Clamp & Mirror
Image Sharpening
Template Matching
Edge Detection – Image filtering, Origin of Edges, Edges in images as Functions, Sobel Edge Detector
Effect of Noise
Laplacian Filter
Smoothing with Gaussian
LOG Filter – Blob Detection
Noise Reduction – Salt & Pepper Noise using Gaussian Filter
Nonlinear Filters
Bilateral Filters
Canny Edge Detector - Non Maximum Suppression, Hysteresis Thresholding
Image Sampling & Interpolation – Image Sub Sampling, Image Aliasing, Nyquist Limit, Wagon Wheel Effect, Down Sampling with Gaussian Filter, Image Pyramid, Image Up Sampling
Image Interpolation – Nearest Neighbour Interpolation, Linear Interpolation, Bilinear Interpolation & Cubic Interpolation
Introduction to the dnn module
- Deep Learning Deployment Toolkit
- Use of DLDT with OpenCV4.0
OpenVINO Toolkit
- Introduction
- Model Optimization of pre-trained models
- Inference Engine and Deployment process
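
Linear filtering, boundary effects and the Sobel edge detector can be illustrated without OpenCV. The sketch below slides a kernel over a small grayscale image stored as nested lists and treats pixels outside the image as zero; the image and the helper name `correlate2d` are assumptions made for this example.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image, treating out-of-bounds pixels as zero."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_y, pad_x = kh // 2, kw // 2
    output = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - pad_y, x + j - pad_x
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx] * kernel[i][j]
            output[y][x] = total
    return output

BOX_3X3 = [[1 / 9] * 3 for _ in range(3)]          # 2D moving average
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]     # horizontal intensity change

# A dark region on the left and a bright region on the right
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
smoothed = correlate2d(image, BOX_3X3)
edges = correlate2d(image, SOBEL_X)
print(edges[1])
```

The strong negative response in the last column is a boundary effect of zero padding: the filter sees an artificial drop in intensity at the image border. Wrap, clamp and mirror padding are alternative ways of handling this.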

Understand the language models for next word prediction, spell check, mobile auto-correct, speech recognition, and machine translation. You will learn the disadvantages of traditional models and MLP. Deep understanding of the architecture of RNN, RNN language model, backpropagation through time, types of RNN - one to one, one to many, many to one and many to many along with different examples for each type.

Introduction to Adversaries
Language Models – Next Word Prediction, Spell Checkers, Mobile Auto-Correction, Speech Recognition & Machine Translation
Traditional Language model
Disadvantages of MLP
Introduction to State & RNN cell
Introduction to RNN
RNN language Models
Back Propagation Through time
RNN Loss Computation
Types of RNN – One to One, One to Many, Many to One, Many to Many
Introduction to the CNN and RNN
Combining CNN and RNN for Image Captioning
Architecture of CNN and RNN for Image Captioning
Bidirectional RNN
Deep Bidirectional RNN
Disadvantages of RNN

You will learn to build an object detection model with bounding boxes using Fast R-CNN, and understand why Fast R-CNN is a better choice for object detection. You will also learn about instance segmentation problems, which can be addressed using Mask R-CNN.

CNN-RNN Variants
R-CNN
Fast R-CNN
Faster R-CNN
Mask R-CNN

Understand and implement Long Short-Term Memory, which is used to keep the information intact, unless the input makes them forget. You will also learn the components of LSTM - cell state, forget gate, input gate and the output gate along with the steps to process the information. Learn the difference between RNN and LSTM, Deep RNN and Deep LSTM and different terminologies. You will apply LSTM to build models for prediction.

Introduction to LSTM – Architecture
Importance of Cell State, Input Gate, Output Gate, Forget Gate, Sigmoid and Tanh
Mathematical Calculations to Process Data in LSTM
RNN vs LSTM - Bidirectional vs Deep Bidirectional RNN
Deep RNN vs Deep LSTM
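
The mathematical steps inside an LSTM cell can be followed with a single-unit sketch where every quantity is a scalar. The parameter layout (a dictionary of `(w_x, w_h, bias)` tuples per gate) and the numeric values are assumptions made for readability; real layers use weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM.

    params maps a gate name to (w_x, w_h, bias).
    """
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c

params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state `c` is only scaled by the forget gate and added to, which is what lets information persist across many time steps.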

The Gated Recurrent Unit (GRU), a variant of LSTM, also addresses the difficulty RNNs have in retaining information. You will learn the components of GRU and the steps to process the information.

Introduction to GRU
Architecture & Gates - Update Gate, Reset Gate, Current Memory Content, Final Memory at Current Timestep
Applications of GRUs

You will learn about the components of Autoencoders, steps used to train the autoencoders to generate spatial vectors, types of autoencoders and generation of data using variational autoencoders. Understanding the architecture of RBM and the process involved in it.

Autoencoders
- Intuition
- Comparison with other Encoders (MP3 and JPEG)
- Implementation in Keras
Deep AutoEncoders
- Intuition
- Implementing DAE in Keras
Convolutional Autoencoders
- Intuition
- Implementation in Keras
Variational Autoencoders
- Intuition
- Implementation in Keras
Introduction to Restricted Boltzmann Machines - Energy Function, Schematic implementation, Implementation in TensorFlow

You will learn the difference between CNN and DBN, architecture of deep belief networks, how greedy learning algorithms are used for training them and applications of DBN.

Introduction to DBN
Architecture of DBN
Applications of DBN
DBN in Real World

Understand data generation using GANs, the architecture of a GAN (generator and discriminator), loss calculation and backpropagation, and the advantages and disadvantages of GANs.

Introduction to Generative Adversarial Networks (GANS)
Data Analysis and Pre-Processing
Building Model
Model Inputs and Hyperparameters
Model losses
Implementation of GANs
Defining the Generator and Discriminator
Generator Samples from Training
Model Optimizer
Discriminator and Generator Losses
Sampling from the Generator
Advanced Applications of GANS
- Pix2pixHD
- CycleGAN
- StackGAN++ (Generation of photo-realistic images)
- GANs for 3D data synthesis
- Speech quality enhancement with SEGAN

You will learn to use SRGAN, which uses a GAN to produce high-resolution images from low-resolution images. Understand generators and discriminators.

Introduction to SRGAN
Network Architecture - Generator, Discriminator
Loss Function - Discriminator Loss & Generator Loss
Implementation of SRGAN in Keras

You will learn Q-learning, a type of reinforcement learning: exploiting by building a Q table, exploring by randomly selecting actions, and the steps an agent follows to learn a task by itself.

Reinforcement Learning
Deep Reinforcement Learning vs Atari Games
Maximizing Future Rewards
Policy vs Values Learning
Balancing Exploration With Exploitation
Experience Replay, or the Value of Experience
Q-Learning and Deep Q-Network as a Q-Function
Improving and Moving Beyond DQN
Keras Deep Q-Network
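
Tabular Q-learning can be sketched on a toy problem before moving to a Deep Q-Network. In this illustrative example the agent walks along a line of five states and is rewarded for reaching the last one; the environment, the hyperparameters and the random tie-breaking rule are all assumptions.

```python
import random

N_STATES, GOAL = 5, 4     # states 0..4 on a line; reaching state 4 ends the episode
ACTIONS = (-1, +1)        # action 0 moves left, action 1 moves right

def train_q_table(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            row = q[state]
            if rng.random() < epsilon or row[0] == row[1]:
                action = rng.randrange(2)               # explore, or break a tie at random
            else:
                action = 0 if row[0] > row[1] else 1    # exploit the Q table
            next_state = min(max(state + ACTIONS[action], 0), GOAL)
            reward = 1.0 if next_state == GOAL else 0.0
            # Move the estimate toward reward + discounted best future value
            target = reward + gamma * max(q[next_state])
            row[action] += alpha * (target - row[action])
            state = next_state
    return q

q_table = train_q_table()
policy = [0 if row[0] > row[1] else 1 for row in q_table[:GOAL]]
print(policy)
```

After training, the greedy policy chooses "right" in every non-terminal state. A Deep Q-Network replaces the table with a neural network that approximates the same Q-function when the state space is too large to enumerate.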

Learn to build speech-to-text and text-to-speech models. You will understand the steps to extract structured speech data from speech and convert it into text, and later convert unstructured text data into speech.

Speech Recognition Pipeline
Phonemes
Pre-Processing
Acoustic Model
Deep Learning Models
Decoding

Learn to build a chatbot using generative models and retrieval models. You will understand how to use RASA open-source and LSTM to build chatbots.

Introduction to Chatbot
NLP Implementation in Chatbot
Integrating and implementing Neural Networks Chatbot
Introduction to Sequence to Sequence models and Attention
- Transformers and its applications
- Transformers language models
  - BERT
  - Transformer-XL (pretrained model: “transfo-xl-wt103”)
  - XLNet
Building a Retrieval Based Chatbot
Deploying Chatbot in Various Platforms

Learn the tools that automatically analyze your data and generate candidate model pipelines customized for your predictive modeling problem.

AutoML Methods
- Meta-Learning
- Hyperparameter Optimization
- Neural Architecture Search
- Network Architecture Search
AutoML Systems
- MLBox
- Auto-Net 1.0 & 2.0
- Hyperas
AutoML on Cloud - AWS
- Amazon SageMaker
- Sagemaker Notebook Instance for Model Development, Training and Deployment
- XG Boost Classification Model
- Training Jobs
- Hyperparameter Tuning Jobs
AutoML on Cloud - Azure
- Workspace
- Environment
- Compute Instance
- Compute Targets
- Automatic Featurization
- AutoML and ONNX
AutoML on Cloud - GCP
- AutoML Natural Language
- Performing Document Classification
- AutoML Vision APIs for Image Classification
- Performing Sentiment Analysis using AutoML Natural Language API
- TensorFlow Models Using Cloud ML Engine
- Cloud ML Engine and Its Components
- Training and Deploying Applications on Cloud ML Engine
- Choosing Right Cloud ML Engine for Training Jobs

Learn the methods and techniques which can explain the results and the solutions obtained by using deep learning algorithms.

Introduction to XAI - Explainable Artificial Intelligence
Why do we need it?
Levels of Explainability
- Direct Explainability
  - Simulatability
  - Decomposability
  - Algorithmic Transparency
- Post-hoc Explainability
  - Model-Agnostic Algorithms
    - Explanation by simplification (Local Interpretable Model-Agnostic Explanations (LIME))
    - Feature relevance explanation
      - SHAP
      - QII
      - SA
      - ASTRID
      - XAI
    - Visual Explanations
General AI vs Symbolic AI vs Deep Learning

Course	Course completion certificate	Criteria
Data Science and Statistics Fundamentals	Required	85% of Online Self-paced completion
Data Science with R	Required	85% of Online Self-paced completion or attendance of 1 Live Virtual Classroom, and score above 75% in the course-end assessment, and successful evaluation in at least 1 project
Data Science with SAS	Required	85% of Online Self-paced completion or attendance of 1 Live Virtual Classroom, and score above 75% in the course-end assessment, and successful evaluation in at least 1 project
Data Science with Python	Required	85% of Online Self-paced completion or attendance of 1 Live Virtual Classroom, and score above 75% in the course-end assessment, and successful evaluation in at least 1 project
Machine Learning and Tableau	Required	85% of Online Self-paced completion or attendance of 1 Live Virtual Classroom, and successful evaluation in at least 1 project
Big Data Hadoop and Spark Developer	Required	85% of Online Self-paced completion or attendance of 1 Live Virtual Classroom, and score above 75% in the course-end assessment, and successful evaluation of at least 1 project
Capstone Project	Required	Attendance of 1 Live Virtual Classroom and successful completion of the capstone project

Data Science Certification