#### Neural Networks (80 Questions)

1. What does a neuron compute?

2. Consider the following neural network, which takes two binary-valued inputs and produces an output. Which of the following logical functions does it (approximately) compute?

3. Suppose img is a (32,32,3) array, representing a 32x32 image with 3 color channels red, green and blue. How do you reshape this into a column vector?
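One way to check this in NumPy (the names `img` and `v` are illustrative):

```python
import numpy as np

# A stand-in for the image described above: 32x32 pixels, 3 color channels.
img = np.random.randn(32, 32, 3)

# Flatten into a column vector of shape (32*32*3, 1) = (3072, 1).
v = img.reshape(32 * 32 * 3, 1)

print(v.shape)
```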

4. Consider the two following random arrays "a" and "b":

a = np.random.randn(2, 3) # a.shape = (2, 3)

b = np.random.randn(2, 1) # b.shape = (2, 1)

c = a + b

What will be the shape of "c"?
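This can be checked directly; NumPy broadcasts b across the columns of a:

```python
import numpy as np

a = np.random.randn(2, 3)  # a.shape = (2, 3)
b = np.random.randn(2, 1)  # b.shape = (2, 1)

# b is copied across a's 3 columns, so the sum has shape (2, 3).
c = a + b
print(c.shape)
```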

5. Consider the two following random arrays "a" and "b":

a = np.random.randn(4, 3) # a.shape = (4, 3)

b = np.random.randn(3, 2) # b.shape = (3, 2)

c = a*b

What will be the shape of "c"?
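Here the shapes are incompatible: "*" is element-wise, and (4, 3) cannot be broadcast against (3, 2). A quick check (the try/except is only there to show the failure):

```python
import numpy as np

a = np.random.randn(4, 3)  # a.shape = (4, 3)
b = np.random.randn(3, 2)  # b.shape = (3, 2)

# Element-wise multiplication requires broadcast-compatible shapes,
# which (4, 3) and (3, 2) are not, so this raises a ValueError.
try:
    c = a * b
except ValueError as e:
    c = None
    print("ValueError:", e)
```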

6. Suppose you have  input features per example. Recall that . What is the dimension of X?

7. Recall that "np.dot(a,b)" performs a matrix multiplication on a and b, whereas "a*b" performs an element-wise multiplication.
Consider the two following random arrays "a" and "b":

a = np.random.randn(12288, 150) # a.shape = (12288, 150)

b = np.random.randn(150, 45) # b.shape = (150, 45)

c = np.dot(a,b)

What is the shape of c?
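A matrix product's shape follows from the inner dimensions canceling: (12288, 150) dot (150, 45) gives (12288, 45). To verify:

```python
import numpy as np

a = np.random.randn(12288, 150)  # a.shape = (12288, 150)
b = np.random.randn(150, 45)     # b.shape = (150, 45)

# Matrix multiplication: the inner 150s match, leaving (12288, 45).
c = np.dot(a, b)
print(c.shape)
```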

8. Consider the following code snippet:

# a.shape = (3,4)
# b.shape = (4,1)

for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]

How do you vectorize this?
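For comparison, here is the loop next to one possible vectorized form (b.T has shape (1, 4), which broadcasts across the rows of a):

```python
import numpy as np

a = np.random.randn(3, 4)
b = np.random.randn(4, 1)

# The explicit double loop from the snippet above.
c_loop = np.zeros((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j]

# Vectorized: b.T has shape (1, 4) and broadcasts over a's 3 rows.
c_vec = a + b.T

print(np.allclose(c_loop, c_vec))
```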

9. Consider the following code:

a = np.random.randn(3, 3)

b = np.random.randn(3, 1)

c = a*b

What will be c? (If you’re not sure, feel free to run this in python to find out).
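Running it shows that "*" broadcasts b over a's columns, so each row i of a is scaled by b[i] and c keeps shape (3, 3):

```python
import numpy as np

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)

# Element-wise product with broadcasting: row i of a is multiplied by b[i].
c = a * b
print(c.shape)
```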

10. Consider the following computation graph. What is the output J?

11. What is the objective of the backpropagation algorithm?

12. The backpropagation rule is also known as the generalized delta rule. Is this true?

13.  What is true regarding the backpropagation rule?

14. What is meant by generalized in the statement “backpropagation is a generalized delta rule”?

15. What are general limitations of back propagation rule?

16. What are the general tasks that are performed with the backpropagation algorithm?

17. Is backpropagation learning based on gradient descent along the error surface?

18. How can the learning process be stopped in the backpropagation rule?

19. Which is the simplest pattern recognition task in a feedback network?

20. In a linear auto-associative network, if the input is noisy, will the output also be noisy?

21. Backpropagation works by first calculating the gradient of _ and then propagating it backward.

22. Gradient descent optimizes best when you use an even number

23. When are stable states reached in energy landscapes, that can be used to store input patterns?

24. The number of patterns that can be stored in a given network depends on?

25. What happens when the number of available energy minima is less than the number of patterns to be stored?

26. What happens when the number of available energy minima is more than the number of patterns to be stored?

27. How can hard problems be solved?

28. Why is there an error in recall when the number of energy minima is more than the required number of patterns to be stored?

29. What is a Boltzmann machine?

30. What is the objective of linear auto-associative feedforward networks?

31. A neural network model is said to be inspired by the human brain. The neural network consists of many neurons, each neuron takes an input, processes it, and gives an output. Here’s a diagrammatic representation of a real neuron. Which of the following statement(s) correctly represents a real neuron?

32. Below is a mathematical representation of a neuron. The different components of the neuron are denoted as:

x1, x2, …, xN: the inputs to the neuron. These can be either actual observations from the input layer or an intermediate value from one of the hidden layers.

w1, w2, …, wN: the weight of each input.

bi: the bias unit, a constant value added to the input of the activation function corresponding to each weight. It works similarly to an intercept term.

a: the activation of the neuron.

y: the output of the neuron.

Considering the above notation, will a line equation (y = mx + c) fall into the category of a neuron?

33. Let us assume we implement an AND function with a single neuron. Below is a tabular representation of the AND function:

X1    X2    X1 AND X2
0     0     0
0     1     0
1     0     0
1     1     1

The activation function of our neuron is denoted as:  What would be the weights and bias?

(Hint: for which values of w1, w2 and b does our neuron implement an AND function?)
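A sketch of one weight setting that works, assuming a unit-step activation (output 1 when the weighted sum plus bias is positive); the values w1 = w2 = 1, b = -1.5 are just one valid choice among many:

```python
# Single neuron with a step activation; weights/bias chosen so that only
# the input (1, 1) pushes the weighted sum above zero.
def and_neuron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_neuron(x1, x2))
```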

34. A network is created when multiple neurons stack together. Let us take the example of a neural network simulating an XNOR function. You can see that the last neuron takes input from the two neurons before it. The activation function for all the neurons is given by:  Suppose X1 is 0 and X2 is 1; what will be the output for the above neural network?

35. In a neural network, knowing the weight and bias of each neuron is the most important step. If you can somehow get the correct value of weight and bias for each neuron, you can approximate any function. What would be the best way to approach this?

36. What are the steps for using a gradient descent algorithm?

1. Calculate error between the actual value and the predicted value

2. Reiterate until you find the best weights of the network

3. Pass an input through the network and get values from the output layer

4. Initialize random weight and bias

5. Go to each neuron that contributes to the error and change its respective values to reduce the error
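The loop these steps describe can be sketched for a single linear neuron; the data, learning rate, and iteration count below are purely illustrative:

```python
import numpy as np

# Toy data: the target function is y = 2x, so we hope to learn w near 2, b near 0.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x

rng = np.random.default_rng(0)
w, b = rng.standard_normal(), rng.standard_normal()  # initialize randomly
lr = 0.05

for _ in range(2000):              # reiterate until the weights are good
    y_hat = w * x + b              # pass input through the "network"
    err = y_hat - y                # error between predicted and actual value
    w -= lr * np.mean(err * x)     # nudge each parameter against its gradient
    b -= lr * np.mean(err)

print(round(w, 3), round(b, 3))
```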

37. Suppose you have input as x, y, and z with values -2, 5, and -4 respectively. You have a neuron ‘q’ and neuron ‘f’ with functions:

q = x + y

f = q * z

A graphical representation of the functions is as follows. What is the gradient of f with respect to x, y, and z?

(HINT: To calculate gradient, you must find (df/dx), (df/dy) and (df/dz))
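Working the chain rule through this graph by hand (the variable names for the derivatives are just for illustration):

```python
# Forward pass: q = x + y, f = q * z.
x, y, z = -2.0, 5.0, -4.0
q = x + y            # 3
f = q * z            # -12

# Backward pass (chain rule):
df_dz = q            # df/dz = q
df_dq = z            # df/dq = z
df_dx = df_dq * 1.0  # dq/dx = 1, so df/dx = z
df_dy = df_dq * 1.0  # dq/dy = 1, so df/dy = z

print(df_dx, df_dy, df_dz)
```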

38. Now let’s revise the previous slides. We have learned that:

A neural network is a (crude) mathematical representation of a brain, which consists of smaller components called neurons.

Each neuron has an input, a processing function, and an output.

These neurons are stacked together to form a network, which can be used to approximate any function.

Given above is a description of a neural network. When does a neural network model become a deep learning model?

39. A neural network can be considered as multiple simple equations stacked together. Suppose we want to replicate the function for the below-mentioned decision boundary using two simple inputs h1 and h2. What will be the final equation?

40. “Convolutional Neural Networks can perform various types of transformations (rotations or scaling) on an input.”

Is the statement True or False?

41. Which of the following techniques perform similar operations as a dropout in a neural network?

42. Which of the following gives non-linearity to a neural network?

43. While training a neural network, you notice that the loss does not decrease in the first few epochs. The reasons for this could be:

1. The learning rate is low

2. The regularization parameter is high

3. Stuck at a local minimum

What, according to you, are the probable reasons?

44. Which of the following is true about model capacity (where model capacity means the ability of the neural network to approximate complex functions)?

45.  If you increase the number of hidden layers in a Multi-Layer Perceptron, the classification error of test data always decreases. True or False?

46. You are building a neural network where it gets input from the previous layer as well as from itself. Which of the following architecture has feedback connections?

47. What is the sequence of the following tasks in a perceptron?

Initialize the weights of the perceptron randomly

1. Go to the next batch of the dataset

2. If the prediction does not match the output, change the weights

3. For a sample input, compute an output

48. Suppose that you have to minimize the cost function by changing the parameters. Which of the following technique could be used for this?

49. The below graph shows the accuracy of a trained 3-layer convolutional neural network vs the number of parameters (i.e. number of feature kernels).

50. Suppose we have one hidden layer neural network as shown above. The hidden layer in this network works as a dimensionality reduction. Now instead of using this hidden layer, we replace it with a dimensionality reduction technique such as PCA. Would the network that uses a dimensionality reduction technique always give the same output as a network with a hidden layer?

51. Can a neural network model the function (y=1/x)?

52. In which neural network architecture does weight sharing occur?

53. Batch Normalization is helpful because

54. Instead of trying to achieve absolute zero error, we set a metric called Bayes error which is the error we hope to achieve. What could be the reason for using Bayes error?

55. The number of neurons in the output layer should match the number of classes (Where the number of classes is greater than 2) in a supervised learning task. True or False?

56. In a neural network, which of the following techniques is used to deal with overfitting?

57. Y = ax^2 + bx + c (polynomial equation of degree 2)

Can this equation be represented by a neural network of the single hidden layer with a linear threshold?

58. What is a dead unit in a neural network?

59. Which of the following statement is the best description of early stopping?

60. What if we use a learning rate that’s too large?

61. The network shown in Figure 1 is trained to recognize the characters H and T as shown below: What would be the output of the network?

62. Suppose a convolutional neural network is trained on the ImageNet dataset (Object recognition dataset). This trained model is then given a completely white image as an input. The output probabilities for this input would be equal for all classes. True or False?

63. When the pooling layer is added to a convolutional neural network, translation invariance is preserved. True or False?

64. Which gradient descent technique is more advantageous when the data is too big to handle in RAM all at once?

65. The graph represents the gradient flow of a four-hidden layer neural network which is trained using sigmoid activation function per epoch of training. The neural network suffers from the vanishing gradient problem. Which of the following statements is true?

66.  For a classification task, instead of random weight initializations in a neural network, we set all the weights to zero. Which of the following statements is true?

67. There is a plateau at the start. This is happening because the neural network gets stuck at local minima before going on to global minima. To avoid this, which of the following strategy should work?

68. For an image recognition problem (recognizing a cat in a photo), which architecture of neural network would be better suited to solve the problem?

69. Which of the following is a decision boundary of a neural network?

70. In the graph below, we observe that the error has many “ups and downs”. Should we be worried?

71. The table above shows the function of each layer in an ANN. Which of them is the incorrect pair?

72. Which of the following are rule-based AI models?

73. The illustration shows the structure of an Artificial Neuron. What is the main usage of the Calculation component?

74. Which of the following are the disadvantages of ANN? (Select 2 options.)

75. Neural networks work by trial and error.

76. A Neural Network is a type of ……, which is a type of machine learning.

77. What is an Artificial Neural Network (ANN) inspired by?

78. …........... is a series of algorithms that aims to understand the relationships in a set of data using a process inspired by the human brain

79. Check the answers that describe a real-life application of neural networks.

80. Which of the following is a usage of ANNs?

#### Linear Regression (50 Questions)

1. Which of the following methods do we use to find the best fit line for data in Linear Regression?

2. True False: Linear Regression is a supervised machine learning algorithm.

3. Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable?

4. True-False: Linear Regression is mainly used for Regression.

5. True-False: It is possible to design a Linear regression algorithm using a neural network?

6. Which of the following is true about Residuals?

7. True-False: Lasso Regularization can be used for variable selection in Linear Regression.
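As an illustration of why this is true (not part of the quiz): the L1 penalty in Lasso drives some coefficients exactly to zero, effectively deselecting those variables. This sketch uses scikit-learn on synthetic data where only the first two features matter:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
# Only features 0 and 1 actually influence y; the other three are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(200)

model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)  # coefficients for the three irrelevant features are (near) zero
```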

8. Suppose that we have N independent variables (X1,X2… Xn) and the dependent variable is Y. Now Imagine that you are applying linear regression by fitting the best fit line using least square error on this data.

You found that the correlation coefficient for one of its variables (Say X1) with Y is -0.95.

Which of the following is true for X1?

9. You are given two variables V1 and V2 that follow the two characteristics below:

1. If V1 increases then V2 also increases

2. If V1 decreases then the behavior of V2 is unknown

Looking at these two characteristics, which of the following options is correct for the Pearson correlation between V1 and V2?

10. Suppose Pearson correlation between V1 and V2 is zero. In such a case, is it right to conclude that V1 and V2 do not have any relation between them?

11. Which of the following offsets, do we use in linear regression’s least-square line fit? Suppose the horizontal axis is the independent variable and the vertical axis is the dependent variable.

12. True-False: Overfitting is more likely when you have a huge amount of data to train on.

13. We can also compute the coefficients of linear regression with the help of an analytical method called the “Normal Equation”. Which of the following is/are true about the Normal Equation?

1. We don’t have to choose the learning rate

2. It becomes slow when the number of features is very large

3. There is no need to iterate
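The statements above can be seen directly in code: the Normal Equation computes the coefficients in closed form, theta = (X^T X)^(-1) X^T y, with no learning rate and no iteration, but at the cost of inverting a matrix that grows with the number of features. The data below is synthetic and noise-free for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.standard_normal((100, 2))]  # intercept column + 2 features
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta  # no noise, so theta should be recovered exactly

# Normal Equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)
```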

14. The below graphs show two fitted regression lines (A & B) on randomly generated data, and I want to find the sum of residuals in both cases A and B. Which of the following statements is true about the sums of residuals of A and B?

Note:

1. The scale is the same in both graphs for both axes.

2. The X-axis is the independent variable and the Y-axis is the dependent variable.

15. Choose the option which describes bias in the best manner.

16. What will happen when you apply a very large penalty?

17. What will happen when you apply a very large penalty in the case of Lasso?

18. Which of the following statement is true about outliers in Linear regression?

19. Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and you found that there is a relationship between them. Which of the following conclusion do you make about this situation?

20. What will happen when you fit a degree-4 polynomial in linear regression?

21. What will happen when you fit a degree-2 polynomial in linear regression?

22. In terms of bias and variance, which of the following is true when you fit a degree-2 polynomial?

23. The below graphs (A, B, C, left to right) show the cost function versus the number of iterations, where l1, l2, and l3 are the learning rates for A, B, and C respectively. Which of the following is true about l1, l2, and l3?

24. Now we increase the training set size gradually. As the training set size increases, what do you expect will happen with the mean training error?

25. What do you expect will happen with bias and variance as you increase the size of training data?

26. Consider the following data where one input(X) and one output(Y) are given. What would be the root mean square training error for this data if you run a Linear Regression model of the form (Y = A0+A1X)?

27. Suppose you have been given the following scenarios for training and validation error for Linear Regression. Which of the following scenarios would give you the right hyperparameter?

28. Suppose you got the tuned hyperparameters from the previous question. Now, Imagine you want to add a variable in variable space such that this added feature is important. Which of the following thing would you observe in such a case?

29. In such a situation, which of the following options would you consider?

2. Start introducing polynomial-degree variables

3. Remove some variables

30. Now the situation is the same as written in the previous question(underfitting). Which of the following regularization algorithms would you prefer?

31. Multiple linear regression (MLR) is a __________ type of statistical analysis.

32. The following types of data can be used in MLR (choose all that apply)

33. A linear regression (LR) analysis produces the equation Y = 0.4X + 3. This indicates that:

34. A LR analysis produces the equation Y = -3.2X + 7. This indicates that:

35. The main purpose(s) of (LR) is/are (choose all that apply):

36. When writing regression formulae, which of the following refers to the predicted value on the dependent variable (DV)?

37. The major conceptual limitation of all regression techniques is that one can only ascertain relationships, but never be sure about the underlying causal mechanism.

38. In MLR, the square of the multiple correlation coefficient or R2 is called the

39. In MLR, a residual is the difference between the predicted Y and actual Y values.

40. Shared and unique variance among multiple variables can be represented by a diagram that includes overlapping circles. This is referred to as a:

41. In an MLR, the r between the two IVs is 1. Therefore, R will equal the r between one of the IVs and the DV. (Hint: Draw a Venn Diagram.)

42. In an MLR, if the two IVs are correlated with the DV and the two IVs are correlated with one another, the rps (partial correlations) will be _______ in magnitude than the rs. (Hint: Draw a Venn Diagram.)

43. In MLR, the unique variance in the DV explained by a particular IV is estimated by its:

44. Interaction effects can be tested in MLR by using IVs that represent:

45. A researcher wants to assess the extent to which social support from group members can explain changes in participants' mental health (MH) which is measured at the beginning and end of an intervention program. What MLR design could be used?

46. Imagine that each of the squares below represents a variable. The two diagrams represent models which both explain 75% of the variance in the outcome. Which looks to be the more parsimonious model?

47. A multiple linear regression with two explanatory variables is carried out, explaining 70% of the total variance in the outcome. Variable A uniquely accounts for 30% of the total variance and Variable B for 25% of the total variance. What accounts for the remaining 15% of variance which has been explained?

48.  Graphs A and B are P-P plots and can be used to ascertain whether or not the residuals for your regression analysis are normally distributed. Which of the two P-P plots might give you cause for concern?

49.  If a case has a high residual statistic, what does this suggest about the accuracy of the model for predicting that case’s score on the outcome measure?

50. Is it possible to talk sensibly about the main effect for a variable if that variable is involved in an interaction with another explanatory variable?

#### Logistic Regression (36 Questions)

1. Logistic Regression is a linear classifier

2. In R, what is the function used to create a Logistic Regression classifier?

3. Logistic Regression returns probabilities

4. In Python, what is the class used to create a logistic regression classifier?

5. In R, what value do we need to input for the family parameter?

6. True-False: Is Logistic regression a supervised machine learning algorithm?

7. True-False: Is Logistic regression mainly used for Regression?

8. True-False: Is it possible to design a logistic regression algorithm using a Neural Network Algorithm?

9.  True-False: Is it possible to apply a logistic regression algorithm on a 3-class Classification problem?

10. Which of the following methods do we use to best fit the data in Logistic Regression?

11. Which of the following evaluation metrics cannot be applied to a logistic regression output compared with the target?

12. One of the very good methods to analyze the performance of Logistic Regression is AIC, which is similar to R-Squared in Linear Regression. Which of the following is true about AIC?

13. [True-False] Standardisation of features is required before training a Logistic Regression.

14. Which of the following algorithms do we use for Variable Selection?

15. Consider the following model for logistic regression: P(y = 1 | x, w) = g(w0 + w1x), where g(z) is the logistic function.

In the above equation, P(y = 1 | x; w), viewed as a function of x, is what we can get by changing the parameters w.

What would be the range of p in such a case?
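For reference, the logistic function g(z) = 1 / (1 + e^(-z)) maps any real input into (0, 1), which is what makes the output interpretable as a probability. A quick numeric check (|z| is kept moderate so floating point does not saturate to exactly 0 or 1):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# In exact arithmetic the output is strictly between 0 and 1 for all real z.
for z in (-30.0, -1.0, 0.0, 1.0, 30.0):
    print(z, logistic(z))
```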

16. In the above question, which function do you think would make p lie between (0, 1)?

18. Suppose you train a logistic regression classifier and your hypothesis function H is

Which of the following figures will represent the decision boundary given by the above classifier?

19. Suppose you have been given a fair coin and you want to find out the odds of getting heads. Which of the following options is true for such a case?

20. The logit function (given as l(x)) is the log-odds function. What could be the range of the logit function on the domain x = [0, 1]?
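The logit l(p) = log(p / (1 - p)) can be evaluated near the ends of [0, 1] to see its range: it diverges toward negative infinity as p approaches 0 and toward positive infinity as p approaches 1:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# Sweeping p from near 0 to near 1 shows the output covering (-inf, +inf).
for p in (0.001, 0.5, 0.999):
    print(p, logit(p))
```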

21. Which of the following option is true?

22. Which of the following is true regarding the logistic function for any value “x”?

Note:
Logistic(x): is a logistic function of any number “x”

Logit(x): is a logit function of any number “x”

Logit_inv(x): is an inverse logit function of any number “x”

23. How will the bias change on using high (infinite) regularisation?

24. Suppose you applied a Logistic Regression model to some given data and got training accuracy X and testing accuracy Y. Now you want to add a few new features to the same data. Select the option(s) which is/are correct in such a case.

Note: Consider that the remaining parameters are the same.

25. Choose which of the following options is true regarding One-Vs-All method in Logistic Regression.

26. Below are two different logistic models with different values for β0 and β1. Which of the following statement(s) is/are true about the β0 and β1 values of the two logistic models (Green, Black)?

Note: consider Y = β0 + β1*X, where β0 is the intercept and β1 is the coefficient.

27. Below are three scatter plots (A, B, C, left to right) with hand-drawn decision boundaries for logistic regression. Which of the above figures shows a decision boundary that is overfitting the training data?

28. What do you conclude after seeing this visualization?

1. The training error in the first plot is maximum compared to the second and third plots.

2. The best model for this regression problem is the last (third) plot because it has minimum training error (zero).

3. The second model is more robust than the first and third because it will perform best on unseen data.

4. The third model is overfitting more compared to the first and second.

5. All will perform the same because we have not seen the testing data.

29. Suppose the above decision boundaries were generated for different values of regularization. Which of the above decision boundaries shows the maximum regularization?

30. The below figure shows AUC-ROC curves for three logistic regression models, with different colors showing curves for different hyperparameter values. Which of the AUC-ROC curves will give the best result?

31. Suppose you are using a Logistic Regression model on a huge dataset. One of the problems you may face with such huge data is that Logistic Regression will take a very long time to train. What would you do if you want to train logistic regression on the same data in less time while getting comparatively similar accuracy (it may not be the same)?

32. The following is the loss function in logistic regression (Y-axis: loss function, X-axis: log probability) for a two-class classification problem. Which of the following images shows the cost function for y = 1?

Note: Y is the target class

33. Suppose the following graph is a cost function for logistic regression. How many local minima are present in the graph?

34. Imagine you have been given the below graph of logistic regression, which shows the relationship between the cost function and the number of iterations for 3 different learning rate values (different colors show different curves at different learning rates). Suppose you saved the graph for future reference but forgot to save the values of the different learning rates. Now you want to find out the relation between the learning rate values of these curves. Which of the following will be the true relation?

Note:

1. The learning rate for blue is l1

2. The learning rate for red is l2

3. The learning rate for green is l3

35. Can a Logistic Regression classifier do a perfect classification on the below data? Note: You can use only the X1 and X2 variables, where X1 and X2 can take only two binary values (0, 1).

36. Suppose that you have trained a logistic regression classifier, and it outputs a prediction hθ(x) = 0.2 on a new example x. This means (check all that apply):

37. Which of the following statements are true? Check all that apply.

#### Decision Trees (39 Questions)

1. Decision trees are also known as CART. What is CART?

2. What are the advantages of Classification and Regression Trees (CART)?

3. What are the advantages of Classification and Regression Trees (CART)?

4. What are the disadvantages of Classification and Regression Trees (CART)?

5. Decision tree learners may create biased trees if some classes dominate. What’s the solution to it?

6. Decision tree can be used for ______.

7. Decision tree is a ______ algorithm.

8. Suppose your target variable is whether a passenger will survive or not, predicted using a Decision Tree. What type of tree do you need to predict the target variable?

9. Suppose your target variable is the price of a house, predicted using a Decision Tree. What type of tree do you need to predict the target variable?

10. What is the maximum depth in a decision tree?

11. Which of the following is/are true about bagging trees?

1. In bagging trees, individual trees are independent of each other

2. Bagging is the method for improving performance by aggregating the results of weak learners
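A minimal numeric sketch of both statements: each learner is fit on its own bootstrap sample, independently of the others, and the results are aggregated. The "weak learner" here is just the mean of a sample, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(500) + 10.0  # synthetic data with true mean 10

predictions = []
for _ in range(50):
    # Bootstrap: sample with replacement, independently for each learner.
    sample = rng.choice(data, size=len(data), replace=True)
    predictions.append(sample.mean())   # each weak learner's "prediction"

bagged = np.mean(predictions)           # aggregate the weak learners
print(round(bagged, 2))
```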

12. Which of the following is/are true about boosting trees?

1. In boosting trees, individual weak learners are independent of each other

2. It is the method for improving the performance by aggregating the results of weak learners

13. Which of the following is/are true about the Random Forest and Gradient Boosting ensemble methods?

1. Both methods can be used for the classification task

2. Random Forest is used for classification whereas Gradient Boosting is used for the regression task

3. Random Forest is used for regression whereas Gradient Boosting is used for the classification task

4. Both methods can be used for the regression task

14. In a Random Forest you can generate hundreds of trees (say T1, T2, …, Tn) and then aggregate the results of these trees. Which of the following is true about an individual tree (Tk) in the Random Forest?

1. An individual tree is built on a subset of the features

2. An individual tree is built on all the features

3. An individual tree is built on a subset of the observations

4. An individual tree is built on the full set of observations

15. Which of the following is true about the “max_depth” hyperparameter in Gradient Boosting?

1. Lower is better in case of the same validation accuracy
2. Higher is better in case of the same validation accuracy
3. Increasing the value of max_depth may overfit the data
4. Increasing the value of max_depth may underfit the data

16. Which of the following algorithms doesn’t use learning rate as one of its hyperparameters?

2. Extra Trees 