1. What’s the risk with tuning hyperparameters using a test dataset?

2. Which evaluation method should we use to assess model performance when 1) only a small dataset is available, 2) a large dataset is available, and 3) computing resources are limited?

3. Explain why dropout can help address the overfitting problem in deep neural networks.

4. What are the main technical breakthroughs that make modern deep learning so powerful?

5. K-nearest neighbor (KNN) classifiers are widely used in real-world practice. However, some preprocessing steps, called filters in Weka, are usually necessary to make them work well. Describe two important Weka filters for KNN classifiers and explain why they are important.

6. The Support Vector Machine (SVM) is one of the most popular classification algorithms.

a) Describe three key technical ideas of SVM classifiers.

b) Describe the difference between linearly separable and non-linearly separable problems. Give a two-dimensional example of each type.

c) An SVM classifier is a linear discriminant classifier. However, it can be used to classify non-linearly separable problems. For the non-linearly separable example problem in b), use a trick of the SVM classifier to show that this problem can be solved using a linear classifier. Draw a figure to support your solution.

d) Why does a back-propagation neural network classifier have a local optima problem while an SVM does not?

7. For the 10 cases in the table below, we are trying to fit a decision tree by splitting on either Color or Size, but on only one variable.

a) Compute the information gain in terms of entropy for these two splits (just show the result in fractions).

b) Give the confusion matrix for the decision tree classifier that splits on Color.

c) Compute the True Positive Rate (TPR) of the decision tree classifier that splits on Color.

| Label | Color  | Size  |
|-------|--------|-------|
| 1     | Yellow | Large |
| 1     | Yellow | Large |
| 0     | Yellow | Small |
| 1     | Blue   | Small |
| 0     | Blue   | Large |
| 0     | Blue   | Large |
| 0     | Blue   | Small |
| 1     | Yellow | Small |
| 0     | Blue   | Large |
| 1     | Blue   | Large |
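One way to check a hand computation for question 7 is the short Python sketch below. It is only an illustration, not a model answer: the helper names (`entropy`, `information_gain`, `stump_predictions`, `confusion_counts`) are made up for this example, entropy is measured in bits (log base 2), and the one-level tree is assumed to predict the majority label in each branch, with label 1 treated as the positive class.

```python
from collections import Counter
from math import log2

# (label, color, size) rows copied from the table in question 7.
CASES = [
    (1, "Yellow", "Large"),
    (1, "Yellow", "Large"),
    (0, "Yellow", "Small"),
    (1, "Blue", "Small"),
    (0, "Blue", "Large"),
    (0, "Blue", "Large"),
    (0, "Blue", "Small"),
    (1, "Yellow", "Small"),
    (0, "Blue", "Large"),
    (1, "Blue", "Large"),
]
COLOR, SIZE = 1, 2  # column indices within each row


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_by(cases, column):
    """Group the labels by the value found in the given column."""
    groups = {}
    for row in cases:
        groups.setdefault(row[column], []).append(row[0])
    return groups


def information_gain(cases, column):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    parent = entropy([row[0] for row in cases])
    groups = split_by(cases, column)
    weighted = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return parent - weighted


def stump_predictions(cases, column):
    """Predict the majority label of each branch (assumed tie-free here)."""
    majority = {
        value: Counter(labels).most_common(1)[0][0]
        for value, labels in split_by(cases, column).items()
    }
    return [majority[row[column]] for row in cases]


def confusion_counts(cases, predictions):
    """Count TP, FN, FP, TN with label 1 as the positive class."""
    counts = Counter()
    for (label, _, _), predicted in zip(cases, predictions):
        if label == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts


gain_color = information_gain(CASES, COLOR)
gain_size = information_gain(CASES, SIZE)
counts = confusion_counts(CASES, stump_predictions(CASES, COLOR))
tpr = counts["TP"] / (counts["TP"] + counts["FN"])

print(f"gain(Color) = {gain_color:.4f}, gain(Size) = {gain_size:.4f}")
print(dict(counts), f"TPR = {tpr:.2f}")
```

The script reports decimal values, so the fractional form requested in part a) still has to be written out by hand from the per-branch class counts.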

8. Deep learning

a) Suppose the input layer of a deep neural network has 5 channels, each of dimension 10×10, and the next layer is a convolutional layer with 10 filters of size 3×3.

– Calculate the number of parameters for this convolution layer

– Calculate the output feature map dimensions (without padding)

b) Convolutional filters are usually used to detect local patterns in the input, but their receptive field is usually small, e.g. 3×3. How does a deep neural network learn to detect large objects in the input image?
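The arithmetic behind question 8 can be sketched in a few lines of Python. This is a hedged illustration rather than a model answer: the function names are invented for this example, and two things the question does not state are assumed, namely a stride of 1 and one bias term per filter (pass `bias=False` to count weights only). The last helper relates to part b) and covers only stacked stride-1 convolutions without pooling.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a 2D convolution with square kernels."""
    # Each filter spans every input channel: kernel_size x kernel_size x in_channels.
    weights = kernel_size * kernel_size * in_channels * out_channels
    # Assumption: one bias per output filter when bias=True.
    return weights + (out_channels if bias else 0)


def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of the output feature map along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    # Each additional layer widens the field by (kernel_size - 1) input pixels.
    return 1 + num_layers * (kernel_size - 1)


params_with_bias = conv2d_param_count(5, 10, 3)
params_no_bias = conv2d_param_count(5, 10, 3, bias=False)
out_side = conv2d_output_size(10, 3)

print(f"parameters: {params_with_bias} with bias, {params_no_bias} without")
print(f"output feature maps: 10 maps of {out_side}x{out_side}")
print("receptive field after 1..4 layers:",
      [stacked_receptive_field(n) for n in range(1, 5)])
```

The receptive-field helper shows that depth alone lets small filters cover progressively larger input regions; pooling or strided layers, which this sketch leaves out, enlarge the field faster.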

9.