Lecture 'Deep Learning'
Organization
All dates for winter term 18/19
Slides (password protected)
Tablet writings
1. A first simple neural network (SOM)
2. The perceptron: neuron and network model
3. Training machine learning models
4. Automatic differentiation
5. A concrete example of reverse-mode autodiff
Step-by-step introduction to TensorFlow
6. The significance of the transfer function
7. Long Short-Term Memory (LSTM)
8. Optimizers: Momentum, Nesterov, AdaGrad, RMSProp, Adam (blackboard notes)
9. LSTM training
Exercises
01  Introduction to Python (for 18.10.18)
02  Introduction to NumPy (for 25.10.18)
03  Automatic Differentiation (for 08.11.18)
Solution for exercise 03
04  Applications of Deep Learning
05  MLP for regression: Pandas (part 1/4)
06  MLP for regression: MLP in TensorFlow (part 2/4)
07  MLP for regression: MLP in Keras, NaN Processing, Scaling (part 3/4)
08  MLP for regression: Working with categorical features (part 4/4)
09  CNN for image classification in Keras
10  LSTM for time series prediction
11  LSTM for Natural Language Processing (NLP)
Data
 Convolutions: video with test pattern

10x10 audio dataset contains:
 training data: 10 audio streams of a single word spoken 10 times
 test data: 10 audio streams of a single word spoken once
Videos
 Video of original experiments of Hubel and Wiesel from 1959 showing the existence of "simple" and "complex" cells.
 Video of a Convolutional Neural Network demo from 1993. Yes, the CNN model is not really new!
 Minibatch gradient descent by Andrew Ng. Explains what minibatches are compared to batches (11m28s)
 Understanding Mini-Batch Gradient Descent by Andrew Ng. Explains what the path to a local minimum looks like for Batch GD, SGD, and Minibatch GD (11m18s)
 Gradient descent with momentum by Andrew Ng. Gives a good intuition in a short time (9m20s)
 RMSProp by Andrew Ng. Again gives a good intuition in a short time (7m41s)
 Adam Optimization Algorithm by Andrew Ng. Mainly presents the formulas and shows that Adam (="Adaptive Moment Estimation") is a combination of the momentum optimizer with the RMSProp optimizer (7m07s)
 Exponentially Weighted Averages by Andrew Ng. RMSProp and Adam use exponentially weighted (moving) averages (EWMA) of squared gradients. For this, Andrew Ng introduces EWMA as well (5m58s)
 Bias Correction of Exponentially Weighted Averages by Andrew Ng. EWMA are actually quite bad estimates in the initial phase: they tend to underestimate the average. Here Andrew Ng explains how to correct for this artefact by multiplying the EWMA by 1/(1 - beta^t) (4m11s)
 Learning Rate Decay by Andrew Ng. Shows the motivation why to decrease the learning rate and shows some formulas that are typically used to decrease the learning rate as a function of the number of training epochs (6m44s)
 Tuning Process by Andrew Ng. Tells us not to use a grid sampling strategy, but random sampling, and to follow a coarse-to-fine search (7m10s)
 Normalizing Activations in a Network by Andrew Ng. Explains the key idea of activation normalization: introducing a normalization step for the activation or output values of the neurons in a layer, such that they have a certain mean value and a certain variance. Thus two new hyperparameters are introduced for each layer, which allow the network to learn a good mean value and variance for the activations / output values of the neurons in that layer (8m54s)
 Fitting Batch Norm Into Neural Networks by Andrew Ng. Explains that activations are normally computed on the basis of the per-layer activations produced by a batch of samples, and shows that the usual bias vectors are no longer needed, since they are already incorporated in the batch normalization step, where we learn the best mean for the activation of each neuron in the layer currently being considered (12m55s)
 Why Does Batch Norm Work? by Andrew Ng. Explains why batch normalization helps: 1. it gives later layers a more stable input; 2. it acts as a regularizer similarly to dropout, since it adds noise to the activation values of the neurons, because the normalization parameters are computed iteratively using all minibatches but are then applied to only the current minibatch (11m39s)
 Batch Norm At Test Time by Andrew Ng. Explains an important difference between the training and inference steps. During training, the normalization of the neurons' activation values is computed on the basis of a minibatch. And what do we do at test time? Here Andrew Ng explains that we usually keep track of an (exponentially weighted) moving average of the means and variances computed during training and use these estimates for the normalization step at test (inference) time (5m46s)
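The optimizer videos above revolve around a few formulas: EWMAs of the gradient and of the squared gradient, bias correction by 1/(1 - beta^t), and Adam as the combination of the momentum and RMSProp updates. The following is a minimal NumPy sketch of a single Adam update; the function name `adam_step` and the hyperparameter defaults are illustrative, not taken from the lecture slides:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update combining momentum (EWMA of gradients)
    and RMSProp (EWMA of squared gradients), with bias correction."""
    m = beta1 * m + (1 - beta1) * grad       # momentum part: EWMA of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # RMSProp part: EWMA of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction: raw EWMAs
    v_hat = v / (1 - beta2**t)               # underestimate the average early on
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):                     # t starts at 1 for the bias correction
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
# theta ends up close to the minimum at 0
```

Setting beta1 = 0 recovers a bias-corrected RMSProp step, and dropping the division by sqrt(v_hat) recovers plain momentum, which is exactly the "Adam = momentum + RMSProp" decomposition the videos describe.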
Links
 How to create a deep learning dataset using Google Images
 A nice visualization of a CNN by Adam Harley
 Deep Learning Glossary  A very compact introduction to and overview of Deep Learning-related terms by Denny Britz
 AI and Deep Learning in 2017 – A Year in Review by Denny Britz
 An overview of gradient descent optimization algorithms by Sebastian Ruder
This is old material from my Deep Learning lecture held in winter term 16/17:
Slides: Slides of all lectures up to January 19, 2017
Exercises:
 Exercise 01: building the OpenCV library and experimenting with convolutions
 Exercise 02: Reading in the MNIST dataset and filtering sample images with a filter bank
 Exercise 03: Perceptron classifier
 Exercise 04: Multi-Layer Perceptron feedforward step and performance tests
 Exercise 05: Backpropagation and implementation testing
 Exercise 06: MLP network topologies and transfer functions
 Exercise 07: MLP with TensorFlow
 Exercise 08: CNN with TensorFlow
 Exercise 09: AlexNet CNN with TensorFlow
 Exercise 10: Using a pretrained CNN model
 Exercise 11: Unsupervised learning of features
 Exercise 12: Long Short Term Memory (LSTM)
 Exercise 13: Hierarchical unsupervised learning of features
Sample solutions for all exercises can be found at github.