Dropout is a regularization technique used to prevent overfitting. According to Wikipedia, regularization is a general method for converting the answer to a problem into a simpler one. Dropout works by randomly setting activations to zero at training time. Intuitively, because no single neuron can be relied on, this forces all neurons to contribute usefully to the output.

One problem with dropout is that the total sum of the activations becomes smaller after some of them are zeroed out. Therefore, if we apply a dropout rate of p, we rescale the surviving activations by dividing them by 1 - p, so that their expected sum is the same as before. For example, say the sum of the activations is 20 and we apply a dropout rate of 0.5: each activation survives with probability 0.5 and is doubled when it does, so the expected sum is still 0.5 × 20 × 2 = 20. In expectation, the rescaling leaves the total sum unchanged.
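To make the rescaling concrete, below is a rough sketch of what it looks like if you implement it by hand, assuming the standard "inverted dropout" formulation (which is also what PyTorch's nn.Dropout uses): draw a random keep/drop mask, zero out the dropped elements, and divide the survivors by 1 - p. The tensor values mirror the PyTorch example further down.

import torch

p = 0.5                                   # dropout rate
x = torch.tensor([[2.0, 4.0, 6.0, 8.0]])  # sum = 20

# Keep each element with probability 1 - p, then rescale by 1 / (1 - p)
mask = (torch.rand_like(x) > p).float()
y = x * mask / (1 - p)

print(y)        # some elements zeroed, the survivors doubled
print(y.sum())  # varies from draw to draw, but equals 20 in expectation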

The following is a minimal example of applying dropout with a probability of 50% to a 2D tensor. We can see that some of the elements are set to 0 and the remaining elements are doubled (i.e., divided by 1 - 0.5).

import torch
import torch.nn as nn
 
x = torch.tensor([[2.0, 4.0, 6.0, 8.0]])
 
# Define dropout with p=0.5 (50% chance of zeroing each value)
dropout = nn.Dropout(p=0.5)
dropout.train()  # make sure dropout is "on" (training mode)
 
for i in range(5):
    y = dropout(x)
 
    print("Input: ", x)
    print("After dropout: ", y)
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 0.,  8., 12.,  0.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 4.,  8., 12., 16.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 4.,  0., 12., 16.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 0.,  8., 12.,  0.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 0.,  0.,  0., 16.]])
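
Dropout should only be active during training. At inference time we switch the module (or the whole model) to eval mode, after which the input passes through unchanged and no rescaling is applied. A quick check on the same tensor:

dropout.eval()     # turn dropout "off"
print(dropout(x))  # tensor([[2., 4., 6., 8.]]) -- identical to the input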