Dropout is a regularization technique used to prevent overfitting. According to Wikipedia, regularization is a general method for converting the answer to a problem into a simpler one. Dropout works by randomly setting activations to zero at training time. Intuitively, because no single neuron can be relied on, this forces all neurons to contribute usefully to the output.

One problem with dropout is that the total sum of the activations becomes smaller after some of them are zeroed out. Therefore, if we apply a dropout rate of p, we rescale the surviving activations by dividing them by 1 - p, so that their expected sum is the same as before. For example, say the sum of the activations is 20 and we apply a dropout rate of 0.5: each activation survives with probability 0.5 and is doubled when it does, so the expected sum is still 0.5 × 20 × 2 = 20. In expectation, the rescaling leaves the total sum unchanged.
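To make the rescaling concrete, below is a rough sketch of what it looks like if you implement it by hand, assuming the standard "inverted dropout" formulation (which is also what PyTorch's nn.Dropout uses): draw a random keep/drop mask, zero out the dropped elements, and divide the survivors by 1 - p. The tensor values mirror the PyTorch example further down.

import torch

p = 0.5                                   # dropout rate
x = torch.tensor([[2.0, 4.0, 6.0, 8.0]])  # sum = 20

# Keep each element with probability 1 - p, then rescale by 1 / (1 - p)
mask = (torch.rand_like(x) > p).float()
y = x * mask / (1 - p)

print(y)        # some elements zeroed, the survivors doubled
print(y.sum())  # varies from draw to draw, but equals 20 in expectation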

The following is a minimal example of applying dropout with a probability of 50% to a 2D tensor. We can see that some of the elements are set to 0 and the remaining elements are doubled (i.e., divided by 1 - 0.5).

import torch
import torch.nn as nn
 
x = torch.tensor([[2.0, 4.0, 6.0, 8.0]])
 
# Define dropout with p=0.5 (50% chance of zeroing each value)
dropout = nn.Dropout(p=0.5)
dropout.train()  # make sure dropout is "on" (training mode)
 
for i in range(5):
    y = dropout(x)
 
    print("Input: ", x)
    print("After dropout: ", y)
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 0.,  8., 12.,  0.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 4.,  8., 12., 16.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 4.,  0., 12., 16.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 0.,  8., 12.,  0.]])
Input:  tensor([[2., 4., 6., 8.]])
After dropout:  tensor([[ 0.,  0.,  0., 16.]])
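
Dropout should only be active during training. At inference time we switch the module (or the whole model) to eval mode, after which the input passes through unchanged and no rescaling is applied. A quick check on the same tensor:

dropout.eval()     # turn dropout "off"
print(dropout(x))  # tensor([[2., 4., 6., 8.]]) -- identical to the input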