Broadcasting is a mechanism in NumPy and PyTorch that makes arithmetic operations on two arrays (including high-dimensional ones) of different shapes possible. By default, NumPy operations are performed element by element; broadcasting can be seen as a way to virtually stretch the smaller array to match the bigger one before performing the element-wise operation. Under the hood, broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. The basic rule of broadcasting is the following: two dimensions are compatible when

  1. they are equal, or
  2. one of them is 1 (here we refer to the size of each dimension, not the number of dimensions)

The resulting shape is calculated as follows:

  1. If the numbers of dimensions are not equal, prepend 1s to the shape of the array with fewer dimensions until both shapes have the same length
  2. Then, for each dimension, the resulting size is the maximum of the two sizes along that dimension
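
The two steps above can be sketched as a small helper that computes the broadcast shape of two shapes. This is only an illustration of the rule, not NumPy's actual implementation, and the function name broadcast_shape is made up:

```python
def broadcast_shape(shape_a, shape_b):
    """Illustrative sketch: compute the broadcast result shape of two shapes."""
    # Step 1: prepend 1s to the shorter shape so both have equal length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Dimensions are compatible when equal, or when one of them is 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible dimensions: {da} vs {db}")
        # Step 2: the resulting size is the max of the two sizes
        result.append(max(da, db))
    return tuple(result)

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```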

Before diving into broadcasting, we address a common point of confusion between NumPy arrays and Python lists. The following example seems to conform to the rules above since ar2 is a scalar, but ar1 is a Python list, not a NumPy array, so broadcasting does not apply. Instead, what we get is list repetition.

ar1 = [1, 3, 4]  # Python list, not a NumPy array
ar2 = 2
ar1 * ar2
[1, 3, 4, 1, 3, 4]

NumPy Array

Now we look at a basic example of array broadcasting in NumPy (PyTorch works the same way). Here ar2 is a scalar; during the multiplication it is stretched to match the shape of ar1, and an element-wise multiplication is performed.

import numpy as np
 
ar1 = np.array([1, 3, 4])
ar2 = 2
ar1 * ar2
array([2, 6, 8])

Note that this is also different from a dot product, although in this case the result happens to be the same.

np.dot(ar2, ar1)
array([2, 6, 8])

The difference becomes apparent when we perform a dot product and an element-wise multiplication on two arrays of the same shape:

ar3 = [1, 3, 3]
np.dot(ar1, ar3), ar1 * ar3
(np.int64(22), array([ 1,  9, 12]))

A simple way to understand how broadcasting works is to align the shapes of the two arrays from the right:

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

Here we perform an operation on a 4-d and a 3-d array. We align the shapes from the right; wherever a dimension has size 1, that dimension is stretched to match the larger size. Notice that all we care about is the size of each dimension, not the number of dimensions. Also, broadcasting works the same way regardless of which element-wise operation we perform.

ar4 = np.random.rand(8, 1, 6, 1)
ar5 = np.random.rand(7, 1, 5)
ar4.shape, ar5.shape, (ar4 + ar5).shape, (ar4 * ar5).shape
 
((8, 1, 6, 1), (7, 1, 5), (8, 7, 6, 5), (8, 7, 6, 5))
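
Recent versions of NumPy can also report the broadcast shape directly, without allocating any arrays, via np.broadcast_shapes:

```python
import numpy as np

# Same shapes as above, checked without building the arrays
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```

This is a handy way to sanity-check shapes before running a memory-hungry operation.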

Next, we look at examples where broadcasting doesn’t apply.

l1 = np.array([1, 2, 3])
l2 = np.array([1, 3])
print(l1.shape, l2.shape)
l1 * l2
(3,) (2,)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[54], line 4
      2 l2 = np.array([1, 3])
      3 print(l1.shape, l2.shape)
----> 4 l1 * l2


ValueError: operands could not be broadcast together with shapes (3,) (2,) 

This operation fails because the dimension sizes of the two arrays don't align. Although both are 1-d arrays (the numbers of dimensions match), we have:

A      (1d array):  3
B      (1d array):  2

Along the only dimension we have a 3 and a 2. This doesn't work because, by the rule above, either one of them needs to be 1 or both need to be the same.
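
If an outer-product-style result is actually what we want, one way to make such shapes compatible is to insert a new axis of size 1; a minimal sketch:

```python
import numpy as np

l1 = np.array([1, 2, 3])
l2 = np.array([1, 3])

# (3, 1) * (2,) -> align right -> (3, 1) * (1, 2) -> (3, 2)
result = l1[:, np.newaxis] * l2
print(result.shape)  # (3, 2)
print(result)
```

Each element of l1 is now multiplied against all of l2, which may or may not be the computation you intended; the point is that the shapes now satisfy the rule.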

l3 = np.array([[1, 3, 3], [2, 3, 3], [1, 3, 3], [2, 3, 3]])
l2 * l3
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[64], line 2
      1 l3 = np.array([[1, 3, 3], [2, 3, 3], [1, 3, 3], [2, 3, 3]])
----> 2 l2 * l3


ValueError: operands could not be broadcast together with shapes (2,) (4,3) 

Similarly, the example above fails because the trailing dimension sizes (2 and 3) don't align, as the error message makes clear.

A      (1d array):     2
B      (2d array): 4 x 3
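
As before, adding axes of size 1 can make the shapes compatible if a higher-dimensional result is actually intended; for instance:

```python
import numpy as np

l2 = np.array([1, 3])
l3 = np.array([[1, 3, 3], [2, 3, 3], [1, 3, 3], [2, 3, 3]])

# (2, 1, 1) * (4, 3) -> align right -> (2, 1, 1) * (1, 4, 3) -> (2, 4, 3)
print((l2[:, None, None] * l3).shape)  # (2, 4, 3)
```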

PyTorch Tensor

Broadcasting works the same way in PyTorch as in NumPy.

import torch
 
x = torch.empty(1, 3)
y = torch.empty(2, 5, 1)
(x + y).size()
torch.Size([2, 5, 3])

Tip

As a general rule, when defining arrays/tensors with mixed dimensions, align the shapes from the right to make them easier to read, as shown below.

x = torch.empty(   1, 3)
y = torch.empty(2, 5, 1)
(x + y).size()
 
torch.Size([2, 5, 3])