Broadcasting is a mechanism in NumPy and PyTorch that makes arithmetic operations on two arrays (including high-dimensional ones) of different shapes possible. By default, NumPy operations are performed element by element; broadcasting can be seen as a way of virtually stretching the smaller array to match the bigger one before the element-wise operation is applied. Under the hood, broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. The basic rule of broadcasting is the following: two dimensions are compatible when
- they are equal, or
- one of them is 1 (here we are referring to the size of each dimension, not the number of dimensions)
The resulting dimensions are calculated as follows:
- If the numbers of dimensions are not equal, prepend 1s to the shape of the array with fewer dimensions until both shapes have the same length
- Then, for each dimension, the resulting size is the maximum of the two sizes along that dimension
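The two rules above can be sketched as a small helper function. (`broadcast_shape` is a name chosen here for illustration; NumPy ships the same logic as `np.broadcast_shapes`.)

```python
def broadcast_shape(shape_a, shape_b):
    # Rule 1: prepend 1s to the shorter shape so both have equal length
    n = max(len(shape_a), len(shape_b))
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rule 2: sizes must be equal, or one of them must be 1
        if da == db or da == 1 or db == 1:
            result.append(max(da, db))
        else:
            raise ValueError(f"incompatible dimension sizes {da} and {db}")
    return tuple(result)

broadcast_shape((8, 1, 6, 1), (7, 1, 5))
(8, 7, 6, 5)
```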
Before diving into broadcasting, we address a common source of confusion between NumPy arrays and Python lists. The dimensions below conform to the rules above, since ar2 is a scalar, but ar1 is a Python list, not a NumPy array, so broadcasting does not apply here. Instead, what we get is list repetition.
ar1 = [1, 3, 4] # Python list, not a NumPy array
ar2 = 2
ar1 * ar2
[1, 3, 4, 1, 3, 4]
NumPy Array
Now we look at a basic example of array broadcasting in NumPy (the same applies to PyTorch). Here ar2 is a scalar; in the multiplication it is stretched to match the shape of ar1, and an element-wise multiplication is performed.
import numpy as np
ar1 = np.array([1, 3, 4])
ar2 = 2
ar1 * ar2
array([2, 6, 8])
Note that this is also different from a dot product, although in this case the result happens to be the same.
np.dot(ar2, ar1)
array([2, 6, 8])
The difference becomes apparent when we perform a dot product and an element-wise multiplication on two arrays of the same shape:
ar3 = [1, 3, 3]
np.dot(ar1, ar3), ar1 * ar3
(np.int64(22), array([ 1, 9, 12]))
A simple way to understand how broadcasting works is to align the dimensions of the two arrays:
A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
Here we are performing an operation on a 4-d and a 3-d array. We align the shapes from the right; wherever a dimension has size 1, it is stretched to match the larger size. Notice that all we care about is the size of each dimension, not the number of dimensions. Also, broadcasting works the same way regardless of which element-wise operation we perform.
ar4 = np.random.rand(8, 1, 6, 1)
ar5 = np.random.rand(7, 1, 5)
ar4.shape, ar5.shape, (ar4 + ar5).shape, (ar4 * ar5).shape
((8, 1, 6, 1), (7, 1, 5), (8, 7, 6, 5), (8, 7, 6, 5))
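NumPy also lets us check compatibility directly with `np.broadcast_shapes`, which computes the resulting shape without allocating any arrays:

```python
import numpy as np

# Compute the broadcast result shape from the shapes alone
np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5))
(8, 7, 6, 5)
```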
Next, we look at examples where broadcasting doesn’t apply.
l1 = np.array([1, 2, 3])
l2 = np.array([1, 3])
print(l1.shape, l2.shape)
l1 * l2
(3,) (2,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[54], line 4
2 l2 = np.array([1, 3])
3 print(l1.shape, l2.shape)
----> 4 l1 * l2
ValueError: operands could not be broadcast together with shapes (3,) (2,)
This operation fails because the sizes of the dimensions of the two arrays don't align. Both are 1-d arrays (the numbers of dimensions match), but we have:
A (1d array): 3
B (1d array): 2
For the single dimension, we have a 3 and a 2. This doesn't work: according to the rules above, the two sizes must either be equal or one of them must be 1.
l3 = np.array([[1, 3, 3], [2, 3, 3], [1, 3, 3], [2, 3, 3]])
l2 * l3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[64], line 2
1 l3 = np.array([[1, 3, 3], [2, 3, 3], [1, 3, 3], [2, 3, 3]])
----> 2 l2 * l3
ValueError: operands could not be broadcast together with shapes (2,) (4,3)
Similarly, the example above fails because the sizes along the trailing dimension don't align (2 vs. 3), as the error message makes clear.
A (1d array): 2
B (2d array): 4 x 3
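Here too, the operation can be made to broadcast by inserting size-1 axes, if a (2, 4, 3) result is what we actually want:

```python
import numpy as np

l2 = np.array([1, 3])
l3 = np.array([[1, 3, 3], [2, 3, 3], [1, 3, 3], [2, 3, 3]])

# l2[:, None, None] has shape (2, 1, 1); (2, 1, 1) and (4, 3) broadcast to (2, 4, 3)
res = l2[:, None, None] * l3
res.shape
(2, 4, 3)
```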
PyTorch Tensor
Broadcasting works the same way in PyTorch as in NumPy.
import torch
x = torch.empty(1, 3)
y = torch.empty(2, 5, 1)
(x + y).size()
torch.Size([2, 5, 3])
Tip
As a general rule, when defining arrays/tensors with mixed dimensions, align their shapes from the right to make them easier to read, as shown below.
x = torch.empty( 1, 3)
y = torch.empty(2, 5, 1)
(x + y).size()
torch.Size([2, 5, 3])
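When two tensors don't line up, PyTorch offers `unsqueeze` (or `None` indexing, as in NumPy) to insert a size-1 dimension explicitly. A minimal sketch:

```python
import torch

a = torch.arange(3)   # shape (3,)
b = torch.arange(2)   # shape (2,)

# b.unsqueeze(1) has shape (2, 1), so the product broadcasts to (2, 3)
prod = b.unsqueeze(1) * a
prod.shape
torch.Size([2, 3])
```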