In the last two posts, we explored the theoretical aspects of tensors and Einstein summation notation. Now we will put einsum to work. NumPy, PyTorch, and TensorFlow all provide einsum functionality; in this tutorial, we will use PyTorch.
einsum uses Einstein summation notation to perform linear algebraic operations.
import torch

# let us create a tensor of rank 2
a = torch.arange(8).reshape(2, 4)
print(a.shape)
a
torch.Size([2, 4])
tensor([[0, 1, 2, 3],
        [4, 5, 6, 7]])
Before we proceed further, let us understand how it works in PyTorch.
torch.einsum() takes two arguments: an equation and the operands.
In $A_{i,j}$, $A$ is the operand and $i, j$ are its indices; from these indices we build the equation describing the operation we want, written in Einstein summation notation.
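For instance, here is a minimal sketch (with a hypothetical matrix m and vector v, not part of the original examples) showing how the equation string and the operands fit together:

import torch

m = torch.randn(3, 4)  # dimensions labelled i, j
v = torch.randn(4)     # dimension labelled j

# 'ij,j->i': j appears in both inputs but not in the output, so it is summed over.
# This gives a matrix-vector product.
mv = torch.einsum('ij,j->i', m, v)
print(mv.shape)  # torch.Size([3])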
Matrix Transpose
$B_{j,i}=A_{i,j}$, i.e. $B = A^{T}$
Here the equation would be ij->ji
b = torch.einsum('ij->ji', a)
print(b.shape)
b
torch.Size([4, 2])
tensor([[0, 4],
        [1, 5],
        [2, 6],
        [3, 7]])
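As a quick sanity check (not in the original post), the result should match PyTorch's built-in transpose:

print(torch.equal(b, a.T))  # True: 'ij->ji' is the transpose of a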
Inner Product
$C_{i,k}=A_{i,j}B_{j,k}$. Since this is Einstein notation, the repeated index $j$ is summed over; for rank-2 tensors this is just matrix multiplication.
tensor([[0, 1, 2, 3],
        [4, 5, 6, 7]])

tensor([[0, 4],
        [1, 5],
        [2, 6],
        [3, 7]])
c = torch.einsum('ij,jk->ik', a, b)
print(c.shape)
c
torch.Size([2, 2])
tensor([[ 14,  38],
        [ 38, 126]])
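Again as an optional check (my addition, not the original post's), this einsum equation matches the ordinary matrix product:

print(torch.equal(c, a @ b))  # True: 'ij,jk->ik' is the same as a @ b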
Outer Product
$C_{i,j}=A_{i}B_{j}$, i.e. the outer product $C = A \otimes B$
a = torch.arange(4).reshape(4,)
b = torch.arange(2).reshape(2,)
print(f'a={a},b={b}')
print(f'shape of a={a.shape},shape of b={b.shape}')
a=tensor([0, 1, 2, 3]),b=tensor([0, 1])
shape of a=torch.Size([4]),shape of b=torch.Size([2])
c = torch.einsum('i,j->ij', a, b)
print(f'c={c}')
print(f'shape of c={c.shape}')
c=tensor([[0, 0],
        [0, 1],
        [0, 2],
        [0, 3]])
shape of c=torch.Size([4, 2])
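For reference (assuming a PyTorch version that provides torch.outer), this matches the built-in outer product:

print(torch.equal(c, torch.outer(a, b)))  # True: 'i,j->ij' is the outer product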
These are some basic operations. We can also perform the Hadamard (element-wise) product, tensor contraction, and more; a short sketch follows.
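Here is a minimal sketch of those two patterns, using small hypothetical tensors m and n (not from the original post):

import torch

m = torch.arange(6).reshape(2, 3)
n = torch.arange(6, 12).reshape(2, 3)

hadamard = torch.einsum('ij,ij->ij', m, n)  # element-wise (Hadamard) product
full_sum = torch.einsum('ij,ij->', m, n)    # contraction over both indices -> scalar
trace = torch.einsum('ii->', torch.eye(3))  # trace of a square matrix

print(hadamard)
print(full_sum)  # tensor(145)
print(trace)     # tensor(3.)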
Neural Network with einsum
Hidden layer:
\[h_{j,k} = x_{j,i} \cdot W^{1}_{i,k} + b^{1}_{k}\]
Output:
\[o_{j,n} = h_{j,k} \cdot W^{2}_{k,n} + b^{2}_{n}\]
import torch

# Create a network class
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = torch.randn(input_size, hidden_size, requires_grad=True)
        self.b1 = torch.randn(hidden_size, requires_grad=True)
        self.W2 = torch.randn(hidden_size, output_size, requires_grad=True)
        self.b2 = torch.randn(output_size, requires_grad=True)

    def forward(self, x):
        hidden = torch.einsum("ji,ik->jk", x, self.W1) + self.b1  # use of einsum
        hidden = torch.relu(hidden)  # Activation function
        output = torch.einsum("jk,kn->jn", hidden, self.W2) + self.b2  # use of einsum
        return output
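As a quick usage check (the variable names here are my own, not from the original post), the forward pass maps a batch of inputs to a batch of outputs:

net = NeuralNetwork(input_size=10, hidden_size=5, output_size=2)
x = torch.randn(3, 10)       # batch of 3 samples with 10 features each
print(net.forward(x).shape)  # torch.Size([3, 2])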
import torch.nn.functional as F  # needed below for F.cross_entropy

# Hyperparameters
input_size = 10
hidden_size = 5
output_size = 2
learning_rate = 0.01
num_epochs = 200

# Create the network
model = NeuralNetwork(input_size, hidden_size, output_size)

# Let's create some dummy data
X = torch.randn(100, input_size)
y = torch.randint(0, output_size, (100,))  # Example: classification task

# Optimizer (Stochastic Gradient Descent)
optimizer = torch.optim.SGD([model.W1, model.b1, model.W2, model.b2], lr=learning_rate)

# Training loop
for epoch in range(num_epochs):
    # Forward pass
    outputs = model.forward(X)

    # Calculate loss (using cross-entropy for classification)
    loss = F.cross_entropy(outputs, y)

    # Backward pass
    optimizer.zero_grad()  # Reset gradients
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
Epoch [10/200], Loss: 2.5896
Epoch [20/200], Loss: 2.1862
Epoch [30/200], Loss: 1.8618
Epoch [40/200], Loss: 1.6038
Epoch [50/200], Loss: 1.4057
Epoch [60/200], Loss: 1.2593
Epoch [70/200], Loss: 1.1501
Epoch [80/200], Loss: 1.0642
Epoch [90/200], Loss: 0.9937
Epoch [100/200], Loss: 0.9344
Epoch [110/200], Loss: 0.8843
Epoch [120/200], Loss: 0.8422
Epoch [130/200], Loss: 0.8071
Epoch [140/200], Loss: 0.7782
Epoch [150/200], Loss: 0.7547
Epoch [160/200], Loss: 0.7359
Epoch [170/200], Loss: 0.7208
Epoch [180/200], Loss: 0.7089
Epoch [190/200], Loss: 0.6995
Epoch [200/200], Loss: 0.6921
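Once training finishes, the logits can be turned into class predictions. The snippet below is a small sketch (not part of the original post) that reuses model, X, and y from above to estimate training accuracy:

with torch.no_grad():
    preds = model.forward(X).argmax(dim=1)         # predicted class per sample
    accuracy = (preds == y).float().mean().item()
print(f'Training accuracy: {accuracy:.2f}')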