einsum and AI

In the last two posts, we explored the theoretical aspects of tensors and the Einstein summation notation. Now we will put that theory into practice with einsum. NumPy, PyTorch, and TensorFlow all provide einsum functionality; in this tutorial we will use PyTorch.

einsum uses Einstein summation notation to perform linear algebra operations.

import torch
# let us create a tensor of rank 2
a = torch.arange(8).reshape(2,4)
print(a.shape)
a
torch.Size([2, 4])

tensor([[0, 1, 2, 3],
        [4, 5, 6, 7]])

Before we proceed further, let us understand how it works in PyTorch.

torch.einsum() takes two arguments: an equation and the operands.

In $A_{i,j}$, $A$ is the operand and $i$, $j$ are its indices. Using these indices we write an equation, in Einstein summation notation, that describes the operation we want.
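
In particular, any index that appears before the -> but not after it is summed over. As a minimal sketch of this rule, using the rank-2 tensor a created above:

import torch

a = torch.arange(8).reshape(2, 4)

# 'ij->' drops both indices, so einsum sums every element of a
total = torch.einsum('ij->', a)
print(total)      # tensor(28)

# 'ij->i' keeps i and drops j, giving the row sums
row_sums = torch.einsum('ij->i', a)
print(row_sums)   # tensor([ 6, 22])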

Matrix Transpose


$B_{j,i}=A_{i,j}$, i.e. $B=A^{T}$

Here the equation would be ij->ji

b = torch.einsum('ij->ji', a)
print(b.shape)
b
torch.Size([4, 2])


tensor([[0, 4],
        [1, 5],
        [2, 6],
        [3, 7]])

Matrix Multiplication


$C_{i,k}=A_{i,j}B_{j,k}$. Since this is Einstein notation, the repeated index $j$ is summed over.

a
tensor([[0, 1, 2, 3],
        [4, 5, 6, 7]])
b
tensor([[0, 4],
        [1, 5],
        [2, 6],
        [3, 7]])
c = torch.einsum('ij,jk->ik', a,b)
print(c.shape)
c
torch.Size([2, 2])

tensor([[ 14,  38],
        [ 38, 126]])

Outer Product


$C_{i,j}=A_{i}B_{j}$, i.e. $C=A\otimes B$

a = torch.arange(4).reshape(4,)
b = torch.arange(2).reshape(2,)
print(f'a={a},b={b}')
print(f'shape of a={a.shape},shape of b={b.shape}')
a=tensor([0, 1, 2, 3]),b=tensor([0, 1])
shape of a=torch.Size([4]),shape of b=torch.Size([2])
c = torch.einsum('i,j->ij', a,b)
print(f'c={c}')
print(f'shape of c={c.shape}')
c=tensor([[0, 0],
        [0, 1],
        [0, 2],
        [0, 3]])
shape of c=torch.Size([4, 2])

These are some basic operations. We can also perform the Hadamard (element-wise) product, tensor contraction, and more, as sketched below.
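
As a minimal sketch (assuming small integer matrices, with x and y sharing the same shape for the Hadamard case), these could look like:

import torch

x = torch.arange(6).reshape(2, 3)
y = torch.arange(6, 12).reshape(2, 3)

# Hadamard (element-wise) product: the same indices appear on the operands and the output
hadamard = torch.einsum('ij,ij->ij', x, y)
print(hadamard)     # tensor([[ 0,  7, 16],
                    #         [27, 40, 55]])

# full contraction: sum of all element-wise products
contraction = torch.einsum('ij,ij->', x, y)
print(contraction)  # tensor(145)

# trace of a square matrix: repeat the index and drop it from the output
m = torch.arange(9).reshape(3, 3)
trace = torch.einsum('ii->', m)
print(trace)        # tensor(12)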

Neural Network with einsum


Hidden layer (pre-activation; a ReLU is applied in the code):

\[h_{j,k} = x_{j,i} \cdot W^{1}_{i,k} + b^{1}_{k}\]

Output:

\[o_{j,n} = h_{j,k} \cdot W^{2}_{k,n} + b^{2}_{n}\]
import torch
# Create a network class
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = torch.randn(input_size, hidden_size, requires_grad=True)
        self.b1 = torch.randn(hidden_size, requires_grad=True)
        self.W2 = torch.randn(hidden_size, output_size, requires_grad=True)
        self.b2 = torch.randn(output_size, requires_grad=True)

    def forward(self, x):
        hidden = torch.einsum("ji,ik->jk", x, self.W1) + self.b1 # use of einsum
        hidden = torch.relu(hidden) # Activation function
        output = torch.einsum("jk,kn->jn", hidden, self.W2) + self.b2 # use of einsum
        return output
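
As a sanity check, the einsum call for the hidden layer is just a matrix multiplication over the feature index; a small sketch with hypothetical shapes:

import torch

x = torch.randn(3, 10)   # a hypothetical batch of 3 inputs
W1 = torch.randn(10, 5)

# 'ji,ik->jk' keeps the batch index j and contracts the feature index i,
# so it is equivalent to x @ W1
assert torch.allclose(torch.einsum('ji,ik->jk', x, W1), x @ W1)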
import torch.nn.functional as F  # needed for F.cross_entropy below

# Hyperparameters
input_size = 10
hidden_size = 5
output_size = 2
learning_rate = 0.01
num_epochs = 200

# Create the network
model = NeuralNetwork(input_size, hidden_size, output_size)

# Let's create some dummy data
X = torch.randn(100, input_size)
y = torch.randint(0, output_size, (100,))  # Example: classification task

# Optimizer (Stochastic Gradient Descent)
optimizer = torch.optim.SGD([model.W1, model.b1, model.W2, model.b2], lr=learning_rate)

# Training loop
for epoch in range(num_epochs):
    # Forward pass
    outputs = model.forward(X)

    # Calculate loss (using cross-entropy for classification)
    loss = F.cross_entropy(outputs, y)

    # Backward pass
    optimizer.zero_grad()  # Reset gradients
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
Epoch [10/200], Loss: 2.5896
Epoch [20/200], Loss: 2.1862
Epoch [30/200], Loss: 1.8618
Epoch [40/200], Loss: 1.6038
Epoch [50/200], Loss: 1.4057
Epoch [60/200], Loss: 1.2593
Epoch [70/200], Loss: 1.1501
Epoch [80/200], Loss: 1.0642
Epoch [90/200], Loss: 0.9937
Epoch [100/200], Loss: 0.9344
Epoch [110/200], Loss: 0.8843
Epoch [120/200], Loss: 0.8422
Epoch [130/200], Loss: 0.8071
Epoch [140/200], Loss: 0.7782
Epoch [150/200], Loss: 0.7547
Epoch [160/200], Loss: 0.7359
Epoch [170/200], Loss: 0.7208
Epoch [180/200], Loss: 0.7089
Epoch [190/200], Loss: 0.6995
Epoch [200/200], Loss: 0.6921