Capsule Networks

Pranav Srivastava
Feb 9, 2021


Capsule Networks, also known as CapsNets, were proposed by Geoffrey Hinton and his colleagues as an alternative to Convolutional Neural Networks (CNNs), to handle translation invariance in a better way. Capsules were first introduced in 2011, and in November 2017 a CapsNet achieved state-of-the-art performance on MNIST, the famous dataset of handwritten digits.

  • What are the current challenges with CNNs?

1) MaxPooling is a pooling layer used in CNNs for dimensionality reduction while routing data through the layers. A lot of information is lost in pooling because the spatial resolution is reduced: pooling picks the neuron with the strongest activation, not necessarily the one most relevant to the current task (see the sketch after this list). As a result, pooling layers cause the network to become invariant (unchanged) to small changes in the input, which can be a problem in tasks like semantic segmentation or object detection.

2) It is also said that because the MaxPooling layer discards so much information, training a CNN requires even more data and more training time to compensate.

3) CNNs do not handle pose (object translation and rotation) very well. Techniques like data augmentation can compensate for this, and they are quite effective in practice.
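
To make the information loss in point 1 concrete, here is a minimal NumPy sketch (toy values, not taken from any of the cited papers) of 2x2 max pooling discarding the position of an activation within its window:

```python
import numpy as np

# A toy 4x4 feature map with one strong activation at row 0, column 1.
fmap = np.array([
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.4],
])

# 2x2 max pooling with stride 2: keep only the largest value per window.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[0.9 0. ]
#  [0.  0.4]]
# The 0.9 survives, but its exact position inside its 2x2 window is gone:
# placing it in any other cell of that window produces the same output.
```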

  • Overview

CapsNet is a neural network that tries to perform inverse graphics. Inverse graphics, or inverse rendering, is the process of taking an image and recovering its abstract information (instantiation parameters such as coordinates and angles). A CapsNet is composed of capsules instead of neurons. A capsule is a group of neurons that encodes spatial information as well as the probability that an object is present. The length of a capsule's output vector (its activation vector) represents the probability that the object is present at a particular location, and the vector's direction represents the pose information (position, rotation, skew, thickness). Whenever the position of an object changes, the capsule vector changes with it; this property is called equivariance, and it is what makes capsule vectors so promising.
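
As a toy illustration of this length/direction split (the numbers below are made up for this sketch), consider a 2-D capsule vector whose length is read as a presence probability. Rotating the object rotates the vector but leaves its length, and hence the probability, unchanged:

```python
import numpy as np

# Hypothetical 2-D capsule activation vector.
v = np.array([0.45, 0.60])

presence_prob = np.linalg.norm(v)                 # length -> probability (0.75)
pose_angle = np.degrees(np.arctan2(v[1], v[0]))   # direction -> pose

# Rotate the object by 30 degrees: an equivariant capsule responds with a
# rotated vector of the same length. The pose changes; the probability
# does not. (Under invariance, the output would not change at all.)
theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
v_rot = R @ v
print(np.linalg.norm(v), np.linalg.norm(v_rot))   # 0.75 0.75
```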

  • Layers in CapsNet

Capsules in a CapsNet are organized in multiple layers. Capsules in the lowest layer are called primary capsules, and capsules in higher layers are called routing capsules.

Primary capsule layer: detects the presence and pose of a pattern

Routing capsule layer: detects larger, more complex objects, like the boats in the diagram below

Fig. Part C(a): layers in the capsule network, with an example [source: 2]

The rectangle and triangle capsules are in the primary capsule layer.

The boat and house capsules are in the routing capsule layer.

  • Implementation

The primary layer in a capsule network is implemented using multiple convolutional layers. The scalar feature maps obtained from the convolutional layers are reshaped into vectors. A squashing function is then applied to these vectors so that their lengths lie between 0 and 1 (in order to represent probabilities). The final output of the primary capsule layer is passed on to the routing capsule layer.
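
A minimal NumPy sketch of this scalar-to-vector step, using the shapes from the 2017 paper [3] (a 6x6 grid of 256 feature maps reshaped into 32 types of 8-D capsules); the random input stands in for real convolutional output:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Scale each vector so its length lies in (0, 1), keeping its direction.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# Stand-in for the last convolutional layer's output: a 6x6 spatial grid
# of 256 = 32 * 8 scalar feature maps, as in [3].
conv_out = np.random.randn(6, 6, 256)

# Reshape the scalars into 6*6*32 = 1152 capsules of dimension 8, then
# squash so each capsule's length can be read as a probability.
primary_caps = squash(conv_out.reshape(-1, 8))
print(primary_caps.shape)                            # (1152, 8)
print(np.linalg.norm(primary_caps, axis=-1).max())   # always < 1.0
```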

The routing capsule layer is also implemented using multiple convolutional layers, but it relies on an algorithm called routing-by-agreement. This algorithm is the core of CapsNet. Under this algorithm, capsules representing parts (like a triangle or a rectangle) agree or disagree on the pose of a larger object (like a house or a boat) during iterative routing. In other words, if the triangle and rectangle capsules agree that the pose matches a boat, more output is sent to the boat capsule than to the house capsule.

The routing-by-agreement algorithm maintains a routing weight for each connection, increasing it for each agreement and decreasing it for each disagreement. In other words, routing-by-agreement consists of iterations of agreement detection followed by routing-weight updates.

  • Squash and routing-by-agreement in detail

The squash function is an activation function that ensures the lengths of the capsule vectors stay in the range 0 to 1, so that a length can represent a probability. If a vector is squashed to 0, its gradient vanishes, which prevents the associated weights from changing their values.

Fig. Part C(b) [source: 4]
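
A sketch of the squash function from [3], v = (|s|^2 / (1 + |s|^2)) * (s / |s|), showing how short vectors are driven toward length 0 and long vectors toward length 1 (the eps term is a common numerical-stability addition, not part of the formula):

```python
import numpy as np

def squash(s, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    norm = np.linalg.norm(s)
    return (norm ** 2 / (1.0 + norm ** 2)) * s / (norm + eps)

print(np.linalg.norm(squash(np.array([0.1, 0.0]))))   # ~0.0099: short -> near 0
print(np.linalg.norm(squash(np.array([10.0, 0.0]))))  # ~0.9901: long  -> near 1
```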

The figure below shows the representation of a capsule, where the input scalars are first combined into a vector, which is then squashed.

Fig. Part C(c) [source: 1]

Routing by agreement collects the clusters of agreement from all the primary capsules and computes a weighted mean (see line 5 in the procedure below). A higher probability means agreement, and the routing weight to that capsule is updated by increasing it (see line 7 in the procedure below); a lower probability means disagreement, and the routing weight is decreased. This proceeds iteratively. The overall procedure for dynamic routing is shown in Fig. Part C(d), where a softmax function is used to normalize the routing coefficients between capsules of adjacent layers.

Fig. Part C(d) dynamic routing [source: 1]
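
Here is a minimal NumPy sketch of that procedure, following Procedure 1 of [3]. The shapes (1152 primary capsules, 10 output capsules of dimension 16) match the paper's MNIST architecture, but the prediction vectors here are random stand-ins:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routing_by_agreement(u_hat, n_iter=3):
    """u_hat: (n_lower, n_upper, dim) prediction vectors, i.e. each lower
    capsule's guess at each upper capsule's output."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))               # raw routing logits, start at 0
    for _ in range(n_iter):
        c = softmax(b, axis=1)                     # routing coefficients (softmax)
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # candidate upper-capsule outputs
        b += np.einsum('ijd,jd->ij', u_hat, v)     # agreement (dot product) updates b
    return v

u_hat = 0.01 * np.random.randn(1152, 10, 16)
v = routing_by_agreement(u_hat)
print(v.shape)                                     # (10, 16)
```
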
  • Conclusion

CapsNets seem to overcome the challenges with CNNs and look very promising. A CapsNet uses routing-by-agreement as a replacement for the routing implemented by max pooling, which allows neurons in one layer to ignore all but the most active feature detector in a local pool in the layer below [3]. A CapsNet also requires less data for training and hence saves a lot of time as well. Since capsules can extract pose information, they can also be very effective in applications involving 3D image reconstruction.

References:

[1] https://www.sciencedirect.com/science/article/pii/S1319157819309322

[2] https://www.oreilly.com/content/introducing-capsule-networks/

[3] https://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf

[4] https://en.wikipedia.org/wiki/Capsule_neural_network
