Reconciling Data Shapes and Parameter Counts in Keras

Convolutional layers and their cousins, the pooling layers, are examined for how they modify data shapes and how many parameters they add, as functions of layer settings in Keras/TensorFlow.

Keras is synonymous with deep learning. Building multi-input, multi-output networks of connected layers is a routine task with Keras. The data/tensors (multidimensional arrays of numbers) twist & turn their way from layer to layer, from entry to exit, as the network racks up millions of parameters along the way. How the data shape morphs as it squeezes through a layer, and how many parameters that layer adds to the network model, are of course a function of the layer in question and how that layer has been instantiated in Keras. In the earlier article Flowing Tensors and Heaping Parameters in Deep Learning:

  • We looked at the Dense, Embedding, and Recurrent (LSTM/GRU) layers in some detail to understand how and why the data shape changed as it passed through these layers.
  • Using the equations describing the data modification within the layer, we derived formulae for the number of trainable parameters used in each of these layers for the said modification.

We continue the exercise here with the Convolutional and Pooling layers heavily used in image classification. We close the series by running the Visual Question Answering model and confirming that our formulae/analysis are correct for the trainable parameters and output shapes. The full code for the snippets can be obtained from GitHub.

1. Convolutional Layer

Convolutional layers are basically feature extractors. They are mostly used with images, but can be applied to text as well for pattern/feature identification and classification thereof. When working with text, we first turn words into numerical vectors using an Embedding layer or externally supplied vectors.

1.1 Input Shape

Input to a convolutional layer can be batches of images or sentences. The Keras default for input data is "channels_last", meaning the number of channels/features N_c is the last dimension; as usual, the first dimension is the batch_size, left out here as 'None'. In between these two are the dimensions of the image (or the sequence length in the case of text).

[batch_size, {dimensions of image/text}, Number of channels/features]
Figure 1. A sentence can be looked upon as a 1-d image with as many pixels as the number of words, with each pixel/word having as many channels as the length of the word-vector
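Concretely, the two channels_last layouts can be written down as arrays (the sizes here are illustrative: a batch of 32 images of size 100×95 with 3 rgb channels, and a batch of 32 sentences of 50 words, each word a 300-d vector):

```python
import numpy as np

# A batch of images: channels_last puts the 3 rgb channels in the final dimension
images = np.zeros((32, 100, 95, 3))     # [batch_size, I_x, I_y, N_c]

# A batch of sentences: a 1-d "image" of 50 pixels/words, each with N_c = 300 channels
sentences = np.zeros((32, 50, 300))     # [batch_size, sequence_length, N_c]
```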

1.2 Output Shape

The convolution operation is well explained with pretty pictures and such in a number of articles. Our interest here is in shape transformation and parameter counts. Figure 2 and Table 1 below summarize the discussion that follows for shape transformations.

Figure 2. A convolution layer with N_f filters transforms an [I_x, I_y, N_c] image to [O_x, O_y, N_f] image. O_x, O_y and the number of trainable parameters are given by the indicated formulae.
  1. The output of a convolutional layer is produced by filters. Each filter has a size like [f_x, f_y], for example, in 2-d. Each filter is automatically N_c deep, where N_c is the number of channels in the input. That is, its actual shape is [f_x, f_y, N_c].
  2. Each filter generates a new channel for the output. That is, convolving an image with however many channels with 64 filters yields an output image with 64 channels.
  3. The size [O_x, O_y] of the output image in 2 above depends on two settings in Keras.
    • strides s_x, s_y: How the filter moves along the input matrix of numbers/pixels for its element-wise product and summation operation. s=1 means that the filter moves one pixel/cell at a time.
    • padding: When the filter dimensions (f_x, f_y) and its strides (s_x, s_y) do not exactly tile the image (I_x, I_y), part of the input data may not get processed by the convolutional layer. This is what happens by default (padding='valid') in Keras. When padding is set to 'same', the input image/matrix is padded around with fake data (zeros), just so all the real data does get processed by the filter. Table 1 below summarizes the shape and size of the output image upon convolution in Keras with TensorFlow.
Table 1. Output image size along each axis upon convolution in Keras (version 2.2.4) with Tensorflow backend (version 1.13.1)

padding='valid':  O = ceil((I - f + 1) / s)
padding='same':   O = ceil(I / s)
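The rules in Table 1 can be written down as a small helper (a sketch; i, f, and s denote the input size, filter size, and stride along one axis):

```python
from math import ceil

def conv_output_size(i, f, s, padding):
    """Output size along one axis for Keras/TF convolution (and pooling)."""
    if padding == 'valid':
        # only filter positions that fit entirely inside the input count
        return ceil((i - f + 1) / s)
    if padding == 'same':
        # the input is zero-padded so that every input cell gets processed
        return ceil(i / s)
    raise ValueError(f"unknown padding: {padding}")
```

For example, conv_output_size(100, 3, 2, 'valid') gives 49, matching the worked example in Section 1.4.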

1.3 Parameter Counts

Besides adding to the number of channels in the output image, each filter brings a bunch of parameters to the table. We have seen that a filter has the shape [f_x, f_y, N_c] in 2-d (or [f_x, N_c] in 1-d image/text) where N_c is the number of channels in the input. The key properties when it comes to filter parameters are as follows.

  1. Even as a filter strides and slides convolving across the input image/matrix, it uses the same parameters.
  2. There is one weight associated with each filter cell and channel. In other words, a filter has f_x * f_y * N_c weights.
  3. The entire filter has one bias parameter.

So if a convolutional layer employs N_f filters, each of size [f_x, f_y], convolving over an input image with N_c channels, the following equation gives the number of parameters added to the model.

Equation 1. Formula for the number of parameters added by a convolutional layer: # params = N_f * (f_x * f_y * N_c + 1)
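As a sanity check, Equation 1 is a one-liner:

```python
def conv_param_count(n_f, f_x, f_y, n_c):
    # f_x * f_y * n_c weights per filter, plus one bias per filter
    return n_f * (f_x * f_y * n_c + 1)
```

Plugging in the numbers from the example in Section 1.4: conv_param_count(55, 3, 2, 3) gives 1045, and conv_param_count(35, 5, 2, 55) gives 19285.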

1.4 Example

Consider the following snippet of code where 100×95 size ‘rgb’ images are put through two convolutional layers in series.
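The original snippet is linked from GitHub; a reconstruction consistent with the shapes and parameter counts worked out below (the filter counts, kernel sizes, and strides are read off from the table entries) would be:

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    Input(shape=(100, 95, 3)),
    # 55 filters of size 3x2, striding 2x1, padding='valid' (the default)
    Conv2D(55, kernel_size=(3, 2), strides=(2, 1), padding='valid'),
    # 35 filters of size 5x2, striding 3x2, zero-padded so no data is skipped
    Conv2D(35, kernel_size=(5, 2), strides=(3, 2), padding='same'),
])
model.summary()
```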

Putting our formulae from Table 1 and Equation 1 to work, we should expect to get the following output shapes and parameter counts.

Layer  O_x                       O_y                      # Params
1      ceil((100-3+1)/2) = 49    ceil((95-2+1)/1) = 94    55 * (3*2*3 + 1) = 1045
2      ceil(49/3) = 17           ceil(94/2) = 47          35 * (5*2*55 + 1) = 19285

The output below upon running Keras matches our predictions.

Figure 3. Output shapes and parameter counts in Keras match with predictions

2. Pooling Layer

Pooling layers work hand-in-hand with convolutional layers. Their purpose is to reduce the dimensions of the image output by an upstream convolutional layer. They do it by picking a single value (the average or the max, for example) from each pooling zone. A pooling zone is a patch of area (much like a filter in a convolutional layer) that moves around the input image as per the settings for strides and padding. Here are the key points about pooling layers.

  • Input/Output Shapes: The rules/formulae for computing the output image size (O_x, O_y) are identical to those for convolutional layers. The parameter pool_size serves the role of kernel_size used in defining the filters for convolutional layers. Also, unless separately specified, strides is taken to be the same as pool_size. So we simply refer to Table 1.
  • Trainable Parameters: There are no parameters.

All that a pooling layer does is to apply fixed rules for data/shape transformation. Here is the same example as in Section 1.4 but with pooling layer employed.
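Again, the original snippet is on GitHub; a sketch with MaxPooling2D, using pool_size and strides matching the kernel sizes and strides of the convolutional layers in Section 1.4, would be:

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import MaxPooling2D

model = Sequential([
    Input(shape=(100, 95, 3)),
    # pool_size and strides mirror the first convolutional layer's settings
    MaxPooling2D(pool_size=(3, 2), strides=(2, 1), padding='valid'),
    # ... and the second's
    MaxPooling2D(pool_size=(5, 2), strides=(3, 2), padding='same'),
])
model.summary()  # same O_x, O_y as before; channels stay at 3; zero parameters
```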

Running it, we get output image shapes (O_x and O_y) identical to what we saw earlier with the convolutional layers. Pooling occurs in all N_c channels independently, so the number of channels is preserved.

Figure 4: Pooling layers modify the image size as per the equations in Table 1. The number of channels in the image is preserved. They add no parameters to the model.

3. Visual question answering model

We wrap up this post by putting our formulae to work on the example problem in Keras Guide. Among other layers, the example uses the Dense, Embedding, LSTM, Conv2D and Pooling layers that we have studied in this series. Here is the quote from their description:

This model can select the correct one-word answer when asked a natural-language question about a picture.

It works by encoding the question into a vector, encoding the image into a vector, concatenating the two, and training on top a logistic regression over some vocabulary of potential answers.

Keras Guide

Here is a summary of our formulae including the ones from the previous article. We will refer to it in our verification.

Figure 5. A summary of the formulae for shapes and parameter counts for various layers.

The default settings in Keras for various layers need to be noted.

  • Convolution Layers: strides=(1,1), padding=’valid’
  • Pooling Layers: strides = pool_size, padding=’valid’
  • Dense, LSTM Layers: use_bias = True

The code snippets, the expected shapes/parameter counts as per our formulae, and the actual Keras output are shown below for each layer, in that order.
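For orientation, here is a compressed sketch of the VQA model along the lines of the Keras Guide example; the layer sizes here are illustrative (scaled down from the guide for brevity), not necessarily the guide's exact values:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Embedding, LSTM, Dense, concatenate)

# Vision branch: a small convnet encodes the image into a flat vector
image_input = Input(shape=(64, 64, 3))
x = Conv2D(16, 3, activation='relu', padding='same')(image_input)
x = Conv2D(16, 3, activation='relu')(x)
x = MaxPooling2D(2)(x)
x = Conv2D(32, 3, activation='relu', padding='same')(x)
x = Conv2D(32, 3, activation='relu')(x)
x = MaxPooling2D(2)(x)
image_vector = Flatten()(x)

# Language branch: Embedding + LSTM encode the question into a vector
question_input = Input(shape=(100,), dtype='int32')
q = Embedding(input_dim=10000, output_dim=256)(question_input)
question_vector = LSTM(256)(q)

# Concatenate both encodings and train a softmax over the answer vocabulary
merged = concatenate([image_vector, question_vector])
answer = Dense(1000, activation='softmax')(merged)

model = Model(inputs=[image_input, question_input], outputs=answer)
```

Each layer in this stack obeys the shape and parameter-count formulae in Figure 5, which is what the verification below confirms.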

We note with satisfaction that our predictions for the output shapes and the numbers of trainable parameters match exactly what Keras gets for each and every layer as the data moves from input to output.

4. Conclusions

In this series we have gone under the hood of some popular layers to see how they twist the incoming data shapes and how many parameters they employ. Understanding these machinations from the fundamentals removes the mystery behind it all and enables us to design efficient models for new applications.
