VGG

CNN that took second place at ILSVRC 2014. It explores the benefits of deep, regular architectures built from a few simple design choices:

  • 3×3 conv layers with stride 1 and padding 1 (spatial resolution is preserved),

  • 2×2 max-pooling with stride 2 (spatial resolution is halved),

  • channels (and filters) double after every pool.

The architecture is designed as a repetition of stages, where a stage is a chain of layers that process activations at the same spatial resolution (conv-conv-pool, conv-conv-conv-pool, or conv-conv-conv-conv-pool).
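
For concreteness, here is a minimal PyTorch sketch of one stage (the conv-conv-pool variant; the deeper variants simply chain more convs before the pool). The helper name vgg_stage and the channel counts are our own illustrative choices:

```python
import torch
import torch.nn as nn

def vgg_stage(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    """One VGG stage: n_convs 3x3 convs (stride 1, padding 1), each followed
    by ReLU, then a 2x2 max-pool that halves the spatial resolution."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

stage = vgg_stage(64, 128, n_convs=2)   # conv-conv-pool; channels double
x = torch.randn(1, 64, 112, 112)
print(stage(x).shape)                   # torch.Size([1, 128, 56, 56])
```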

A stage has the same receptive field as a single larger convolution but, for the same number of input/output channels, introduces more non-linearities and requires fewer parameters and less computation. However, a stage needs more memory to store the intermediate activations.
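
To make the parameter claim concrete, a quick check (C = 128 is an arbitrary example; any channel count gives a similar ratio): two stacked 3×3 convolutions cover the same 5×5 receptive field as one 5×5 convolution, with roughly 2·(3²·C²) weights instead of 5²·C².

```python
import torch.nn as nn

C = 128  # example channel count

# Two 3x3 convs (with ReLUs) vs. one 5x5 conv: same 5x5 receptive field.
stack = nn.Sequential(nn.Conv2d(C, C, 3, padding=1), nn.ReLU(inplace=True),
                      nn.Conv2d(C, C, 3, padding=1), nn.ReLU(inplace=True))
single = nn.Conv2d(C, C, 5, padding=2)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(stack), n_params(single))  # 295168 vs 409728: ~28% fewer parameters
```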

TRAINING PHASE

  • Optimizer, learning rate, weight decay, normalization: same as AlexNet (SGD with momentum 0.9, learning rate 0.01, weight decay 0.0005), but without Local Response Normalization, which was found not to improve performance.
  • Epochs: 74.
  • Data augmentation: same as AlexNet, plus scale jittering: the input image is randomly rescaled so that its shorter side equals S, with S sampled uniformly in [256, 512], before taking the 224×224 random crop (a sketch follows the list).
  • Initialization: deep nets are hard to train with randomly initialized weights due to the instability of gradients. A VGG-11 is first trained with weights sampled from a Gaussian with zero mean and variance 0.01. VGG-16 and VGG-19 are then trained by initializing the first 4 conv layers and the last 3 FC layers with the pre-trained weights of the corresponding VGG-11 layers, while the intermediate layers are initialized randomly (a warm-start sketch follows the list).
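
A minimal sketch of the scale-jittering step, assuming PIL images and torchvision transforms (the function name scale_jitter_crop is ours):

```python
import random
import torchvision.transforms.functional as TF
from torchvision import transforms

def scale_jitter_crop(img, s_min=256, s_max=512, crop_size=224):
    S = random.randint(s_min, s_max)  # training scale, sampled per image
    img = TF.resize(img, S)           # shorter side rescaled to S, aspect kept
    return transforms.RandomCrop(crop_size)(img)
```

And a rough sketch of the warm-start initialization; the layer names here are hypothetical, and a real implementation would use the actual module names shared by the shallow and deep nets:

```python
import torch

# Hypothetical names of the transferred layers (first 4 convs, last 3 FCs);
# all other layers of the deeper net keep their random initialization.
TRANSFER = ["conv1", "conv2", "conv3", "conv4", "fc1", "fc2", "fc3"]

def warm_start(deep_net: torch.nn.Module, vgg11_state: dict) -> None:
    state = deep_net.state_dict()
    for name in TRANSFER:
        for suffix in (".weight", ".bias"):
            key = name + suffix
            if key in vgg11_state and state[key].shape == vgg11_state[key].shape:
                state[key] = vgg11_state[key].clone()
    deep_net.load_state_dict(state)
```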
