VGG
CNN that placed second at ILSVRC 2014. It explores the benefits of deep and regular architectures based on a few simple design choices:
- 3×3 conv layers with stride 1 and padding 1 (so the spatial resolution is preserved),
- 2×2 max-pooling with stride 2,
- channels (and filters) double after every pool.
The architecture is designed as a repetition of stages, where a single stage is a chain of layers that process activations at the same spatial resolution (conv-conv-pool, conv-conv-conv-pool, or conv-conv-conv-conv-pool).
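As an illustration, here is a minimal PyTorch sketch of one such stage (the `vgg_stage` helper and its parameter names are ours, not from the paper):

```python
import torch.nn as nn

def vgg_stage(in_ch: int, out_ch: int, num_convs: int) -> nn.Sequential:
    """One VGG-style stage: `num_convs` 3x3 convolutions (stride 1, padding 1,
    so the spatial resolution is preserved) followed by a 2x2 max-pooling
    with stride 2 that halves the resolution."""
    layers = []
    for i in range(num_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, stride=1, padding=1))
        layers.append(nn.ReLU(inplace=True))
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Channels double after every pool: 3 -> 64 -> 128 -> ...
stage1 = vgg_stage(3, 64, num_convs=2)    # conv-conv-pool
stage2 = vgg_stage(64, 128, num_convs=2)  # conv-conv-pool
```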
A stage has the same receptive field as a single larger convolution but, given the same number of input/output channels, introduces more non-linearities and requires fewer parameters and less computation. A stage does, however, require more memory to store the intermediate activations.
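To make the parameter claim concrete: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, but with C input/output channels they cost about 2·9·C² weights instead of 25·C². A quick numerical check (a sketch; the `n_params` helper is ours):

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

C = 64
stack = nn.Sequential(            # two 3x3 convs: 5x5 receptive field
    nn.Conv2d(C, C, 3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, 3, padding=1),
)
single = nn.Conv2d(C, C, 5, padding=2)  # one 5x5 conv: same receptive field

print(n_params(stack))   # 2 * (9*C*C + C) = 73856
print(n_params(single))  # 25*C*C + C      = 102464
```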
TRAINING PHASE
HYPERPARAMETER | VALUE |
---|---|
Optimizer, learning rate, weight decay, normalization | Same as ALEXNET, but with no Local Response Normalization |
Epochs | 74 |
Data Augmentation | Same as ALEXNET plus Scale Jittering (randomly rescale the input image so that its smaller side is S, with S sampled uniformly in [256, 512]) |
Initialization | Deep nets are hard to train with randomly initialized weights due to the instability of gradients. They first train a VGG-11 with weights sampled from a zero-mean Gaussian with variance 10⁻². Then they train VGG-16 and VGG-19 by initializing the first 4 conv layers and the last 3 FC layers with the pre-trained weights of the corresponding layers of VGG-11 (the intermediate layers are initialized randomly). |
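A possible implementation of the scale-jittering augmentation, as a hedged sketch on top of torchvision (the `ScaleJitter` class is ours; the [256, 512] bounds come from the paper, and 224×224 is the standard VGG training crop):

```python
import random
import torchvision.transforms as T
from torchvision.transforms import functional as F

class ScaleJitter:
    """Rescale the shorter image side to a random S in [smin, smax],
    sampled independently for every image."""
    def __init__(self, smin: int = 256, smax: int = 512):
        self.smin, self.smax = smin, smax

    def __call__(self, img):
        S = random.randint(self.smin, self.smax)
        return F.resize(img, S)  # int size -> the shorter side becomes S

train_transform = T.Compose([
    ScaleJitter(256, 512),
    T.RandomCrop(224),         # fixed-size training crop
    T.RandomHorizontalFlip(),  # AlexNet-style flip augmentation
])
```

The warm-start initialization can be sketched as follows, assuming two model objects whose conv/FC layers appear in the same order and have matching shapes at the copied positions (`warm_start` is our name; this is not the authors' code):

```python
import torch.nn as nn

def warm_start(deep: nn.Module, shallow: nn.Module,
               n_first_convs: int = 4, n_last_fcs: int = 3) -> None:
    """Copy the first `n_first_convs` conv layers and the last `n_last_fcs`
    fully-connected layers of the trained shallow net (e.g. VGG-11) into
    the deeper net (e.g. VGG-16); the remaining layers keep their random
    initialization. Assumes the copied layer pairs have identical shapes."""
    deep_convs = [m for m in deep.modules() if isinstance(m, nn.Conv2d)]
    shal_convs = [m for m in shallow.modules() if isinstance(m, nn.Conv2d)]
    deep_fcs = [m for m in deep.modules() if isinstance(m, nn.Linear)]
    shal_fcs = [m for m in shallow.modules() if isinstance(m, nn.Linear)]
    for d, s in zip(deep_convs[:n_first_convs], shal_convs[:n_first_convs]):
        d.load_state_dict(s.state_dict())
    for d, s in zip(deep_fcs[-n_last_fcs:], shal_fcs[-n_last_fcs:]):
        d.load_state_dict(s.state_dict())
```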