VGG
CNN that placed second in the ILSVRC 2014 classification task. It explores the benefits of deep, regular architectures built from a few simple design choices:
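- $3 \times 3$ conv layers with stride $S = 1$ and padding $P = 1$
- $2 \times 2$ max-pooling with stride $S = 2$ and no padding
- channels (and filters) double after every pool, up to a maximum of 512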
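The usual output-size formula for a layer with kernel size $K$, stride $S$ and padding $P$ shows why these choices keep the resolution fixed inside a stage and halve it at every pool:

$$
W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2P - K}{S} \right\rfloor + 1
$$

$$
\text{conv } 3 \times 3,\ S = 1,\ P = 1: \quad W_{\text{out}} = W_{\text{in}} + 2 - 3 + 1 = W_{\text{in}}
$$

$$
\text{max-pool } 2 \times 2,\ S = 2,\ P = 0: \quad W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} - 2}{2} \right\rfloor + 1 = \left\lfloor \frac{W_{\text{in}}}{2} \right\rfloor
$$

With a $224 \times 224$ input, the five pools give $224 \to 112 \to 56 \to 28 \to 14 \to 7$.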
The architecture is designed as a repetition of stages, where a stage is a chain of layers that process activations at the same spatial resolution, closed by a max-pooling layer (e.g. conv-conv-pool, conv-conv-conv-pool and conv-conv-conv-conv-pool).
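A minimal sketch of the stage structure, assuming VGG-16 (configuration D of the paper) and a compact list notation in which an integer is a $3 \times 3$ conv with that many output channels and `"M"` is a $2 \times 2$, stride-2 max-pool. The notation and the function name `describe_stages` are illustrative, not from the original. The sketch walks the configuration, reports each stage's depth, width and output resolution, and counts the conv parameters:

```python
from typing import List, Tuple, Union

# VGG-16 (configuration "D"). Illustrative notation: an int is a 3x3 conv
# (stride 1, padding 1) with that many output channels; "M" is a 2x2,
# stride-2 max-pool that closes the current stage.
VGG16: List[Union[int, str]] = [64, 64, "M", 128, 128, "M",
                                256, 256, 256, "M", 512, 512, 512, "M",
                                512, 512, 512, "M"]


def describe_stages(cfg: List[Union[int, str]], in_channels: int = 3,
                    size: int = 224) -> Tuple[List[Tuple[int, int, int]], int]:
    """Return one (convs_in_stage, channels, resolution_after_pool) tuple per
    stage, plus the total number of conv parameters (weights + biases)."""
    stages: List[Tuple[int, int, int]] = []
    convs, channels, params = 0, in_channels, 0
    for item in cfg:
        if item == "M":
            size //= 2  # 2x2 pool with stride 2 halves the resolution
            stages.append((convs, channels, size))
            convs = 0
        else:
            # 3x3 conv with stride 1 and padding 1 keeps the resolution
            params += 3 * 3 * channels * item + item
            channels = item
            convs += 1
    return stages, params


stages, params = describe_stages(VGG16)
for convs, channels, size in stages:
    print(f"{convs} convs, {channels} channels -> {size}x{size}")
print("conv parameters:", params)
```

For VGG-16 this yields stages of 2, 2, 3, 3 and 3 convs with 64, 128, 256, 512 and 512 channels, ending at a $7 \times 7 \times 512$ map (25,088 values) that feeds the fully connected layers. The conv layers hold about 14.7M parameters; most of the roughly 138M parameters of VGG-16 sit in the FC layers.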
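The conv layers of a stage have the same receptive field as a single larger convolution: each $3 \times 3$ layer adds 2 to the receptive field, so two stacked layers see a $5 \times 5$ region and three see a $7 \times 7$ region. Given the same number $C$ of input/output channels, however, the stage introduces more non-linearities (one ReLU per conv) and requires fewer parameters and less computation: three $3 \times 3$ layers have $3 \cdot 3^2 C^2 = 27C^2$ weights, against $7^2 C^2 = 49C^2$ for a single $7 \times 7$ layer. A stage requires more memory to store the activations, though, because the output of every layer must be kept for the backward pass.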
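The trade-off generalizes to a stack of $n$ layers. A small sketch, with the hypothetical helper `stack_vs_single`, that counts weights (biases ignored), multiply-accumulates and stored activation values for $n$ stacked $3 \times 3$ convs versus the single $(2n+1) \times (2n+1)$ conv with the same receptive field. Counting one stored output map per conv is a simplification of this sketch:

```python
def stack_vs_single(n: int, c: int, h: int, w: int) -> dict:
    """Compare n stacked 3x3 convs (stride 1, padding 1) with the single
    k x k conv of equal receptive field, both with C input/output channels
    at resolution H x W. Biases are ignored."""
    k = 2 * n + 1  # each extra 3x3 layer grows the receptive field by 2
    return {
        "receptive_field": k,
        "params_stack": n * 3 * 3 * c * c,
        "params_single": k * k * c * c,
        # every output pixel of every layer costs (kernel area * C * C) MACs
        "macs_stack": n * 3 * 3 * c * c * h * w,
        "macs_single": k * k * c * c * h * w,
        "nonlinearities_stack": n,  # one ReLU after each conv
        "nonlinearities_single": 1,
        # simplification: one C x H x W output map stored per conv layer
        "activations_stack": n * c * h * w,
        "activations_single": c * h * w,
    }


# Three 3x3 layers vs one 7x7 layer, 256 channels at 56x56
r = stack_vs_single(3, 256, 56, 56)
print(r["params_stack"], r["params_single"])
print(r["activations_stack"], r["activations_single"])
```

The stack wins on parameters and compute by the ratio $9n / (2n+1)^2$, but stores $n$ times as many activations.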
TRAINING PHASE
| HYPERPARAMETER | VALUE | 
|---|---|
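| Optimizer, learning rate, weight decay, normalization | Same as AlexNet (SGD with momentum 0.9, learning rate $10^{-2}$ divided by 10 when the validation accuracy stops improving, weight decay $5 \cdot 10^{-4}$), but with batch size 256 and no local response normalization |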
| Epochs | 74 (370K iterations) |
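| Data Augmentation | Same as AlexNet plus scale jittering: each training image is isotropically rescaled so that its shorter side is $S$, with $S$ sampled at random in $[256, 512]$, and a random $224 \times 224$ crop is then taken |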
| Initialization | Deep nets are hard to train from randomly initialized weights because of unstable gradients. The authors first train VGG-11 with weights $W \sim \mathcal{N}(0, 10^{-2})$ (zero mean, variance $10^{-2}$) and zero biases. They then train VGG-16 and VGG-19 by initializing the first 4 conv layers and the last 3 FC layers with the pre-trained weights of the corresponding VGG-11 layers; the intermediate layers are initialized randomly. |
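Scale jittering can be sketched as follows. The sketch only computes the geometry (rescaled size and crop corner) and leaves the actual resampling to an image library; sampling $S$ uniformly among the integers in $[256, 512]$ and the name `jitter_and_crop` are assumptions of this sketch. The AlexNet-style flips and color shifts are not shown:

```python
import random
from typing import Tuple

S_MIN, S_MAX, CROP = 256, 512, 224


def jitter_and_crop(height: int, width: int, rng: random.Random
                    ) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the rescaled (h, w) of an image and the top-left corner of a
    random CROP x CROP window inside it. Hypothetical helper."""
    s = rng.randint(S_MIN, S_MAX)  # training scale S, sampled per image (assumed uniform)
    # Isotropic rescale: the shorter side becomes S, aspect ratio is kept
    if height <= width:
        new_h, new_w = s, round(width * s / height)
    else:
        new_h, new_w = round(height * s / width), s
    # Since S >= 256 > 224, the crop always fits
    top = rng.randint(0, new_h - CROP)
    left = rng.randint(0, new_w - CROP)
    return (new_h, new_w), (top, left)


rng = random.Random(0)
print(jitter_and_crop(375, 500, rng))
```

Because $S$ changes from one image to the next, the same object appears at different scales across iterations, which is the point of jittering.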
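The two-step initialization can be sketched with NumPy, assuming each net is stored as a dict mapping a layer name to a `(weights, bias)` pair with the output dimension first. The layer names, the helpers `init_layer` and `init_deeper`, and the choice to raise on a shape mismatch are hypothetical; for VGG-16 the `transfer` map would list its first four conv layers and last three FC layers:

```python
import numpy as np


def init_layer(shape, rng):
    """Random init: W ~ N(0, 1e-2), i.e. variance 1e-2 (std 0.1), bias = 0.
    shape[0] is the number of output units/filters."""
    w = rng.normal(loc=0.0, scale=0.1, size=shape)
    b = np.zeros(shape[0])
    return w, b


def init_deeper(deep_shapes, shallow_params, transfer, rng):
    """deep_shapes: layer name -> weight shape of the deeper net.
    shallow_params: layer name -> (w, b) of the trained shallow net.
    transfer: deep layer name -> shallow layer name to copy from.
    Layers not listed in `transfer` get the random init."""
    params = {}
    for name, shape in deep_shapes.items():
        src = transfer.get(name)
        if src is None:
            params[name] = init_layer(shape, rng)
            continue
        w, b = shallow_params[src]
        if w.shape != tuple(shape):
            raise ValueError(f"shape mismatch: {name} <- {src}")
        params[name] = (w.copy(), b.copy())  # copy pre-trained weights
    return params


# Toy usage with hypothetical layer names and tiny shapes
rng = np.random.default_rng(0)
shallow = {"conv1": init_layer((4, 3, 3, 3), rng), "fc": init_layer((2, 16), rng)}
deep = {"conv1": (4, 3, 3, 3), "conv1b": (4, 4, 3, 3), "fc": (2, 16)}
params = init_deeper(deep, shallow, {"conv1": "conv1", "fc": "fc"}, rng)
```

Layers missing from `transfer` keep the random $\mathcal{N}(0, 10^{-2})$ initialization, as the intermediate layers of the deeper nets do.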