{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# VGG16预训练模型\n", "\n", "*Author: Pytorch Team*\n", "\n", "**Award winning ConvNets from 2014 Imagenet ILSVRC challenge**\n", "\n", "\n", "\n", "\n", "https://pytorch.org/hub/pytorch_vision_vgg/\n", "\n", "https://pytorch.org/docs/stable/torchvision/models.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run it in colab\n", "\n", "https://colab.research.google.com/drive/1epVRmNLeoAenypwM1ffGeHv9pk1xtEek\n", "\n", "This notebook is optionally accelerated with a GPU runtime.\n", "\n", "If you would like to use this acceleration, please \n", "- select the menu option \"Runtime\" -> \"Change runtime type\", \n", "- select \"Hardware Accelerator\" -> \"GPU\" and click \"SAVE\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Pretrained Models" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "start_time": "2021-07-23T12:32:02.893Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Downloading: \"https://github.com/pytorch/vision/archive/v0.6.0.zip\" to /Users/datalab/.cache/torch/hub/v0.6.0.zip\n", "Downloading: \"https://download.pytorch.org/models/vgg16-397923af.pth\" to /Users/datalab/.cache/torch/hub/checkpoints/vgg16-397923af.pth\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f7d7289bf8db4e6fb3ce17bf9b0901f4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(FloatProgress(value=0.0, max=553433881.0), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import torch\n", "model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg16', pretrained=True)\n", "# or any of these variants\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg11_bn', pretrained=True)\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg13', pretrained=True)\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg13_bn', 
pretrained=True)\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg16', pretrained=True)\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg16_bn', pretrained=True)\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg19', pretrained=True)\n", "# model = torch.hub.load('pytorch/vision:v0.6.0', 'vgg19_bn', pretrained=True)\n", "model.eval()" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-12-04T08:55:55.405309Z", "start_time": "2020-12-04T08:55:55.386578Z" } }, "source": [ "- Downloading: \"https://github.com/pytorch/vision/archive/v0.6.0.zip\" to /root/.cache/torch/hub/v0.6.0.zip\n", "- Downloading: \"https://download.pytorch.org/models/vgg16-397923af.pth\" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth\n", "\n", "```\n", "100% 528M/528M [00:02<00:00, 223MB/s]\n", "```\n", "\n", "```\n", "VGG(\n", " (features): Sequential(\n", " (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (1): ReLU(inplace=True)\n", " (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (3): ReLU(inplace=True)\n", " (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", " (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (6): ReLU(inplace=True)\n", " (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (8): ReLU(inplace=True)\n", " (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", " (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (11): ReLU(inplace=True)\n", " (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (13): ReLU(inplace=True)\n", " (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (15): ReLU(inplace=True)\n", " (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", " (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 
1))\n", " (18): ReLU(inplace=True)\n", " (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (20): ReLU(inplace=True)\n", " (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (22): ReLU(inplace=True)\n", " (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", " (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (25): ReLU(inplace=True)\n", " (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (27): ReLU(inplace=True)\n", " (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", " (29): ReLU(inplace=True)\n", " (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", " )\n", " (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))\n", " (classifier): Sequential(\n", " (0): Linear(in_features=25088, out_features=4096, bias=True)\n", " (1): ReLU(inplace=True)\n", " (2): Dropout(p=0.5, inplace=False)\n", " (3): Linear(in_features=4096, out_features=4096, bias=True)\n", " (4): ReLU(inplace=True)\n", " (5): Dropout(p=0.5, inplace=False)\n", " (6): Linear(in_features=4096, out_features=1000, bias=True)\n", " )\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All pre-trained models expect input images normalized in the same way,\n", "i.e. mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least `224`.\n", "\n", "The images have to be loaded in to a range of `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`\n", "and `std = [0.229, 0.224, 0.225]`.\n", "\n", "Here's a sample execution." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download an example image from the pytorch website\n", "import urllib\n", "url, filename = (\"https://github.com/pytorch/hub/raw/master/images/dog.jpg\", \"dog.jpg\")\n", "try: urllib.URLopener().retrieve(url, filename)\n", "except: urllib.request.urlretrieve(url, filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://github.com/pytorch/hub/raw/master/images/dog.jpg" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "input_image = Image.open(filename)\n", "input_image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sample execution (requires torchvision)\n", "from torchvision import transforms\n", "\n", "preprocess = transforms.Compose([\n", " transforms.Resize(256),\n", " transforms.CenterCrop(224),\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", "])\n", "input_tensor = preprocess(input_image)\n", "input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model\n", "\n", "# move the input and model to GPU for speed if available\n", "if torch.cuda.is_available():\n", " input_batch = input_batch.to('cuda')\n", " model.to('cuda')\n", "\n", "with torch.no_grad():\n", " output = model(input_batch)\n", "# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes\n", "print(output[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The output has unnormalized scores. 
To get probabilities, you can run a softmax on it.\n", "\n", "input_prob = torch.nn.functional.softmax(output[0], dim=0)\n", "torch.argmax(input_prob)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "tensor(258, device='cuda:0')\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-07-23T12:37:11.364843Z", "start_time": "2021-07-23T12:37:09.577544Z" } }, "outputs": [], "source": [ "# The ImageNet challenge uses a 'trimmed' list of 1000 non-overlapping classes\n", "import pandas as pd\n", "\n", "url = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'\n", "imagenet_df = pd.read_json(url).T" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-07-23T12:37:18.548085Z", "start_time": "2021-07-23T12:37:18.530787Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " | 0 | \n", "1 | \n", "
---|---|---|
0 | \n", "n01440764 | \n", "tench | \n", "
1 | \n", "n01443537 | \n", "goldfish | \n", "
2 | \n", "n01484850 | \n", "great_white_shark | \n", "
3 | \n", "n01491361 | \n", "tiger_shark | \n", "
4 | \n", "n01494475 | \n", "hammerhead | \n", "
... | \n", "... | \n", "... | \n", "
995 | \n", "n13044778 | \n", "earthstar | \n", "
996 | \n", "n13052670 | \n", "hen-of-the-woods | \n", "
997 | \n", "n13054560 | \n", "bolete | \n", "
998 | \n", "n13133613 | \n", "ear | \n", "
999 | \n", "n15075141 | \n", "toilet_tissue | \n", "
1000 rows × 2 columns
\n", "