Chapter 7: Neural Networks and Deep Learning#
Sung Kim hunkim+ml@gmail.com HKUST
Code: hunkim/PyTorchZeroToAll
Slides: http://bit.ly/PyTorchZeroAll
The Neuron: A Biological Information Processor
dendrites - the receivers
soma - neuron cell body (sums input signals)
axon - the transmitter
synapse - the point of transmission
A neuron activates after a certain threshold is met.
Learning occurs via electro-chemical changes in the effectiveness of the synaptic junction.
An Artificial Neuron: the perceptron, simulated in hardware or software. Learning occurs via changes in the values of the connection weights.
input connections - the receivers
node simulates neuron body
output connection - the transmitter
activation function employs a threshold or bias
connection weights act as synaptic junctions
Neural networks consist of the following components:
An input layer, x
An arbitrary number of hidden layers
An output layer, ŷ
A set of weights and biases between each layer, W and b
A choice of activation function for each hidden layer, σ (e.g., the sigmoid function)
Each iteration of the training process consists of the following steps:
Calculating the predicted output ŷ, known as feedforward
Updating the weights and biases, known as backpropagation
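To make the two steps concrete, here is a minimal sketch of one feedforward pass and one backpropagation update through a single sigmoid hidden layer (plain NumPy; the data, shapes, and learning rate are all illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data: one sample with 2 features, scalar target
x = np.array([0.5, -1.0])   # input layer
y = 1.0                     # target output

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # hidden layer: 2 -> 3
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # output layer: 3 -> 1

# Feedforward: compute the predicted output y_hat
h = sigmoid(W1 @ x + b1)
y_hat = sigmoid(W2 @ h + b2)

# Backpropagation with a squared-error loss: propagate the error backwards,
# then update weights and biases by gradient descent with lr = 0.1
d_out = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # error at the output pre-activation
dW2, db2 = np.outer(d_out, h), d_out
d_hid = (W2.T @ d_out) * h * (1 - h)           # error at the hidden layer
dW1, db1 = np.outer(d_hid, x), d_hid

lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```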
https://blog.ttro.com/artificial-intelligence-will-shape-e-learning-for-good/
http://playground.tensorflow.org/
Batch, Iteration, & Epoch#
Batch Size is the total number of training examples present in a single batch.
Note: the number of batches equals the number of iterations in one epoch; batch size and number of batches (iterations) are two different things.
Say we have 2,000 training examples that we are going to use.
If we divide the dataset of 2,000 examples into batches of 500, it will take 4 iterations to complete 1 epoch.
Here the batch size is 500 and the number of iterations is 4, for 1 complete epoch.
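To make the bookkeeping concrete, here is a minimal sketch using torch.utils.data with dummy data (the feature shapes are illustrative):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 2000 dummy training examples with 10 features each
dataset = TensorDataset(torch.randn(2000, 10), torch.randn(2000, 1))
loader = DataLoader(dataset, batch_size=500, shuffle=True)

print(len(loader))          # 4 batches -> 4 iterations per epoch
for epoch in range(3):      # 3 epochs
    for xb, yb in loader:   # 4 iterations each
        pass                # forward/backward/update would go here
```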
Gradient Descent#
Let’s represent the parameters as \(\Theta\), the learning rate as \(\alpha\), and the gradient as \(\nabla J(\Theta)\); each step of gradient descent then updates \(\Theta \leftarrow \Theta - \alpha \nabla J(\Theta)\).
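As a one-step numeric sketch (the function and numbers are made up for illustration), take \(J(\Theta) = \Theta^2\), whose gradient is \(2\Theta\), and start at \(\Theta = 3\) with \(\alpha = 0.1\):

```python
theta = 3.0           # initial parameter value
alpha = 0.1           # learning rate
grad = 2 * theta      # gradient of J(theta) = theta**2 at theta = 3.0
theta = theta - alpha * grad
print(theta)          # 2.4 -- one step closer to the minimum at 0
```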
Manual Gradient#
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
plt.plot(x_data, y_data, 'r-o');
# our model for the forward pass
def forward(x):
return x * w
# Loss function
def loss(y_pred, y_val):
return (y_pred - y_val) ** 2
# Lists to collect each candidate weight and its mean squared error (MSE)
w_list = []
mse_list = []
for w in np.arange(0.0, 4.1, 0.1):
    # Print the weights and initialize the loss
#print("w=", w)
l_sum = 0
for x_val, y_val in zip(x_data, y_data):
# For each input and output, calculate y_hat
# Compute the total loss and add to the total error
y_pred = forward(x_val)
l = loss(y_pred, y_val)
l_sum += l
#print("\t", x_val, y_val, y_pred_val, l)
    # Now compute the mean squared error (MSE) for this weight
    # and record the weight/MSE pair from this run
    #print("MSE=", l_sum / 3)
w_list.append(w)
mse_list.append(l_sum / 3)
# Plot it all
plt.plot(w_list, mse_list)
plt.ylabel('Loss')
plt.xlabel('w')
plt.show()
# compute gradient: d_loss/d_w = d/dw (x*w - y)^2 = 2*x*(x*w - y)
def gradient(x, y):  # d_loss/d_w
    return 2 * x * (x * w - y)
# Training loop
w = 1.0  # reset w to an initial guess (the sweep above left it at 4.0)
for epoch in range(100):
for x_val, y_val in zip(x_data, y_data):
# Compute derivative w.r.t to the learned weights
# Update the weights
# Compute the loss and print progress
grad = gradient(x_val, y_val)
w = w - 0.01 * grad
#print("\tgrad: ", x_val, y_val, round(grad, 2))
y_pred = forward(x_val)
l = loss(y_pred, y_val)
print("Epoch:", epoch, "w=", round(w, 2), "loss=", round(l, 2), end='\r')
Epoch: 99 w= 2.0 loss= 0.0
Auto Gradient#
import torch
w = torch.tensor([1.0], requires_grad=True)
# Training loop
for epoch in range(100):
for x_val, y_val in zip(x_data, y_data):
y_pred = forward(x_val) # 1) Forward pass
l = loss(y_pred, y_val) # 2) Compute loss
        l.backward() # 3) Backward pass: compute the gradient dl/dw
#print("\tgrad: ", x_val, y_val, w.grad.item())
w.data = w.data - 0.01 * w.grad.item()
# Manually zero the gradients after updating weights
w.grad.data.zero_()
print(f"Epoch: {epoch} | Loss: {l.item()}")
Epoch: 99 | Loss: 9.094947017729282e-13
w.data
tensor([2.0000])
Back Propagation in a Complicated Network#
from torch import nn
import torch
from torch import tensor
from torch import sigmoid
x_data = tensor([[1.0], [2.0], [3.0]])
y_data = tensor([[2.0], [4.0], [6.0]])
class Model(nn.Module):
def __init__(self):
"""
In the constructor we instantiate two nn.Linear module
"""
super(Model, self).__init__()
self.linear = torch.nn.Linear(1, 1) # One in and one out
def forward(self, x):
"""
In the forward function we accept a Variable of input data and we must return
a Variable of output data. We can use Modules defined in the constructor as
well as arbitrary operators on Variables.
"""
y_pred = self.linear(x)
return y_pred
# our model
model = Model()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Training loop
for epoch in range(500):
# 1) Forward pass: Compute predicted y by passing x to the model
y_pred = model(x_data)
# 2) Compute and print loss
loss = criterion(y_pred, y_data)
    if epoch % 100 == 0:
print(f'Epoch: {epoch} | Loss: {loss.item()} ')
# Zero gradients, perform a backward pass, and update the weights.
optimizer.zero_grad()
loss.backward()
optimizer.step()
Epoch: 0 | Loss: 80.97074890136719
Epoch: 100 | Loss: 0.2520463764667511
Epoch: 200 | Loss: 0.059265460819005966
Epoch: 300 | Loss: 0.013935551047325134
Epoch: 400 | Loss: 0.003276776522397995
# After training
hour_var = tensor([[4.0]])
y_pred = model(hour_var)
print("Prediction (after training)", 4, y_pred.item())
Prediction (after training) 4 7.967859268188477
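We can also read off the learned parameters (exact values vary from run to run; since the data follow y = 2x, they should be close to weight 2 and bias 0):

```python
w, b = model.linear.weight.item(), model.linear.bias.item()
print(f'w = {w:.4f}, b = {b:.4f}')  # expected: w close to 2, b close to 0
```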
PyTorch Rhythm#
The rhythm that the following examples all follow: 1) design the model as an nn.Module class, 2) construct a loss function and an optimizer, 3) run the training cycle: forward, backward, update.
Regression#
Let’s start with a simple house-price example.
Say you’re helping a friend who wants to buy a house.
She was quoted $400,000 for a 2,000 sq ft house (about 185 square meters).
Is this a good price or not?
So you ask friends who have bought houses in the same neighborhood, and you end up with three data points:
| Area (sq ft) (x) | Price (y) |
|---|---|
| 2,104 | 399,900 |
| 1,600 | 329,900 |
| 2,400 | 369,000 |
Calculating the prediction is simple multiplication.
But before that, we need to think about the weight we’ll be multiplying by.
“Training” a neural network just means finding the weights we use to calculate the prediction.
A simple predictive model (“regression model”) takes an input, does a calculation, and gives an output.
Model Evaluation
If we apply our model to the three data points we have, how good a job would it do?
Loss Function
A loss function measures how bad our prediction is.
For each point, the error is the squared difference between the actual value and the predicted value; averaging these squared errors over all points gives the Mean Squared Error (MSE).
We can’t improve the model much by varying the weight alone.
But if we add a bias (intercept), we can find values that improve the model, as the sketch below shows.
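A small sketch with the three data points from the table above (the parameter values are illustrative: 177 is roughly the best weight-only fit in dollars per square foot, and 58.5 with intercept 247,000 is roughly the least-squares fit once a bias is allowed):

```python
areas  = [2104, 1600, 2400]
prices = [399900, 329900, 369000]

def mse(w, b=0.0):
    """Mean squared error of the model: price = w * area + b."""
    errors = [(w * x + b - y) ** 2 for x, y in zip(areas, prices)]
    return sum(errors) / len(errors)

print(f'{mse(177):.3e}')           # weight only: about 2.0e9
print(f'{mse(58.5, 247000):.3e}')  # weight plus bias: about 4.5e8, a clear improvement
```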
Gradient Descent
Gradient descent automatically finds the weight and bias values that minimize the loss function.
Regression
%matplotlib inline
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
import numpy as np
x_train = np.array([[2104], [1600], [2400]], dtype=np.float32)
y_train = np.array([[399.900], [329.900], [369.000]], dtype=np.float32)  # prices in thousands of dollars
plt.plot(x_train, y_train, 'r.')
plt.show()
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)
nn.Linear
help(nn.Linear)
Applies a linear transformation to the incoming data: \(y = xA^T + b\)
in_features: size of each input sample
out_features: size of each output sample
bias: If set to False, the layer will not learn an additive bias. Default: True
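For instance, nn.Linear(1, 1), as used below, holds a 1×1 weight matrix and a single bias term:

```python
import torch
from torch import nn

layer = nn.Linear(1, 1)
print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
x = torch.randn(3, 1)   # batch of 3 samples, 1 feature each
print(layer(x).shape)   # torch.Size([3, 1])
```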
# Linear Regression Model
class LinearRegression(nn.Module):
def __init__(self):
super(LinearRegression, self).__init__()
self.linear = nn.Linear(1, 1) # input and output is 1 dimension
def forward(self, x):
out = self.linear(x)
return out
model = LinearRegression()
# Define loss and optimization functions
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-9)  # tiny lr: the raw square-footage inputs are large and unnormalized
help(nn.MSELoss)
Measures the mean squared error (squared L2 norm) between each element in the input x and target y.
help(optim.SGD)
Implements stochastic gradient descent (optionally with momentum).
Momentum is a variation on stochastic gradient descent that takes previous updates into account as well and generally leads to faster training.
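A minimal sketch of the momentum update in plain Python (illustrative values; this mirrors the velocity buffer that optim.SGD maintains when momentum is set, ignoring dampening and weight decay):

```python
theta, v = 3.0, 0.0       # parameter and its velocity
lr, momentum = 0.1, 0.5

def grad(theta):          # gradient of J(theta) = theta**2
    return 2 * theta

for _ in range(5):
    v = momentum * v + grad(theta)  # accumulate past updates
    theta = theta - lr * v
    print(round(theta, 4))          # converges toward the minimum at 0
```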
num_epochs = 1000
for epoch in range(num_epochs):
    inputs = x_train
    target = y_train
# forward
out = model(inputs)
loss = criterion(out, target)
# backward
    optimizer.zero_grad() # Clears the gradients of all optimized tensors
loss.backward()
optimizer.step() # Performs a single optimization step.
if (epoch+1) % 50 == 0:
print('Epoch[{}/{}], loss: {:.6f}'
.format(epoch+1, num_epochs, loss.data.item()))
Epoch[50/1000], loss: 524703.812500
Epoch[100/1000], loss: 224658.125000
Epoch[150/1000], loss: 96851.820312
Epoch[200/1000], loss: 42411.964844
Epoch[250/1000], loss: 19222.966797
Epoch[300/1000], loss: 9345.485352
Epoch[350/1000], loss: 5138.111816
Epoch[400/1000], loss: 3345.956299
Epoch[450/1000], loss: 2582.575439
Epoch[500/1000], loss: 2257.412354
Epoch[550/1000], loss: 2118.905518
Epoch[600/1000], loss: 2059.905518
Epoch[650/1000], loss: 2034.777222
Epoch[700/1000], loss: 2024.072266
Epoch[750/1000], loss: 2019.512207
Epoch[800/1000], loss: 2017.569946
Epoch[850/1000], loss: 2016.742554
Epoch[900/1000], loss: 2016.390259
Epoch[950/1000], loss: 2016.241577
Epoch[1000/1000], loss: 2016.177734
predict = model(x_train)
predict = predict.data.numpy()
plt.plot(x_train.numpy(), y_train.numpy(), 'ro', label='Original data')
plt.plot(x_train.numpy(), predict, 'b-s', label='Fitting Line')
plt.xlabel('X', fontsize= 20)
plt.ylabel('y', fontsize= 20)
plt.legend( fontsize= 20)
plt.show()
Have a try#
x_train = np.array([[3.3], [4.4], [5.5], [6.71], [6.93], [4.168],
[9.779], [6.182], [7.59], [2.167], [7.042],
[10.791], [5.313], [7.997], [3.1]], dtype=np.float32)
y_train = np.array([[1.7], [2.76], [2.09], [3.19], [1.694], [1.573],
[3.366], [2.596], [2.53], [1.221], [2.827],
[3.465], [1.65], [2.904], [1.3]], dtype=np.float32)
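One possible starting point, reusing the LinearRegression model defined above (the learning rate here is just a suggestion; these x values are small, so the tiny lr from the house-price example is not needed):

```python
x_t = torch.from_numpy(x_train)
y_t = torch.from_numpy(y_train)

model = LinearRegression()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(1000):
    loss = criterion(model(x_t), y_t)   # forward
    optimizer.zero_grad()               # backward + update
    loss.backward()
    optimizer.step()
```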
Classification#
Activation Function#
def sigmoid(x):
return 1/(1 + np.exp(-x))
plt.plot(range(-10, 10), [sigmoid(i) for i in range(-10, 10)])
plt.xlabel('x', fontsize = 20)
plt.ylabel('sigmoid', fontsize = 20);
# Naive scalar relu implementation.
# In the real world, most calculations are done on vectors
def relu(x):
if x < 0:
return 0
else:
return x
plt.plot(range(-10, 10), [relu(i) for i in range(-10, 10)])
plt.xlabel('x', fontsize = 20)
plt.ylabel('relu', fontsize = 20);
Softmax#
The softmax function, also known as softargmax or normalized exponential function, is a function that takes as input a vector of K real numbers, and normalizes it into a probability distribution consisting of K probabilities.
def softmax(s):
return np.exp(s) / np.sum(np.exp(s), axis=0)
softmax([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0])
array([0.02364054, 0.06426166, 0.1746813 , 0.474833 , 0.02364054,
0.06426166, 0.1746813 ])
plt.plot(range(10), softmax(range(10)))
plt.xlabel('x', fontsize = 20)
plt.ylabel('softmax', fontsize = 20);
Softmax is often used in neural networks, to map the non-normalized output of a network to a probability distribution over predicted output classes.
Prior to applying softmax, some vector components could be negative or greater than one, and might not sum to 1.
After applying softmax, each component will be in the interval (0,1), and the components will add up to 1, so that they can be interpreted as probabilities. Furthermore, the larger input components will correspond to larger probabilities.
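One caveat: the naive implementation above overflows for large inputs, since np.exp grows quickly. A common numerically stable variant (a sketch, not the only way) subtracts the maximum first, which leaves the result unchanged:

```python
import numpy as np

def stable_softmax(s):
    s = np.asarray(s, dtype=np.float64)
    z = s - np.max(s)   # shift by the max; softmax is invariant to this
    return np.exp(z) / np.sum(np.exp(z))

out = stable_softmax([1000.0, 1001.0, 1002.0])  # the naive version would overflow here
print(out, out.sum())   # components in (0, 1), summing to 1
```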
Logistic Regression#
from torch import tensor
from torch import nn
from torch import sigmoid
import torch.nn.functional as F
import torch.optim as optim
# Training data and ground truth
x_data = tensor([[1.0], [2.0], [3.0], [4.0]])
y_data = tensor([[0.], [0.], [1.], [1.]])
class Model(nn.Module):
def __init__(self):
"""
In the constructor we instantiate nn.Linear module
"""
super(Model, self).__init__()
self.linear = nn.Linear(1, 1) # One in and one out
def forward(self, x):
"""
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data.
"""
y_pred = sigmoid(self.linear(x))
return y_pred
# our model
model = Model()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.
criterion = nn.BCELoss(reduction='mean')
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop
for epoch in range(1000):
# Forward pass: Compute predicted y by passing x to the model
y_pred = model(x_data)
# Compute and print loss
loss = criterion(y_pred, y_data)
    if epoch % 100 == 0:
print(f'Epoch {epoch + 1}/1000 | Loss: {loss.item():.4f}')
# Zero gradients, perform a backward pass, and update the weights.
optimizer.zero_grad()
loss.backward()
optimizer.step()
Epoch 1/1000 | Loss: 0.3840
Epoch 101/1000 | Loss: 0.3748
Epoch 201/1000 | Loss: 0.3661
Epoch 301/1000 | Loss: 0.3579
Epoch 401/1000 | Loss: 0.3501
Epoch 501/1000 | Loss: 0.3427
Epoch 601/1000 | Loss: 0.3357
Epoch 701/1000 | Loss: 0.3290
Epoch 801/1000 | Loss: 0.3226
Epoch 901/1000 | Loss: 0.3166
# After training
print(f'Let\'s predict the hours needed to score above 50%\n{"=" * 50}')
y_pred = model(tensor([[1.0]]))
print(f'Prediction for x = 1.0, y_pred = {y_pred.item():.4f} | Above 50%: {y_pred.item() > 0.5}')
y_pred = model(tensor([[7.0]]))
print(f'Prediction for x = 7.0, y_pred = {y_pred.item():.4f} | Above 50%: { y_pred.item() > 0.5}')
Let's predict the hours needed to score above 50%
==================================================
Prediction for x = 1.0, y_pred = 0.1998 | Above 50%: False
Prediction for x = 7.0, y_pred = 0.9969 | Above 50%: True
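Since the model is sigmoid(w·x + b), the 50% boundary sits where w·x + b = 0, so we can read the crossover point directly from the learned parameters (the exact value depends on the run):

```python
w = model.linear.weight.item()
b = model.linear.bias.item()
print(-b / w)   # hours of study at which the predicted probability crosses 0.5
```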
Diabetes Classification#
from torch import nn, optim, from_numpy
import numpy as np
xy = np.loadtxt('../data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = from_numpy(xy[:, 0:-1])
y_data = from_numpy(xy[:, [-1]])
print(f'X\'s shape: {x_data.shape} | Y\'s shape: {y_data.shape}')
X's shape: torch.Size([759, 8]) | Y's shape: torch.Size([759, 1])
class Model(nn.Module):
def __init__(self):
"""
In the constructor we instantiate two nn.Linear module
"""
super(Model, self).__init__()
self.l1 = nn.Linear(8, 6)
self.l2 = nn.Linear(6, 4)
self.l3 = nn.Linear(4, 1)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
"""
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
"""
x = self.sigmoid(self.l1(x))
x = self.sigmoid(self.l2(x))
y_pred = self.sigmoid(self.l3(x))
return y_pred
# our model
model = Model()
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the three
# nn.Linear modules which are members of the model.
criterion = nn.BCELoss(reduction='mean')
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Training loop
for epoch in range(1000):
# Forward pass: Compute predicted y by passing x to the model
y_pred = model(x_data)
# Compute and print loss
loss = criterion(y_pred, y_data)
    if epoch % 200 == 0:
print(f'Epoch: {epoch + 1}/1000 | Loss: {loss.item():.4f}')
# Zero gradients, perform a backward pass, and update the weights.
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f'Epoch: {epoch + 1}/1000 | Loss: {loss.item():.4f}')
Epoch: 1/1000 | Loss: 0.6444
Epoch: 201/1000 | Loss: 0.6441
Epoch: 401/1000 | Loss: 0.6437
Epoch: 601/1000 | Loss: 0.6431
Epoch: 801/1000 | Loss: 0.6424
Epoch: 1000/1000 | Loss: 0.6413
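The loss barely moves here (stacked sigmoid layers saturate easily). One way to gauge the model, as a sketch on the same training data, is classification accuracy at a 0.5 threshold:

```python
import torch

with torch.no_grad():
    predictions = (model(x_data) > 0.5).float()   # threshold the sigmoid output
    accuracy = (predictions == y_data).float().mean().item()
print(f'Training accuracy: {accuracy:.3f}')
```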
Convolutional Neural Network#
The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels. http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 6, 5) # in_channels = 3, out_channels = 6, kernel_size= 5
self.pool = nn.MaxPool2d(2, 2) # pool of square window of size = 2, stride = 2
self.conv2 = nn.Conv2d(6, 16, 5) # in_channels = 6, out_channels = 16, kernel_size= 5
self.fc1 = nn.Linear(16 * 5 * 5, 120) # in_features = 16*5*5, out_features = 120
self.fc2 = nn.Linear(120, 84) # in_features = 120, out_features = 84
self.fc3 = nn.Linear(84, 10) # in_features = 84, out_features = 10
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 16 * 5 * 5) # Flatten the data (n, 16, 5, 5)-> (n, 400)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
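A quick shape check with a dummy batch (assuming torch, nn, and torch.nn.functional as F are imported as above):

```python
import torch

net = Net()
images = torch.randn(4, 3, 32, 32)  # a dummy batch of 4 CIFAR-10-sized images
print(net(images).shape)            # torch.Size([4, 10]): one score per class
```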
Deep Learning video series (in Chinese): https://space.bilibili.com/88461692/channel/detail?cid=26587