
Neural Networks in PyTorch

We can use the torch.nn package to build neural networks. An nn.Module contains the network's layers and a forward(input) method that returns the output.


The typical training procedure for a neural network is as follows (a rough skeleton of this loop is sketched right after the list):

  1. Define a network model with some learnable parameters (also called weights);
  2. Iterate over a dataset of inputs;
  3. Process each input through the network;
  4. Compute the loss (how far the output is from the correct value);
  5. Propagate the gradients back into the network's parameters;
  6. Update the weights, typically with a simple rule: weight = weight - learning_rate * gradient
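
The rest of this post walks through each of these steps on the LeNet-5 example. As a rough skeleton (every name below — model, dataloader, criterion, optimizer — is a placeholder, not an object defined yet), one pass of the loop looks like this:

# rough skeleton of the training loop described above; all names are placeholders
for data, target in dataloader:        # 2. iterate over the dataset
    optimizer.zero_grad()              # clear the old gradients
    output = model(data)               # 3. forward pass through the network
    loss = criterion(output, target)   # 4. compute the loss
    loss.backward()                    # 5. backpropagate the gradients
    optimizer.step()                   # 6. update the weights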

Defining the Network

(Figure: the LeNet-5 architecture)

The figure above shows LeNet-5, a classic architecture for handwritten-digit recognition; its implementation follows below. (For a detailed explanation of the architecture and its parameters, see: 网络解析(一):LeNet-5 详解.)

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 16 * 5 * 5 = channels * height * width of conv2's output after pooling,
        # assuming a 32x32 input image
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)


The output is as follows:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


The model must define a forward function; the backward function (which computes the gradients) is created automatically by autograd. Any Tensor operation can be used inside forward. In general, keep the following in mind when defining your own network:

  1. Layers with learnable parameters (such as fully connected and convolutional layers) are usually defined in the constructor __init__(); layers without parameters may also be placed there.
  2. Layers without learnable parameters (such as ReLU and dropout) may be defined in the constructor or not; if they are not defined in __init__, the corresponding nn.functional calls can be used in forward instead (see the sketch after this list).
  3. The forward method must be overridden: it implements the model's computation and defines how the layers are connected.
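
As a minimal sketch of point 2 (a hypothetical two-layer example, not part of the original tutorial, reusing the torch.nn / torch.nn.functional imports from the code above), the module style and the functional style are interchangeable for parameter-free layers:

# Style A: the parameter-free ReLU is registered as a module in __init__
class NetA(nn.Module):
    def __init__(self):
        super(NetA, self).__init__()
        self.fc = nn.Linear(10, 5)
        self.relu = nn.ReLU()      # holds no learnable parameters

    def forward(self, x):
        return self.relu(self.fc(x))


# Style B: the same activation is applied with nn.functional inside forward
class NetB(nn.Module):
    def __init__(self):
        super(NetB, self).__init__()
        self.fc = nn.Linear(10, 5)

    def forward(self, x):
        return F.relu(self.fc(x))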

net.parameters() returns a generator over the network's learnable parameters (weights):

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
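
To see exactly which tensors these are, you can also iterate over net.named_parameters() (this snippet is an illustration, not part of the original post):

for name, param in net.named_parameters():
    print(name, param.size())
# conv1.weight torch.Size([6, 1, 5, 5])
# conv1.bias torch.Size([6])
# ... followed by the weights and biases of conv2, fc1, fc2 and fc3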


Zero the gradient buffers of all parameters and then backpropagate with random gradients. Since out has not been defined yet, we first run a random 32x32 input through the network:

input = torch.randn(1, 1, 32, 32)   # the network expects a 1x1x32x32 input
out = net(input)
net.zero_grad()                     # zero the gradient buffers of all parameters
out.backward(torch.randn(1, 10))    # backprop with a random gradient
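
As an aside, torch.nn only supports mini-batches, which is why the input above has four dimensions (nSamples x nChannels x Height x Width). If you only have a single sample, unsqueeze(0) adds a fake batch dimension:

single = torch.randn(1, 32, 32)   # one single-channel 32x32 image, no batch dimension
out = net(single.unsqueeze(0))    # shape becomes (1, 1, 32, 32)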


More neural network layers can be found at → TORCH.NN

More activation functions can be found at → TORCH.NN.FUNCTIONAL


Loss Function

A loss function takes a pair (output, target) as input (output is the network's prediction, target is the true value) and computes a value that estimates how far the output is from the target.

The nn package provides many different loss functions. nn.MSELoss is a simple one: it computes the mean squared error between the output and the target. For example:

output = net(input)
target = torch.randn(10)     # a dummy target, as an example
target = target.view(1, -1)  # make target the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)


The output is as follows:

tensor(0.8109, grad_fn=<MseLossBackward>)
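
The grad_fn attribute records the operation that produced the tensor; following next_functions walks backwards through the computation graph (the exact class names printed may vary between PyTorch versions):

print(loss.grad_fn)                        # e.g. <MseLossBackward0 object at 0x...>
print(loss.grad_fn.next_functions[0][0])   # the node created by the last Linear layer (fc3)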

More loss functions can be found at → TORCH.NN

Backpropagating the Error

Call loss.backward() to backpropagate the error.

Before calling it, however, the existing gradients must be cleared; otherwise the new gradients will be accumulated on top of the existing ones.

Now we call loss.backward() and look at the gradient of conv1's bias term before and after the backward pass.

net.zero_grad()     # zero the gradient buffers
print(net.conv1.bias.grad)

loss.backward()

print(net.conv1.bias.grad)


The result is as follows:

tensor([0., 0., 0., 0., 0., 0.])
tensor([ 0.0051,  0.0042,  0.0026,  0.0152, -0.0040, -0.0036])


Updating the Weights

In practice, the simplest weight-update rule is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient
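
As a minimal sketch, this rule can be implemented by hand by looping over the parameters (the torch.optim approach below is what you would normally use):

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)   # in-place update: f = f - lr * grad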



PyTorch provides the torch.optim package, which implements this and other update rules (plain SGD, SGD with momentum, Adam, RMSprop, and so on). Using it is straightforward:

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
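
The other optimizers in torch.optim share the same interface, so swapping one in is a one-line change (the learning rate here is just an illustrative value):

optimizer = optim.Adam(net.parameters(), lr=0.001)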



More optimization algorithms can be found at → TORCH.OPTIM

Appendix: Complete Code

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(1, 6, 5)
    self.conv2 = nn.Conv2d(6, 16, 5)
    self.fc1 = nn.Linear(16*13*13, 120)  # 13x13 feature maps after conv2 + pooling for a 64x64 input
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, x):
    x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
    x = F.max_pool2d(F.relu(self.conv2(x)), (2,2))
    x = x.view(-1, self.num_flat_features(x))
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

  def num_flat_features(self, x):
    size = x.size()[1:]  # all dimensions except the batch dimension
    num_features = 1
    for s in size:
      num_features *= s
    return num_features

net = Net()
input = torch.rand(1,1,64,64)
target = torch.ones(1, 10)  # same shape as the network's output
criterion = nn.MSELoss()

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
for i in range(100):
  optimizer.zero_grad()   # zero the gradient buffers
  output = net(input)
  loss = criterion(output, target)
  loss.backward()
  optimizer.step()    # Does the update
  if i % 10 == 0:
    print("Loss is {}, output is {}".format(loss, output.data))



The training output is as follows:

Loss is 0.972576916217804, output is tensor([[ 0.0904,  0.0376,  0.0494, -0.0319,  0.1387, -0.0710, -0.0473,  0.0095,
         -0.0053, -0.0132]])
Loss is 0.8258872032165527, output is tensor([[0.1892, 0.1113, 0.1002, 0.0492, 0.2139, 0.0476, 0.0409, 0.0710, 0.0356,
         0.0727]])
......
Loss is 1.2406354699123767e-07, output is tensor([[1.0001, 1.0006, 1.0003, 0.9998, 1.0003, 1.0002, 0.9999, 0.9994, 0.9995,
         0.9997]])
Loss is 1.0122771954002019e-08, output is tensor([[1.0000, 1.0002, 1.0001, 0.9999, 1.0001, 1.0001, 1.0000, 0.9999, 0.9999,
         0.9999]])



We can see that the loss keeps shrinking, from close to 1 down to the order of 1e-8, and the output gets closer and closer to the target of all ones.
