Introduction to PyTorch (2): Neural Network - Machine Learning and NLP Study Notes

Networks are built with the torch.nn package.

This post uses the sample code from the page below.
Learning PyTorch with Examples — PyTorch Tutorials 0.3.0.post4 documentation

The network has three layers (input, hidden, output) and is applied to a regression problem.

import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

When the network is a plain chain of layers, the nn.Sequential module can be used.
Its usage seems to be much the same as Sequential in Keras.
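
As a rough check of what "sequential" means here, the sketch below applies the layers one by one and compares the result with calling the container directly. It assumes PyTorch 0.4 or later (plain tensors, no Variable wrapper); the small sizes and the variable names are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Small illustrative sizes (not the ones used in the article).
N, D_in, H, D_out = 4, 8, 5, 2

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

x = torch.randn(N, D_in)

# Calling the container runs its children in registration order.
y_container = model(x)

# The same computation, written out layer by layer.
out = x
for layer in model:
    out = layer(out)

print(y_container.shape)
print(torch.allclose(y_container, out))
```

Both paths go through the same layer objects, so the two outputs should agree.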

For networks that are not a plain chain of layers, or when you want to define your own class, write a class as follows.

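# Two-layer perceptron (MLP) for regression
class MLP(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(MLP, self).__init__()
        # Only `import torch` is in scope here, so use the full torch.nn path.
        self.linear1 = torch.nn.Linear(D_in, H)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(H, D_out)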

    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        return out

It is instantiated like this.

model = MLP(D_in, H, D_out)

The forward function is invoked when the model is called on data, that is, model(x) runs forward(x).
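
To make the dispatch from model(x) to forward() visible, here is a minimal sketch assuming PyTorch 0.4 or later. The forward_calls counter and the small layer sizes are hypothetical additions for illustration and are not part of the original sample.

```python
import torch


class MLP(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(H, D_out)
        # Illustration only: count how often forward() runs.
        self.forward_calls = 0

    def forward(self, x):
        self.forward_calls += 1
        return self.linear2(self.relu(self.linear1(x)))


model = MLP(D_in=8, H=5, D_out=2)
x = torch.randn(4, 8)

y_pred = model(x)  # nn.Module.__call__ dispatches to forward()
print(model.forward_calls)
print(y_pred.shape)

# Submodules assigned as attributes are registered automatically,
# so their weights and biases show up in parameters().
n_tensors = len(list(model.parameters()))
print(n_tensors)
```

Calling model(x) rather than model.forward(x) is the usual style, because the call path also runs any hooks registered on the module.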

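The Linear modules inside the model compute their outputs with a linear function.
Note on the Linear module:
It applies the linear transformation $y=Wx+b$ to the last dimension of its input.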
torch.nn — PyTorch master documentation

class torch.nn.Linear(in_features, out_features, bias=True)

(The bias term is enabled by default.)
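
The transformation can be reproduced by hand from the layer's own parameters. The sketch below assumes PyTorch 0.4 or later; the names manual and m_nobias are made up for illustration.

```python
import torch

torch.manual_seed(0)

m = torch.nn.Linear(20, 30)
x = torch.randn(128, 20)

# weight is stored as (out_features, in_features), bias as (out_features,).
print(m.weight.shape, m.bias.shape)

# Each row of x is mapped to W x + b, which for a whole batch
# is x @ W^T + b.
manual = x @ m.weight.t() + m.bias
matches = torch.allclose(m(x), manual, atol=1e-5)
print(matches)

# With bias=False the layer has no bias parameter at all.
m_nobias = torch.nn.Linear(20, 30, bias=False)
print(m_nobias.bias is None)
```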

Example: the linear transformation is applied as follows.

import torch
import torch.nn as nn
from torch.autograd import Variable

m = nn.Linear(20, 30)
input = Variable(torch.randn(128, 20))
output = m(input)
print(output.size())  # torch.Size([128, 30])

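If the input's last dimension does not match in_features, the matrix multiplication cannot be carried out:

m = nn.Linear(20, 30)
input = Variable(torch.randn(128, 10))
output = m(input)
# RuntimeError: size mismatch, m1: [128 x 10], m2: [20 x 30] at

and an error like the one in the comment is raised.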
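
A shape mismatch like this can be caught, or checked for before the forward pass. This is only a sketch, assuming PyTorch 0.4 or later; check_input is a hypothetical helper, and the exact error text varies between PyTorch versions.

```python
import torch

m = torch.nn.Linear(20, 30)
bad_input = torch.randn(128, 10)  # last dimension should be 20

try:
    m(bad_input)
    raised = False
except RuntimeError as err:
    raised = True
    # The exact wording differs between PyTorch versions.
    print("shape error:", err)


def check_input(layer, tensor):
    """Illustrative guard: does the last dimension match in_features?"""
    return tensor.shape[-1] == layer.in_features


print(check_input(m, bad_input))
```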

It can also be written in a Keras-like style, collecting the layers in a list first.

layer = []
layer.append(torch.nn.Linear(D_in, H))
layer.append(torch.nn.ReLU())
layer.append(torch.nn.Linear(H, D_out))
model = torch.nn.Sequential(*layer)
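
One reason to build the list first is that the depth can then be decided at run time. The sketch below assumes PyTorch 0.4 or later; build_mlp and the extra hidden size of 50 are hypothetical and only serve as an illustration.

```python
import torch


def build_mlp(sizes):
    """Build Linear/ReLU layers from a list such as [D_in, H1, ..., D_out]."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(torch.nn.Linear(sizes[i], sizes[i + 1]))
        # No activation after the final (output) layer.
        if i < len(sizes) - 2:
            layers.append(torch.nn.ReLU())
    return torch.nn.Sequential(*layers)


model = build_mlp([1000, 100, 50, 10])
print(model)

out = model(torch.randn(64, 1000))
print(out.shape)
```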

Next, define the loss function.

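# size_average=False sums the squared errors instead of averaging them.
# Newer PyTorch releases spell this MSELoss(reduction='sum').
loss_fn = torch.nn.MSELoss(size_average=False)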
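
To see what summing versus averaging does to the value, here is a small worked example. It assumes a newer PyTorch release in which MSELoss takes the reduction keyword; the tensors are made-up numbers.

```python
import torch

y_pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y_true = torch.tensor([[0.0, 2.0], [1.0, 1.0]])

# Squared errors: 1, 0, 4, 9 -> sum 14, mean 3.5
loss_sum = torch.nn.MSELoss(reduction="sum")(y_pred, y_true)
loss_mean = torch.nn.MSELoss(reduction="mean")(y_pred, y_true)

print(loss_sum.item())
print(loss_mean.item())
```

The summed loss grows with the batch size, so a learning rate tuned for it usually has to be smaller than one tuned for the averaged loss.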

Parameter optimization

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    # loss.data[0] is the PyTorch 0.3 idiom; on 0.4 and later use loss.item().
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Variable, so
    # we can access its data and gradients like we did before.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data
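
For reference, the same manual update can be sketched for PyTorch 0.4 or later, where the step is wrapped in torch.no_grad() instead of going through .data. The small sizes, the learning rate of 1e-3, and the losses list are assumptions chosen to keep the example quick; they are not taken from the tutorial.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 16, 20, 10, 2  # small illustrative sizes
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction="sum")
learning_rate = 1e-3

losses = []
for t in range(200):
    loss = loss_fn(model(x), y)
    losses.append(loss.item())

    model.zero_grad()
    loss.backward()

    # Plain gradient descent; no_grad() keeps the update itself
    # out of the autograd graph.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

print(losses[0], losses[-1])
```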

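zero_grad() resets the stored gradients. backward() adds to each parameter's .grad rather than overwriting it, so the gradients have to be cleared before every backward pass.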
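
The accumulation can be observed on a single tensor. This is a minimal sketch assuming PyTorch 0.4 or later; the tensor w and the function 3 * w are made up for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# d/dw of sum(3 * w) is 3 for every element.
(3 * w).sum().backward()
first = w.grad.clone()

# A second backward() adds to the stored gradient instead of replacing it.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Reset, then backward again: the gradient is back to a single pass.
w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```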

With torch.optim, the loop is written like this.

learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    # loss.data[0] is the PyTorch 0.3 idiom; on 0.4 and later use loss.item().
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model.
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()
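
Putting the pieces together, a self-contained version of the optimizer loop might look like the following on PyTorch 0.4 or later. The seed, the shortened run of 100 steps, and the history list are assumptions made for this sketch. It calls optimizer.zero_grad(), which clears the same gradients as model.zero_grad() here because the optimizer was given model.parameters().

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction="sum")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

history = []
for t in range(100):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    history.append(loss.item())

    optimizer.zero_grad()  # clear gradients left over from the last step
    loss.backward()        # compute fresh gradients
    optimizer.step()       # let Adam update the parameters

print(history[0], history[-1])
```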