PyTorch: Understanding the Structure of the Library's Loss Functions - Machine Learning / NLP Study Notes

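Introduction

Until now I have used loss functions without giving them much thought.
As long as a ready-made one fits the task, that is perfectly fine.

However, if you want to write your own loss function,
you need to understand how the library's loss functions are structured.

So I looked into the library's loss functions, and these are my notes.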

Quick review

First, a quick review of the usage.

A loss function from the library is used as follows.

import torch
import torch.nn as nn
import torch.nn.functional as F

net = Net()            # Net is defined later in this post
outputs = net(inputs)  # inputs and targets are also created later

criterion = nn.MSELoss()

loss = criterion(outputs, targets)
loss.backward()

Network

This time, consider the following simple network.

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(32, 16)
        self.fc2 = nn.Linear(16, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

The parameters are as follows.

params = list(net.parameters())
print(params[0].size())
torch.Size([16, 32])

print(params[1].size())
torch.Size([16])

print(params[1])
Parameter containing:
tensor([ 0.0982,  0.0495,  0.1656, -0.1646,  0.1014, -0.0163, -0.0873,
         0.0418, -0.0404,  0.1556, -0.1247,  0.0236, -0.0651,  0.0960,
         0.1342,  0.1203])
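As a small supplement, the parameters can also be listed by name, which makes it easier to see which tensor is which. This is just a sketch: the Net class is the one from this post (with forward written in one line), and the use of named_parameters() is my own addition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32, 16)
        self.fc2 = nn.Linear(16, 1)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


net = Net()
# named_parameters() yields (name, tensor) pairs in registration order.
shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

params[0] and params[1] above correspond to fc1.weight and fc1.bias.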

Input

Create the input data as follows.

torch.manual_seed(1)

inputs = torch.randn(10, 32)
targets = torch.randn(10)
targets = targets.view(1, -1)
print(targets)
tensor([[ 0.1924,  0.7161, -0.8120, -1.4617,  0.2328,  0.1896, -0.2204,
          0.1491,  0.0100, -0.1243]])

Output

The output is as follows.

outputs = net(inputs)
outputs = outputs.view(1, -1)
print(outputs)
tensor([[-0.2504,  0.0378,  0.1485,  0.1909, -0.1019,  0.2417,  0.1653,
          0.1359, -0.1187,  0.0876]])
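Both outputs and targets are reshaped to (1, 10) here. A minimal sketch of why matching shapes matter: as far as I know, when the shapes differ, F.mse_loss only warns and then broadcasts, which silently changes the result. The two-element tensors below are made up for illustration.

```python
import warnings

import torch
import torch.nn.functional as F

outputs = torch.tensor([[0.0], [1.0]])  # shape (2, 1), like a raw network output
targets = torch.tensor([0.0, 1.0])      # shape (2,)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # PyTorch warns about the size mismatch
    # (2, 1) against (2,) broadcasts to (2, 2): every output vs. every target.
    mismatched = F.mse_loss(outputs, targets)

# With both tensors reshaped to (1, 2), elements are compared one to one.
matched = F.mse_loss(outputs.view(1, -1), targets.view(1, -1))
print(mismatched.item(), matched.item())
```

The predictions are exactly right here, so the loss should be 0, but the mismatched version averages over all four pairs and reports a non-zero loss.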

Loss computation

First, create an instance of the loss function.

criterion = nn.MSELoss()

Next, pass the output and the true values to the loss function to compute the loss.

outputs = outputs.view(1, -1)
loss = criterion(outputs, targets)
print(loss)
tensor(0.4635)
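To make clear what this number is, here is a small check that nn.MSELoss with its default settings matches the mean of the squared differences computed by hand. The random tensors and the seed are stand-ins, not the values used in this post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, not the one used in the post
outputs = torch.randn(1, 10)
targets = torch.randn(1, 10)

library_loss = nn.MSELoss()(outputs, targets)
# Mean squared error written out directly: mean over all elements of (o - t)^2.
manual_loss = ((outputs - targets) ** 2).mean()
print(library_loss.item(), manual_loss.item())
```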

Backprop

Compute the gradients of the parameters from the loss.

net.zero_grad()

print(net.fc2.weight.grad)
None
loss.backward()
print(net.fc2.weight.grad)
tensor([[ 0.0611,  0.0976, -0.2338,  0.1976,  0.0020, -0.0788, -0.1145,
         -0.1976,  0.0699, -0.3314,  0.2379, -0.0534,  0.0729, -0.0008,
         -0.2456, -0.2657]])

The gradient is None at first, but after loss.backward() the gradient vector has been computed.
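The same behaviour can be checked against the analytic gradient on a smaller example. In this sketch a leaf tensor stands in for the network output (my own simplification), so that its .grad can be compared with the derivative of the mean squared error, 2 * (outputs - targets) / N.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
outputs = torch.randn(1, 10, requires_grad=True)  # stand-in for the network output
targets = torch.randn(1, 10)

loss = nn.MSELoss()(outputs, targets)
grad_before = outputs.grad  # nothing has been computed yet
loss.backward()

# d/d(outputs) of mean((outputs - targets)^2) is 2 * (outputs - targets) / N.
expected = 2 * (outputs.detach() - targets) / outputs.numel()
print(grad_before, torch.allclose(outputs.grad, expected))
```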

Parameter update (optimization)

This time I used stochastic gradient descent (SGD).

learning_rate = 0.01
with torch.no_grad():  # keep the update itself out of the autograd graph
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)

sub_ subtracts its argument from the tensor in place.
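The in-place behaviour of sub_ can be seen on a tiny tensor. The values below are made up for illustration: w plays the role of a parameter and g the role of its gradient.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0])     # pretend parameter
g = torch.tensor([10.0, 20.0, 30.0])  # pretend gradient
ptr_before = w.data_ptr()

w.sub_(g * 0.01)  # w <- w - 0.01 * g, written into the same memory
print(w, w.data_ptr() == ptr_before)
```

Since no new tensor is allocated, the parameters held by the network are the ones that get updated.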

Check the loss after the update.

outputs = net(inputs)
outputs = outputs.view(1, -1)

loss = criterion(outputs, targets)
print(loss)
tensor(0.3956)

We can confirm that the loss has decreased.

In practice, many optimizers are already provided, and it is better to use those.

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)
# step() applies the update using the gradients already stored by loss.backward()
optimizer.step()
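Put together, one possible training loop with the optimizer looks like the sketch below. The nn.Sequential model is a stand-in with the same layers as Net, and the seed, the data and the number of steps are arbitrary choices of mine.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed
net = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
inputs = torch.randn(10, 32)
targets = torch.randn(10, 1)  # same shape as the network output

criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

losses = []
for step in range(20):
    optimizer.zero_grad()                   # clear gradients from the previous step
    loss = criterion(net(inputs), targets)
    loss.backward()                         # compute fresh gradients
    optimizer.step()                        # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

zero_grad() is needed on every iteration because backward() accumulates into .grad rather than overwriting it.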

Reference:
Neural Networks — PyTorch Tutorials 0.4.1 documentation

Now that the basics have been reviewed, let's dig a little deeper.

nn.MSELoss()

Let's take a look at the source.
torch.nn.modules.loss — PyTorch master documentation

class MSELoss(_Loss):
    def __init__(self, size_average=None, reduce=None, reduction='elementwise_mean'):
        super(MSELoss, self).__init__(size_average, reduce, reduction)

    def forward(self, input, target):
        return F.mse_loss(input, target, reduction=self.reduction)

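Note: the docstring is omitted because it is long.
The implementation is surprisingly simple: a class with only a constructor and a forward function.
The other loss functions have a similar class structure.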

F.mse_loss is the function that actually computes the mean squared error.
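The reduction argument passed to F.mse_loss decides how the per-element squared errors are combined. A small sketch with made-up values; note that the source quoted above uses the name 'elementwise_mean' from the 0.4.1 era, while later releases call the same option 'mean', which is what this example assumes.

```python
import torch
import torch.nn.functional as F

outputs = torch.tensor([[1.0, 2.0, 3.0]])
targets = torch.tensor([[1.0, 0.0, 0.0]])

per_element = F.mse_loss(outputs, targets, reduction="none")  # squared errors, same shape
total = F.mse_loss(outputs, targets, reduction="sum")         # sum of squared errors
mean = F.mse_loss(outputs, targets, reduction="mean")         # sum divided by element count
print(per_element, total.item(), mean.item())
```

With reduction="none" the result is not a scalar, so it has to be reduced (or given a gradient argument) before calling backward().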

So what is _Loss, the class that this one inherits from?

class _Loss(Module):
    def __init__(self, size_average=None, reduce=None, reduction='elementwise_mean'):
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction

This class itself inherits from Module.

Following this through, it looks like a loss function class can simply have the following structure.
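That a loss is an ordinary Module can be confirmed directly. The checks below are my own and only use the public nn.MSELoss class, not the private _Loss class.

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
is_module = isinstance(criterion, nn.Module)
n_params = len(list(criterion.parameters()))  # the loss holds no learnable weights

x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[0.0, 0.0]])
# Module.__call__ dispatches to forward(), so both calls give the same value.
same = torch.equal(criterion(x, y), criterion.forward(x, y))
print(is_module, n_params, same)
```

So calling criterion(outputs, targets) is the same mechanism as calling net(inputs).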

class LossFunction(nn.Module):

    def __init__(self):
        super(LossFunction, self).__init__()

    def forward(self, inputs, targets):
        # `function` is a placeholder for the actual loss computation
        loss = function(inputs, targets)
        return loss
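As a concrete instance of this skeleton, here is a sketch of a root mean squared error loss. RMSELoss and its eps parameter are hypothetical names I made up for illustration and are not part of PyTorch; the single linear layer and the random data are also stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSELoss(nn.Module):
    """Hypothetical example: root mean squared error with a small epsilon."""

    def __init__(self, eps=1e-8):
        super().__init__()
        self.eps = eps  # keeps the sqrt gradient finite when the error is 0

    def forward(self, inputs, targets):
        return torch.sqrt(F.mse_loss(inputs, targets) + self.eps)


torch.manual_seed(0)  # arbitrary seed
layer = nn.Linear(4, 1)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

loss = RMSELoss()(layer(inputs), targets)
loss.backward()  # autograd differentiates through forward() automatically
print(loss.item(), layer.weight.grad.shape)
```

No backward function has to be written by hand, because forward() is built only from differentiable tensor operations.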

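However, it does not have to be a class; a plain loss function works just as well.
In fact, the parameters were also updated correctly with the following.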

loss = F.mse_loss(outputs, targets)
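The same should hold for a hand-written function, as long as it is composed of differentiable tensor operations. A minimal sketch, where my_mse is a hypothetical name and the linear layer, data and learning rate are arbitrary stand-ins:

```python
import torch
import torch.nn as nn


def my_mse(outputs, targets):
    # Hypothetical hand-written loss: mean of the squared differences.
    return ((outputs - targets) ** 2).mean()


torch.manual_seed(0)  # arbitrary seed
layer = nn.Linear(4, 1)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

before = my_mse(layer(inputs), targets)
before.backward()
with torch.no_grad():
    for p in layer.parameters():
        p.sub_(p.grad * 0.01)  # one plain gradient descent step
after = my_mse(layer(inputs), targets)
print(before.item(), after.item())
```

A class is mainly useful when the loss needs to carry settings such as reduction; for a stateless loss a function is enough.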