Logistic Regression with Theano - Machine Learning / Natural Language Processing Study Notes

These notes walk through and organize the following documentation.

Classifying MNIST digits using Logistic Regression — DeepLearning 0.1 documentation

Creating the LogisticRegression instance

The input in this example is a 28×28 MNIST image, flattened to 784 values.
The output is a digit from 0 to 9 (10-class classification).

# construct the logistic regression class
# Each MNIST image has size 28*28
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

The LogisticRegression constructor

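The constructor initializes the parameters.
*In Theano, you define a layer class and describe the model inside its constructor.
(Think of it as the counterpart of model.add in Keras.)
The same approach is used for the more complex models that appear later.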

self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)

Review of borrow
The borrow argument decides whether the shared variable shares the underlying data of the object defined on the Python side.

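import numpy
import theano

# Define a variable on the Python side
x = numpy.array([1, 1, 1])
# Turn it into a shared variable with theano.shared
xs = theano.shared(x)

x[1] = 2
x
# array([1, 2, 1])
# By default the shared variable holds a copy of the object,
# so changing the Python-side variable does not affect it
xs.get_value()
# array([1, 1, 1])

# To share the same underlying data, pass borrow=True
xs = theano.shared(x, borrow=True)
# Modify x after sharing to see that the data really is shared
x[2] = 3
xs.get_value()
# array([1, 2, 3])  (on CPU; other devices may still copy)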
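
As an analogy only, the copy-versus-alias behaviour can be mimicked in plain Python. `SharedSketch` is a hypothetical toy class invented for this sketch; it is not Theano's implementation and ignores device-specific behaviour.

```python
class SharedSketch:
    """Toy stand-in for a shared variable (hypothetical, not Theano's class)."""

    def __init__(self, value, borrow=False):
        # borrow=False: keep a private copy; borrow=True: keep the same object
        self._value = value if borrow else list(value)

    def get_value(self):
        # Return a copy so callers cannot modify the internal buffer
        return list(self._value)


x = [1, 1, 1]
copied = SharedSketch(x)                # default: copy
aliased = SharedSketch(x, borrow=True)  # shares x itself

x[1] = 2
print(copied.get_value())   # [1, 1, 1]
print(aliased.get_value())  # [1, 2, 1]
```

Only the aliased instance sees the later change to `x`, which is the effect borrow=True is meant to illustrate.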

Next:

self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

# symbolic description of how to compute prediction as class whose
# probability is maximal
self.y_pred = T.argmax(self.p_y_given_x, axis=1)

# parameters of the model
self.params = [self.W, self.b]

# keep track of model input
self.input = input

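T.nnet.softmax converts the linear combination of the input and the weights into probabilities that sum to 1 per row.
*input is a batch_num×784 matrix and W is a 784×10 matrix.
Taking their matrix product linearly maps the input to a batch_num×10 matrix.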
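
The shape bookkeeping can be checked with a small plain-Python sketch. The sizes are shrunk to 2 samples, 3 inputs and 2 classes purely for illustration, and `affine` is a hypothetical helper name, not part of the tutorial.

```python
def affine(inputs, W, b):
    """Compute inputs . W + b for list-of-lists matrices.

    inputs: batch x n_in, W: n_in x n_out, b: length n_out.
    Returns a batch x n_out list of lists.
    """
    n_out = len(b)
    out = []
    for row in inputs:
        out.append([
            sum(row[i] * W[i][j] for i in range(len(row))) + b[j]
            for j in range(n_out)
        ])
    return out


# Toy sizes (assumed for illustration): batch=2, n_in=3, n_out=2
inputs = [[1.0, 2.0, 3.0],
          [0.0, 1.0, 0.0]]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.5, -0.5]

scores = affine(inputs, W, b)
print(len(scores), len(scores[0]))  # 2 2  (batch x n_out)
print(scores)                       # [[4.5, 4.5], [0.5, 0.5]]
```

With MNIST the same computation maps batch_num×784 to batch_num×10, and the bias is added to every row.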

Review of the softmax function

d = theano.shared(value=numpy.array([[0.1, 0.2, 0.3],
                                     [0.3, 0.2, 0.1],
                                     [0.1, 0.2, 0.3]],
                                    dtype=theano.config.floatX),
                  name='d', borrow=True)
sm = T.nnet.softmax(d)
print(sm.eval())
# [[ 0.30060961  0.33222499  0.3671654 ]
#  [ 0.3671654   0.33222499  0.30060961]
#  [ 0.30060961  0.33222499  0.3671654 ]]
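
For reference, the same numbers can be reproduced without Theano. The following is a minimal sketch of a row-wise softmax in plain Python; `softmax_row` is a hypothetical helper, and subtracting the row maximum is a common stability trick that this sketch assumes, not something the tutorial shows.

```python
import math


def softmax_row(row):
    """Softmax of one row: exp(x_i) / sum_j exp(x_j)."""
    # Subtracting the max does not change the result but avoids overflow.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


d = [[0.1, 0.2, 0.3],
     [0.3, 0.2, 0.1],
     [0.1, 0.2, 0.3]]

sm = [softmax_row(row) for row in d]
for row in sm:
    # Each row is a probability distribution, so it sums to 1
    print([round(p, 6) for p in row], "sum =", round(sum(row), 6))
```

Each row sums to 1, and the largest input in a row receives the largest probability.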

T.argmax picks the class with the highest probability.
For the softmax review example above, the result is:

am = T.argmax(sm, axis=1)  # axis=1 evaluates row by row
print(am.eval())
>> [2 0 2]
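
A plain-Python sketch of the same row-wise argmax, using the probabilities printed in the softmax review. `argmax_row` is a hypothetical helper name.

```python
def argmax_row(row):
    """Index of the largest value in a row (first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


probs = [[0.30060961, 0.33222499, 0.3671654],
         [0.3671654, 0.33222499, 0.30060961],
         [0.30060961, 0.33222499, 0.3671654]]

y_pred = [argmax_row(row) for row in probs]
print(y_pred)  # [2, 0, 2]
```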

The negative_log_likelihood function

Call site

cost = classifier.negative_log_likelihood(y)

Function definition

def negative_log_likelihood(self, y):
    return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

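This averages the loss (cost) over the whole batch.
The further the prediction is from the correct label, the larger the loss.
(Note)
When I later tried to reuse this, I got a little confused about what it does, so here is a memo.
Used in this form, p_y_given_x is expected to be a matrix and y a vector.
Concretely, it looks like this.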

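import numpy as np
p_y_given_x = np.array([[0.4, 0.6], [0.3, 0.7], [0.9, 0.1]])
# y must be a 1-D vector of class indices, one per row of p_y_given_x
y = np.array([1, 0, 0])

The index pair below selects, for each row, the column of the correct class. This is equivalent to masking p_y_given_x with the one-hot matrix of y.

np.arange(y.shape[0]), y
(array([0, 1, 2]), array([1, 0, 0]))
# one-hot form of y
[[0 1]
 [1 0]
 [1 0]]

The mean of the cross-entropy is then taken.

p_y_given_x[np.arange(y.shape[0]), y]
[ 0.6  0.3  0.9]

-np.log(p_y_given_x)[np.arange(y.shape[0]), y]
[ 0.51082562  1.2039728   0.10536052]

-np.log(p_y_given_x)[np.arange(y.shape[0]), y].mean()
0.606719647917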
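
The same computation can be sketched as a loop in plain Python, which makes the "pick the probability of the correct class, take -log, average" structure explicit. The labels are chosen here so that the picked probabilities are 0.6, 0.3 and 0.9, matching the printed values above; that choice is an assumption of this sketch.

```python
import math


def negative_log_likelihood(p_y_given_x, y):
    """Mean of -log p[i][y[i]] over the batch."""
    picked = [p_y_given_x[i][label] for i, label in enumerate(y)]
    losses = [-math.log(p) for p in picked]
    return sum(losses) / len(losses)


p_y_given_x = [[0.4, 0.6],
               [0.3, 0.7],
               [0.9, 0.1]]
# Labels assumed so that the picked probabilities are 0.6, 0.3, 0.9
y = [1, 0, 0]

nll = negative_log_likelihood(p_y_given_x, y)
print(round(nll, 6))  # 0.60672
```

A confident correct prediction (0.9) contributes little to the loss, while a low probability on the correct class (0.3) contributes the most.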

test_model

Next, the definition of the test_model function.

# compiling a Theano function that computes the mistakes that are made by
# the model on a minibatch
test_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

Because mini-batches are used, the input is sliced by index.
The part to pay attention to is outputs=classifier.errors(y).
In other words, the output of test_model is the result of classifier.errors(y).
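
The slicing in givens can be sketched in plain Python. The data and batch size below are stand-ins, and computing the number of batches by integer division (dropping any trailing samples) is an assumption of this sketch rather than something shown in these notes.

```python
def minibatch(data, index, batch_size):
    """Return the index-th mini-batch, using the same slice as givens."""
    return data[index * batch_size:(index + 1) * batch_size]


test_set_y = list(range(10))  # stand-in data, assumed for illustration
batch_size = 4
n_batches = len(test_set_y) // batch_size  # assumed: leftover samples are dropped

batches = [minibatch(test_set_y, i, batch_size) for i in range(n_batches)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```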

The errors function

def errors(self, y):
    return T.mean(T.neq(self.y_pred, y))

neq compares two tensors element-wise and returns truth values (False where the elements are equal).

import numpy
import theano.tensor as T

y = numpy.array([1, 0, 1, 1, 1])
y_pred = numpy.array([1, 1, 1, 0, 1])
print(T.neq(y_pred, y).eval())
>> [False  True False  True False]
print(T.mean(T.neq(y_pred, y)).eval())
>> 0.4

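The errors function returns the mismatch rate between the correct labels and the predictions.
→ The more mismatches there are, the higher its value.
It is used for the final evaluation of the model.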
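
A plain-Python sketch of the same mismatch rate, assuming Python 3 division. `error_rate` is a hypothetical helper name used only here.

```python
def error_rate(y_pred, y):
    """Fraction of positions where prediction and label differ."""
    mismatches = [int(p != t) for p, t in zip(y_pred, y)]
    return sum(mismatches) / len(mismatches)


y = [1, 0, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1]

rate = error_rate(y_pred, y)
print(rate)        # 0.4
print(1.0 - rate)  # the corresponding accuracy
```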

* validate_model has the same definition, so it is omitted here.

T.grad

This computes the gradient with respect to each parameter.

# compute the gradient of cost with respect to theta = (W,b)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)

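Review of grad
The first argument is the expression to differentiate and the second is the symbol to differentiate with respect to.
They can also be passed as the keyword arguments cost and wrt (with respect to).

import theano.tensor as T
from theano import function

x = T.dscalar('x')
y = x ** 2
gy = T.grad(y, x)
f = function([x], gy)
f(4)
>> array(8.0)
f(94.2)
>> array(188.4)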

In this case, the cost function (negative_log_likelihood) is differentiated with respect to W and b.
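
T.grad differentiates symbolically, but its result for the x ** 2 example can be sanity-checked numerically. The sketch below uses a central finite difference, which is a swapped-in technique for checking and not how Theano computes gradients; `numerical_grad` is a hypothetical helper.

```python
def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def square(x):
    return x ** 2


for x0 in (4.0, 94.2):
    approx = numerical_grad(square, x0)
    exact = 2.0 * x0  # analytic derivative of x**2
    print(x0, approx, exact)
```

The approximation should agree with 2x (8.0 and 188.4) up to small floating-point error.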

updates

Parameter updates by gradient descent.

# specify how to update the parameters of the model as a list of
# (variable, update expression) pairs.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
           (classifier.b, classifier.b - learning_rate * g_b)]
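
The update rule itself, parameter minus learning rate times gradient, can be illustrated on a one-parameter problem. The objective, the learning rate and the helper `sgd_step` are all assumptions made up for this sketch.

```python
def sgd_step(params, grads, learning_rate):
    """Return new parameters: p - learning_rate * g for each pair."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w = [0.0]
learning_rate = 0.1  # hypothetical value for this sketch
for _ in range(100):
    grads = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grads, learning_rate)

print(w[0])  # close to 3.0, the minimizer
```

In the Theano version the same rule is declared once as (variable, update expression) pairs and applied each time the compiled function is called.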

train_model

Definition of the train_model function.

# compiling a Theano function `train_model` that returns the cost and, at
# the same time, updates the parameters of the model based on the rules
# defined in `updates`
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

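As with test_model, mini-batches are used, so the input is sliced by index.
What differs from test_model is outputs and updates.
The output of train_model is cost (negative_log_likelihood).
In addition, updates applies the gradient-descent parameter update.
train_model is called once per mini-batch, so for every mini-batch it performs the following:

Compute the current cost (negative_log_likelihood)
Compute the partial derivatives of the cost (negative_log_likelihood) with respect to the parameters W and b
Update the parameters as current parameter - (learning rate × partial derivative), which improves the cost
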
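
The three steps above can be sketched end to end in plain Python for a tiny softmax regression. This is a rough illustration under stated assumptions, not the tutorial's code: the toy data, the learning rate of 0.5, the 50 epochs and the helper names `softmax` and `train_step` are all made up here, and the gradient is written out by hand instead of using T.grad.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(W, b, xs, ys, learning_rate):
    """One mini-batch step; returns the cost computed before the update."""
    n_in, n_out = len(W), len(b)
    g_W = [[0.0] * n_out for _ in range(n_in)]
    g_b = [0.0] * n_out
    cost = 0.0
    for x, y in zip(xs, ys):
        scores = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j]
                  for j in range(n_out)]
        p = softmax(scores)
        cost -= math.log(p[y]) / len(xs)  # step 1: mean negative log-likelihood
        for j in range(n_out):
            # step 2: gradient of the mean NLL w.r.t. the scores is (p - one_hot) / batch
            delta = (p[j] - (1.0 if j == y else 0.0)) / len(xs)
            g_b[j] += delta
            for i in range(n_in):
                g_W[i][j] += delta * x[i]
    # step 3: parameter <- parameter - learning_rate * gradient
    for i in range(n_in):
        for j in range(n_out):
            W[i][j] -= learning_rate * g_W[i][j]
    for j in range(n_out):
        b[j] -= learning_rate * g_b[j]
    return cost


# Toy data (assumed): class 0 for negative x, class 1 for positive x
train_x = [[-1.0], [-0.8], [1.0], [0.9], [-1.2], [1.1]]
train_y = [0, 0, 1, 1, 0, 1]
W = [[0.0, 0.0]]  # n_in=1, n_out=2, zero-initialized
b = [0.0, 0.0]
batch_size = 2
n_batches = len(train_x) // batch_size

costs = []
for epoch in range(50):
    for index in range(n_batches):
        start, stop = index * batch_size, (index + 1) * batch_size
        costs.append(train_step(W, b, train_x[start:stop], train_y[start:stop], 0.5))

print(costs[0], costs[-1])  # starts at log(2) with zero weights, then decreases
```

With zero-initialized parameters every class gets probability 0.5, so the first cost is log 2; later costs shrink as W and b separate the two classes.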

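What follows is the actual training (looping over the batches for the given number of epochs)
and convergence checking with early stopping, but those are omitted here.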