Sentiment analysis (CNN) ③ - Machine Learning / NLP Study Notes

Last time I went as far as MLPDropout.
With that, the model definition has been fully covered,
so no new layer instances are created from here on.

This time I go through the remaining parts.

define parameters

First, the parameters are collected.

#define parameters of the model and update functions using adadelta
params = classifier.params     
for conv_layer in conv_layers:
    params += conv_layer.params
if non_static:
    #if word vectors are allowed to change, add them as model parameters
    params += [Words]

* When non_static is set, the word embedding vectors (Words) are updated as well.
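
A side note on the snippet above, assuming classifier.params is an ordinary Python list: `params = classifier.params` only binds a second name to the same list, and `+=` extends it in place, so the classifier's own list grows too. The sketch below shows this with a hypothetical stand-in class (FakeLayer and the parameter names are made up for illustration, not part of the original code), next to a copy-based variant in case that side effect is not wanted.

```python
class FakeLayer:
    """Hypothetical stand-in for a layer that exposes a .params list."""
    def __init__(self, names):
        self.params = list(names)

conv_layers = [FakeLayer(["W_conv3", "b_conv3"]), FakeLayer(["W_conv4", "b_conv4"])]

# Same pattern as the snippet: += extends classifier.params in place.
classifier = FakeLayer(["W_out", "b_out"])
params = classifier.params
for conv_layer in conv_layers:
    params += conv_layer.params
aliased = params is classifier.params   # both names refer to one list

# Copy-based variant: the layer's own list is left untouched.
classifier2 = FakeLayer(["W_out", "b_out"])
params2 = list(classifier2.params)
for conv_layer in conv_layers:
    params2 += conv_layer.params
```

Whether the aliasing matters depends on whether classifier.params is read again later; for the training code discussed here it is only used to build the update list.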

Next, the cost function and the gradient update function are defined.

cost = classifier.negative_log_likelihood(y) 
dropout_cost = classifier.dropout_negative_log_likelihood(y)           
grad_updates = sgd_updates_adadelta(params, dropout_cost, lr_decay, 1e-6, sqr_norm_lim)

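The new piece here is sgd_updates_adadelta.
I will only take a brief look at it without going too deep.
Adadelta is a variant of gradient descent that adjusts the learning rate automatically.
I have not fully worked through the math, so my understanding is shallow, but the two key points are:

* No initial learning rate needs to be chosen
* Recent gradient information is given priority

Up to now, parameters were updated with the gradient of the cost function scaled by a learning rate; this time they are updated with the Adadelta method.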
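
To make the two points concrete, here is a minimal NumPy sketch of the plain Adadelta rule from Zeiler's paper. It is not the body of sgd_updates_adadelta: the function name adadelta_step and the toy objective are my own, and whatever sqr_norm_lim does in the original is not modelled. I am assuming that lr_decay and 1e-6 in the call play the roles of the decay rate rho and the small constant eps.

```python
import numpy as np

def adadelta_step(param, grad, acc_grad, acc_delta, rho=0.95, eps=1e-6):
    """One Adadelta update; returns the new parameter and both accumulators."""
    # Decaying average of squared gradients: recent gradients weigh more.
    acc_grad = rho * acc_grad + (1.0 - rho) * grad ** 2
    # Step size is a ratio of two running RMS values, so no learning rate is set.
    delta = -np.sqrt(acc_delta + eps) / np.sqrt(acc_grad + eps) * grad
    # Decaying average of squared updates, used by the next step.
    acc_delta = rho * acc_delta + (1.0 - rho) * delta ** 2
    return param + delta, acc_grad, acc_delta

# Toy problem: minimise f(x) = x^2, whose gradient is 2x.
x, acc_g, acc_d = 3.0, 0.0, 0.0
x_first, _, _ = adadelta_step(x, 2.0 * x, acc_g, acc_d)
for _ in range(100):
    x, acc_g, acc_d = adadelta_step(x, 2.0 * x, acc_g, acc_d)
```

The first steps are tiny because the update accumulator starts at zero; the step size then adapts on its own as both averages fill up.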

The next part of interest starts at line 149:

for conv_layer in conv_layers:
    test_layer0_output = conv_layer.predict(test_layer0_input, test_size)
    test_pred_layers.append(test_layer0_output.flatten(2))

conv_layers holds three different CNN layers (instances), with filter heights 3, 4 and 5.
They are taken out one at a time, and each layer's predict function is called.

The predict function looks like this:

def predict(self, new_data, batch_size):
    """
    predict for new data
    """
    img_shape = (batch_size, 1, self.image_shape[2], self.image_shape[3])
    conv_out = conv.conv2d(input=new_data, filters=self.W, filter_shape=self.filter_shape, image_shape=img_shape)
    if self.non_linear=="tanh":
        conv_out_tanh = T.tanh(conv_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        output = downsample.max_pool_2d(input=conv_out_tanh, ds=self.poolsize, ignore_border=True)
    elif self.non_linear=="relu":
        conv_out_tanh = ReLU(conv_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        output = downsample.max_pool_2d(input=conv_out_tanh, ds=self.poolsize, ignore_border=True)
    else:
        pooled_out = downsample.max_pool_2d(input=conv_out, ds=self.poolsize, ignore_border=True)
        output = pooled_out + self.b.dimshuffle('x', 0, 'x', 'x')
    return output
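
In predict, `self.b.dimshuffle('x', 0, 'x', 'x')` reshapes the bias vector (one value per feature map) so that it broadcasts over the batch, row and column axes. A NumPy analogue, with sizes made up purely for illustration:

```python
import numpy as np

batch, n_maps, rows, cols = 2, 3, 4, 1    # illustrative sizes, not from the post
conv_out = np.zeros((batch, n_maps, rows, cols))
b = np.array([0.1, 0.2, 0.3])             # one bias per feature map

# NumPy counterpart of b.dimshuffle('x', 0, 'x', 'x'): shape (1, n_maps, 1, 1)
b4d = b.reshape(1, -1, 1, 1)
out = conv_out + b4d                      # broadcast over batch, rows and columns
```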

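It takes the test data as input and applies convolution and downsampling (max pooling) to it.
The feature map after downsampling is returned to the caller.
The caller flattens each returned map and appends it to test_pred_layers, so after the loop the list holds the pooled output of every conv layer.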
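
To see the shapes involved, here is a small NumPy sketch of what one conv layer computes for a single sentence: a filter of height h spanning the full embedding width slides over an n x d sentence matrix, giving n - h + 1 values, and max pooling over all of them leaves one value per filter. The helper name conv_and_pool, the sizes and the random inputs are assumptions for illustration. I also assume the pool covers the whole feature map, and I use cross-correlation without flipping the filter, which is a simplification relative to Theano's conv2d.

```python
import numpy as np

def conv_and_pool(sentence, filters, bias):
    """sentence: (n, d); filters: (n_maps, h, d); bias: (n_maps,).
    Returns one max-pooled value per feature map, shape (n_maps,)."""
    n, d = sentence.shape
    n_maps, h, _ = filters.shape
    feature_maps = np.empty((n_maps, n - h + 1))
    for i in range(n - h + 1):
        window = sentence[i:i + h]                       # (h, d) slice of the sentence
        feature_maps[:, i] = np.sum(filters * window, axis=(1, 2))
    activated = np.tanh(feature_maps + bias[:, None])    # bias added before tanh
    return activated.max(axis=1)                         # max over all positions

rng = np.random.default_rng(0)
sentence = rng.normal(size=(10, 6))                      # 10 words, 6-dim embeddings
pooled = [conv_and_pool(sentence, rng.normal(size=(4, h, 6)), np.zeros(4))
          for h in (3, 4, 5)]                            # one result per filter height
```

Each filter height yields a vector of the same length (the number of feature maps), regardless of how many positions the filter visited.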
The next line,

test_layer1_input = T.concatenate(test_pred_layers, 1)

concatenates the per-layer feature maps along axis 1 into a single matrix.
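
The shape bookkeeping of flatten(2) followed by the concatenation can be checked with NumPy. The batch size and number of feature maps below are illustrative values, and the arrays are filled with constants only so that the ordering is visible.

```python
import numpy as np

batch, n_maps = 5, 100                        # illustrative sizes
# One pooled output per filter height, each of shape (batch, n_maps, 1, 1)
layer_outputs = [np.full((batch, n_maps, 1, 1), float(h)) for h in (3, 4, 5)]

# NumPy counterpart of tensor.flatten(2): keep axis 0, merge the remaining axes
flat = [out.reshape(out.shape[0], -1) for out in layer_outputs]

# Counterpart of T.concatenate(test_pred_layers, 1): join along the feature axis
layer1_input = np.concatenate(flat, axis=1)   # shape (batch, 3 * n_maps)
```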

test_layer1_input corresponds to the concatenated vector, second from the right in the figure below (the figure uses filter heights 2, 3 and 4).
[Figure: CNN architecture for sentence classification]
Source: Zhang, Y., & Wallace, B. (2015). A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification

Then,

test_y_pred = classifier.predict(test_layer1_input)

outputs the predictions, taking test_layer1_input as its argument.

test_error = T.mean(T.neq(test_y_pred, y))
test_model_all = theano.function([x,y], test_error, allow_input_downcast = True)

test_model_all is a function that takes the test data x, y and returns test_error (the mean error rate on the test set).
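
The error-rate expression is easy to check by hand in NumPy. The probabilities and labels below are made up, and I am assuming that the prediction step picks the class with the highest probability.

```python
import numpy as np

# Hypothetical class probabilities for 4 test sentences and 2 classes
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])
y_true = np.array([0, 1, 1, 1])

y_pred = probs.argmax(axis=1)              # predicted label per sentence
test_error = np.mean(y_pred != y_true)     # counterpart of T.mean(T.neq(test_y_pred, y))
accuracy = 1.0 - test_error
```

Only the third sentence is misclassified here, so the error rate is one in four.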

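What follows is the mini-batch training loop, so I skip it here.

That is a rough summary of the flow of document classification with a CNN.

Next I will study implementations of RNN and LSTM, which are indispensable in natural language processing.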