RNN with Theano (1) - Machine Learning / NLP Study Notes

These notes work through the following documentation.

Recurrent Neural Networks with Word Embeddings — DeepLearning 0.1 documentation

About the task

assigning a label to each word given a sentence. It’s a classification task.

As this says, the problem is to tag each word in a sentence.
Looking at a sample from the dataset,
[Figure: dataset sample]
we can see that it is a named entity extraction task using IOB notation.
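
As a rough illustration of IOB notation, the sketch below groups tagged tokens into labelled spans. The sentence, the label names, and the helper iob_spans are made up for this example and are not taken from the dataset.

```python
def iob_spans(tokens, tags):
    """Group IOB-tagged tokens into (label, phrase) spans."""
    spans, current_label, current_words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and current_label != tag[2:]):
            # Start a new span (a stray I- tag is treated as a span start)
            if current_words:
                spans.append((current_label, " ".join(current_words)))
            current_label, current_words = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the span that is currently open
            current_words.append(token)
        else:
            # "O": outside any span, so close whatever is open
            if current_words:
                spans.append((current_label, " ".join(current_words)))
            current_label, current_words = None, []
    if current_words:
        spans.append((current_label, " ".join(current_words)))
    return spans


# Hypothetical sentence and labels
tokens = ["flights", "from", "new", "york", "to", "boston"]
tags = ["O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
spans = iob_spans(tokens, tags)
```

B- opens a span, I- continues it, and O marks words outside any span, so the classifier only has to emit one label per word.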

The full source code is collected here:
github.com
(Lovely smile.)

elman-forward.py

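The script to run is elman-forward.py,
so we start by reading this source.

The first part only loads the dataset, so it is skipped here.
The first important part is where the RNN instance is created.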

rnn = model(nh = s['nhidden'],  # dimension of the hidden layer
            nc = nclasses,  # number of classes
            ne = vocsize, # number of word embeddings in the vocabulary
            de = s['emb_dimension'], # dimension of the word embeddings
            cs = s['win'] )  # word window context size

is13.rnn.elman

parameters of the model

# parameters of the model
self.emb = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,\
           (ne+1, de)).astype(theano.config.floatX)) # add one for PADDING at the end
self.Wx  = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,\
           (de * cs, nh)).astype(theano.config.floatX))
self.Wh  = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,\
           (nh, nh)).astype(theano.config.floatX))
self.W   = theano.shared(0.2 * numpy.random.uniform(-1.0, 1.0,\
           (nh, nc)).astype(theano.config.floatX))
self.bh  = theano.shared(numpy.zeros(nh, dtype=theano.config.floatX))
self.b   = theano.shared(numpy.zeros(nc, dtype=theano.config.floatX))
self.h0  = theano.shared(numpy.zeros(nh, dtype=theano.config.floatX))

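self.emb: word embedding weights, shape (vocabulary size + 1, embedding dimension). The extra row is for padding.
Conceptually it looks like this, with one row per word: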

word	1	2	...
apple	0.2	-0.3	...
orange	-0.4	0.1	...
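
The extra padding row can be pictured with NumPy. This sketch assumes padding positions are encoded as index -1 (an assumption; the code above only says one row is added for PADDING at the end), so that indexing with -1 selects the last row. The sizes are toy values.

```python
import numpy as np

ne, de = 5, 3  # toy vocabulary size and embedding dimension (illustrative values)
rng = np.random.default_rng(0)

# Same initialisation scheme as self.emb: uniform in [-0.2, 0.2], shape (ne + 1, de)
emb = 0.2 * rng.uniform(-1.0, 1.0, (ne + 1, de))

word_vec = emb[2]   # embedding of the word with index 2
pad_vec = emb[-1]   # index -1 selects the extra last row, used here for padding
```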

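self.Wx: weights between the input layer and the hidden layer, shape (embedding dimension × context size, number of hidden units)
self.Wh: weights from the previous time step's hidden layer to the current hidden layer, shape (hidden, hidden)
Wx and Wh correspond to the weights in the following figure.

[Figure omitted]
Quoted from cs224d: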
http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf

self.W: weights between the hidden layer and the output layer, shape (hidden, number of classes)
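
To check that these shapes fit together, here is a minimal NumPy sketch of a single step. The sizes are arbitrary and zeros stand in for the real weights; the nonlinearities are left out because they do not change any shape.

```python
import numpy as np

nh, nc, de, cs = 4, 3, 5, 7  # toy sizes, chosen only for illustration

Wx = np.zeros((de * cs, nh))  # input -> hidden
Wh = np.zeros((nh, nh))       # previous hidden -> hidden
W = np.zeros((nh, nc))        # hidden -> output
bh, b, h0 = np.zeros(nh), np.zeros(nc), np.zeros(nh)

x_t = np.zeros(de * cs)           # one context window of concatenated embeddings
pre_h = x_t @ Wx + h0 @ Wh + bh   # shape (nh,)
scores = pre_h @ W + b            # shape (nc,)
```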

bundle
The main part of the RNN

# bundle
self.params = [ self.emb, self.Wx, self.Wh, self.W, self.bh, self.b, self.h0 ]
self.names  = ['embeddings', 'Wx', 'Wh', 'W', 'bh', 'b', 'h0']
idxs = T.imatrix() # as many columns as context window size/lines as words in the sentence
x = self.emb[idxs].reshape((idxs.shape[0], de*cs))
y    = T.iscalar('y') # label

def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, self.Wx) + T.dot(h_tm1, self.Wh) + self.bh)
    s_t = T.nnet.softmax(T.dot(h_t, self.W) + self.b)
    return [h_t, s_t]

[h, s], _ = theano.scan(fn=recurrence, \
    sequences=x, outputs_info=[self.h0, None], \
    n_steps=x.shape[0])

x has shape (number of input rows, embedding dimension × window size), with one row per word in the sentence.
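
The NumPy sketch below shows how an index matrix turns into x. The helper context_windows and the use of -1 as the padding index are assumptions made for this example; the code above only shows that idxs has one row per word and cs columns.

```python
import numpy as np

def context_windows(word_ids, win):
    """Return one window of `win` indices per word, padded with -1 at both ends."""
    assert win % 2 == 1 and win >= 1
    pad = win // 2
    padded = [-1] * pad + list(word_ids) + [-1] * pad
    return [padded[i:i + win] for i in range(len(word_ids))]


ne, de, cs = 10, 4, 3  # toy sizes
emb = np.arange((ne + 1) * de, dtype=float).reshape(ne + 1, de)

idxs = np.array(context_windows([0, 1, 2, 3], cs), dtype="int32")
# idxs == [[-1, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, -1]]

# emb[idxs] has shape (4, cs, de); flatten each window into one row
x = emb[idxs].reshape((idxs.shape[0], de * cs))
```

Each row of x is the concatenation of the embeddings of a word and its neighbours, which is why Wx has de * cs rows.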

Next comes the recurrence function.
It is invoked through scan,
which handles loops in Theano.

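The arguments used here are:

sequences: the input that is iterated over
outputs_info: the initial value of each output (None for an output that is not fed back into the next step)
n_steps: the number of iterations

Note: the input x is passed in one row per iteration.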

scan calls the recurrence function at every step, feeding the previous step's hidden state back in.
The arguments and other details are covered here:
kento1109.hatenablog.com

The first iteration looks like this:
[Figure: first iteration]
Note: h_tm1 is h0 (all zeros), so the flow is the same as an ordinary feed-forward pass.

From the second iteration onward it looks like this:
[Figure: second and later iterations]
Note: x_t receives the next row of x, and h_tm1 receives the h_t of the previous step.

This is repeated x.shape[0] times (the number of input rows), as given by n_steps.
As a result, the h_t and s_t values from every step are collected and returned as h and s.
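
Written as an ordinary Python loop, the scan call behaves roughly like the NumPy sketch below. This is only an analogy for the forward pass (scan also builds the symbolic graph that is later used for gradients), and the helper names, toy sizes, and random weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, Wx, Wh, W, bh, b, h0):
    """Loop over the rows of x, feeding each h_t back in as h_tm1."""
    h_tm1, hs, ss = h0, [], []
    for x_t in x:                      # plays the role of sequences=x
        h_t = sigmoid(x_t @ Wx + h_tm1 @ Wh + bh)
        s_t = softmax(h_t @ W + b)
        hs.append(h_t)
        ss.append(s_t)
        h_tm1 = h_t                    # becomes h_tm1 at the next step
    return np.array(hs), np.array(ss)


rng = np.random.default_rng(0)
n_words, n_in, nh, nc = 6, 8, 4, 3     # toy sizes
x = rng.normal(size=(n_words, n_in))
Wx, Wh, W = (0.2 * rng.uniform(-1, 1, shape)
             for shape in [(n_in, nh), (nh, nh), (nh, nc)])
h, s = forward(x, Wx, Wh, W, np.zeros(nh), np.zeros(nc), np.zeros(nh))
```

Only h_t is fed back, which matches outputs_info=[self.h0, None]: the hidden state has an initial value, while the softmax output is simply collected.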

Next:

p_y_given_x_lastword = s[-1,0,:]
p_y_given_x_sentence = s[:,0,:]
y_pred = T.argmax(p_y_given_x_sentence, axis=1)

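p_y_given_x_lastword:
s[-1,0,:] takes the class probability vector for the window processed in the last recurrence step of the sentence.
p_y_given_x_sentence:
s[:,0,:] takes the class probability matrix for all windows. The middle index 0 is needed because T.nnet.softmax returns a 1 × nc row for a vector input, so s has shape (number of steps, 1, nc).
y_pred:
Finally, argmax along axis 1 picks the class with the highest probability for each window.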
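
A NumPy sketch of these three lines, using made-up probabilities laid out in the (number of steps, 1, nc) shape that the indexing implies:

```python
import numpy as np

# Made-up probabilities for 3 windows and 4 classes, shape (n_steps, 1, nc)
s = np.array([[[0.1, 0.7, 0.1, 0.1]],
              [[0.6, 0.2, 0.1, 0.1]],
              [[0.2, 0.2, 0.1, 0.5]]])

p_y_given_x_lastword = s[-1, 0, :]   # shape (nc,): distribution for the last window
p_y_given_x_sentence = s[:, 0, :]    # shape (n_steps, nc): one row per window
y_pred = np.argmax(p_y_given_x_sentence, axis=1)  # one predicted class per window
```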

cost and gradients and learning rate
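Definition of the cost function and its gradients. The cost is the negative log-likelihood of the correct label y for the last window only, and each parameter is updated by plain gradient descent.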

# cost and gradients and learning rate
lr = T.scalar('lr')
nll = -T.log(p_y_given_x_lastword)[y]
gradients = T.grad( nll, self.params )
updates = OrderedDict(( p, p-lr*g ) for p, g in zip( self.params , gradients))
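
To see what the update rule does numerically, the sketch below applies the same p - lr*g step to the output layer only, using the closed-form gradient of the negative log-likelihood through a softmax. T.grad derives the gradients for all parameters automatically; restricting the example to W and b, the toy sizes, and the helper names are simplifications made here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def nll_and_grads(h_t, W, b, y):
    """Negative log-likelihood of class y and its gradients w.r.t. W and b."""
    p = softmax(h_t @ W + b)
    d_scores = p.copy()
    d_scores[y] -= 1.0                 # d nll / d scores = p - onehot(y)
    return -np.log(p[y]), np.outer(h_t, d_scores), d_scores


rng = np.random.default_rng(0)
nh, nc, y, lr = 4, 3, 2, 0.1           # toy sizes, label and learning rate
h_t = rng.uniform(0.0, 1.0, nh)        # stands in for the last hidden state
W, b = 0.2 * rng.uniform(-1, 1, (nh, nc)), np.zeros(nc)

loss_before, gW, gb = nll_and_grads(h_t, W, b, y)
W, b = W - lr * gW, b - lr * gb        # same form as the updates dictionary
loss_after, _, _ = nll_and_grads(h_t, W, b, y)
```

With a small learning rate, one step lowers the loss on this example, which is the behaviour each call to the training function relies on.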

theano functions
Function definitions

# theano functions
self.classify = theano.function(inputs=[idxs], outputs=y_pred)
self.train = theano.function( inputs  = [idxs, y, lr],
                              outputs = nll,
                              updates = updates )

self.normalize = theano.function( inputs = [],
                 updates = {self.emb:
                 self.emb/T.sqrt((self.emb**2).sum(axis=1)).dimshuffle(0,'x')})

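self.classify: classification function (outputs the predicted labels for the input data)
self.train: training function (takes idxs, y and lr, applies the updates, and returns nll)
self.normalize: normalization function (rescales each row of emb to unit L2 norm)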
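
The normalization can be reproduced in NumPy as follows. The matrix is a toy stand-in for self.emb, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = 0.2 * rng.uniform(-1.0, 1.0, (6, 3))   # toy embedding matrix

# Divide each row by its L2 norm; [:, None] plays the role of dimshuffle(0, 'x')
row_norms = np.sqrt((emb ** 2).sum(axis=1))
emb_normalized = emb / row_norms[:, None]
```

After this step every word vector has length 1, so only its direction carries information.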

That is all for part (1) for now.