word2vec / ゼロから作る Deep Learning 2

    Posted on 2018/09/29

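    Chapter 3 - word2vec

    These are my reading notes on ゼロから作る Deep Learning (2) 自然言語処理編 (Deep Learning from Scratch 2: Natural Language Processing). This time I read through Chapter 3, "word2vec". The chapter covers an inference-based approach to distributed word representations, using the CBOW model.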

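    Differences from count-based methods

    • Count-based methods process the entire training data at once, in a single batch
    • Inference-based methods learn incrementally, using a portion of the training data at a time
      • The data can be split into small batches for training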

    Inference-based methods

    you ??? goodbye and I say hello

    The task is to infer which word appears at the ??? position above (the model outputs an occurrence probability for each word)

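    Processing words

    • To process words with a neural network, they must be converted to fixed-length vectors
    • one-hot representation: a vector in which exactly one element is 1 and all the others are 0

    Fully connected layer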

    %sh
    pip3 install numpy matplotlib
    

    %python
    import numpy as np
    
    c = np.array([[1, 0, 0, 0, 0, 0, 0]]) # input
    W = np.random.randn(7, 3) # weights
    h = np.dot(c, W) # hidden nodes
    
    h
    array([[ 0.51548717,  0.69375812, -0.52163008]])
    

    Because c is a one-hot vector, the product extracts the corresponding row of the weight matrix W (here, row 0)
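
The row extraction noted above can be checked directly. This is a minimal sketch, assuming a 7-word vocabulary and 3 hidden units as in the cell above; the fixed seed and the variable names `word_id` and `row` are my own additions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
W = rng.standard_normal((7, 3))     # weights: 7 words x 3 hidden units

word_id = 0
c = np.zeros((1, 7))
c[0, word_id] = 1                   # one-hot input for word ID 0

h = np.dot(c, W)                    # full matrix product
row = W[word_id]                    # direct row lookup

print(np.allclose(h[0], row))       # True
```

Since the two results agree, the matrix product is doing nothing more than a row lookup here, which suggests it could be replaced by indexing when the vocabulary is large.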

    CBOW model: inference

    %python
    class MatMul:
        def __init__(self, W):
            self.params = [W]
            self.grads = [np.zeros_like(W)]
            self.x = None
    
        def forward(self, x):
            W, = self.params
            out = np.dot(x, W)
            self.x = x
            return out
    
        def backward(self, dout):
            W, = self.params
            dx = np.dot(dout, W.T)
            dW = np.dot(self.x.T, dout)
            self.grads[0][...] = dW
            return dx

    %python
    import numpy as np
    
    # Context data
    c0 = np.array([[1, 0, 0, 0, 0, 0, 0]])
    c1 = np.array([[0, 0, 1, 0, 0, 0, 0]])
    
    # Initialize the weights
    W_in = np.random.randn(7, 3)
    W_out = np.random.randn(3, 7)
    
    # Create the layers
    in_layer0 = MatMul(W_in)
    in_layer1 = MatMul(W_in)
    out_layer = MatMul(W_out)
    
    # Forward pass
    h0 = in_layer0.forward(c0)
    h1 = in_layer1.forward(c1)
    h = 0.5 * (h0 + h1)
    s = out_layer.forward(h)
    
    s
    array([[-2.89929998, -2.361709  ,  3.07450532,  1.05383403,  3.11066845,
            -0.50834708, -2.51717838]])
    

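    CBOW model: training

    • The implementation above outputs a score for each word at the output layer
    • Applying the Softmax function to these scores yields probabilities
    • The cross-entropy error between these probabilities and the correct label is used as the loss (TODO: I don't fully understand this part yet)
    • The CBOW model only learns the patterns of word occurrence in the corpus
      • A different corpus yields different distributed representations of words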
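
To make the score, probability, and loss steps above more concrete, here is a small worked sketch. The score values and the choice of `target_id` are made up for illustration; only the vocabulary size of 7 comes from these notes.

```python
import numpy as np

# Hypothetical scores for a 7-word vocabulary (made-up values).
scores = np.array([1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -0.5])
target_id = 1  # assume the correct word has ID 1

# Softmax: subtract the max for numerical stability, then normalize.
exp_scores = np.exp(scores - scores.max())
probs = exp_scores / exp_scores.sum()

# With a one-hot target, cross-entropy reduces to
# -log(probability assigned to the correct word).
loss = -np.log(probs[target_id])

# Loss of a model that assigns equal probability to all 7 words.
uniform_loss = -np.log(1.0 / 7.0)

print(probs.sum())
print(loss)
print(uniform_loss)
```

The loss is small when the model puts high probability on the correct word and large otherwise. The uniform case gives ln 7, roughly 1.95, which is consistent with the loss printed at the start of the training log further down.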

    %python
    # Assign an ID to each word
    def preprocess(text):
        text = text.lower()
        text = text.replace('.', ' .')
        words = text.split(' ')
    
        word_to_id = {}
        id_to_word = {}
        for word in words:
            if word not in word_to_id:
                new_id = len(word_to_id)
                word_to_id[word] = new_id
                id_to_word[new_id] = word
    
        corpus = np.array([word_to_id[w] for w in words])
    
        return corpus, word_to_id, id_to_word

    %python
    text = 'You say goodbye and I say hello.'
    corpus, word_to_id, id_to_word = preprocess(text)
    
    print('corpus: ', corpus)
    print('id_to_word: ', id_to_word)
    corpus:  [0 1 2 3 4 1 5 6]
    id_to_word:  {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}
    

    %python
    def create_contexts_target(corpus, window_size=1):
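        '''Create contexts and targets from a corpus
        :param corpus: NumPy array of word IDs
        :param window_size: window size (with window_size=1, the one word on each side of the target is its context)
        :return: NumPy arrays of contexts and targets
        '''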
        target = corpus[window_size:-window_size]
        contexts = []
    
        for idx in range(window_size, len(corpus)-window_size):
            cs = []
            for t in range(-window_size, window_size + 1):
                if t == 0:
                    continue
                cs.append(corpus[idx + t])
            contexts.append(cs)
    
        return np.array(contexts), np.array(target)

    %python
    contexts, target = create_contexts_target(corpus, window_size=1)
    
    print('contexts: ', contexts)
    
    print('target: ', target)
    contexts:  [[0 2]
     [1 3]
     [2 4]
     [3 1]
     [4 5]
     [1 6]]
    target:  [1 2 3 4 1 5]
    

    %python
    def convert_one_hot(corpus, vocab_size):
        '''Convert to one-hot representation
        :param corpus: list of word IDs (1-D or 2-D NumPy array)
        :param vocab_size: vocabulary size
        :return: one-hot representation (2-D or 3-D NumPy array)
        '''
        N = corpus.shape[0]
    
        if corpus.ndim == 1:
            one_hot = np.zeros((N, vocab_size), dtype=np.int32)
            for idx, word_id in enumerate(corpus):
                one_hot[idx, word_id] = 1
    
        elif corpus.ndim == 2:
            C = corpus.shape[1]
            one_hot = np.zeros((N, C, vocab_size), dtype=np.int32)
            for idx_0, word_ids in enumerate(corpus):
                for idx_1, word_id in enumerate(word_ids):
                    one_hot[idx_0, idx_1, word_id] = 1
    
        return one_hot

    %python
    vocab_size = len(word_to_id)
    target = convert_one_hot(target, vocab_size)
    contexts = convert_one_hot(contexts, vocab_size)
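
As a cross-check on the conversion above, the same one-hot arrays can also be built by indexing into an identity matrix. This is an alternative sketch, not the book's implementation; the names `eye`, `contexts_ids`, `contexts_oh` and `target_oh` are my own, and the IDs are copied from the output printed earlier.

```python
import numpy as np

vocab_size = 7
# Same IDs as the contexts and target printed earlier.
contexts_ids = np.array([[0, 2], [1, 3], [2, 4], [3, 1], [4, 5], [1, 6]])
target_ids = np.array([1, 2, 3, 4, 1, 5])

eye = np.eye(vocab_size, dtype=np.int32)  # row i is the one-hot vector for ID i
contexts_oh = eye[contexts_ids]           # integer-array indexing -> shape (6, 2, 7)
target_oh = eye[target_ids]               # shape (6, 7)

print(contexts_oh.shape, target_oh.shape)  # (6, 2, 7) (6, 7)
print(target_oh[0])                        # [0 1 0 0 0 0 0]
```

The shapes show why the model's forward pass later slices `contexts[:, 0]` and `contexts[:, 1]`: axis 1 holds the two context words for each target.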

    CBOW model: implementation

    %python
    def softmax(x):
        if x.ndim == 2:
            x = x - x.max(axis=1, keepdims=True)
            x = np.exp(x)
            x /= x.sum(axis=1, keepdims=True)
        elif x.ndim == 1:
            x = x - np.max(x)
            x = np.exp(x) / np.sum(np.exp(x))
    
        return x
    
    def cross_entropy_error(y, t):
        if y.ndim == 1:
            t = t.reshape(1, t.size)
            y = y.reshape(1, y.size)
            
        # If the labels are one-hot vectors, convert them to indices of the correct class
        if t.size == y.size:
            t = t.argmax(axis=1)
                 
        batch_size = y.shape[0]
    
        return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
    
    class SoftmaxWithLoss:
        def __init__(self):
            self.params, self.grads = [], []
            self.y = None  # output of softmax
            self.t = None  # target labels
    
        def forward(self, x, t):
            self.t = t
            self.y = softmax(x)
    
            # If the target labels are one-hot vectors, convert them to indices of the correct class
            if self.t.size == self.y.size:
                self.t = self.t.argmax(axis=1)
    
            loss = cross_entropy_error(self.y, self.t)
            return loss
    
        def backward(self, dout=1):
            batch_size = self.t.shape[0]
    
            dx = self.y.copy()
            dx[np.arange(batch_size), self.t] -= 1
            dx *= dout
            dx = dx / batch_size
    
            return dx

    %python
    class SimpleCBOW:
        # Initialization
        def __init__(self, vocab_size, hidden_size):
            V, H = vocab_size, hidden_size
            
            # Initialize the weights
            W_in = 0.01 * np.random.randn(V, H).astype('f')
            W_out = 0.01 * np.random.randn(H, V).astype('f')
            
            # Create the layers
            self.in_layer0 = MatMul(W_in)
            self.in_layer1 = MatMul(W_in)
            self.out_layer = MatMul(W_out)
            self.loss_layer = SoftmaxWithLoss()
            
            # Collect all weights and gradients into lists
            layers = [self.in_layer0, self.in_layer1, self.out_layer]
            self.params, self.grads = [], []
            for layer in layers:
                self.params += layer.params
                self.grads += layer.grads
            
            # Distributed representations of the words
            self.word_vecs = W_in
        
        # Forward pass
        def forward(self, contexts, target):
            h0 = self.in_layer0.forward(contexts[:, 0])
            h1 = self.in_layer1.forward(contexts[:, 1])
            h = (h0 + h1) * 0.5
            
            score = self.out_layer.forward(h)
            loss = self.loss_layer.forward(score, target)
            return loss
        
        # Backward pass
        def backward(self, dout=1):
            ds = self.loss_layer.backward(dout)
            da = self.out_layer.backward(ds)
            da *= 0.5
            self.in_layer0.backward(da)
            self.in_layer1.backward(da)
            return None

    Trainer

    %python
    import numpy
    import time
    import matplotlib.pyplot as plt
    
    def clip_grads(grads, max_norm):
        total_norm = 0
        for grad in grads:
            total_norm += np.sum(grad ** 2)
        total_norm = np.sqrt(total_norm)
    
        rate = max_norm / (total_norm + 1e-6)
        if rate < 1:
            for grad in grads:
                grad *= rate
    
    def remove_duplicate(params, grads):
        '''
        Merge duplicated weights in the parameter list into a single entry
        and sum the gradients that correspond to that weight
        '''
        params, grads = params[:], grads[:]  # copy list
    
        while True:
            find_flg = False
            L = len(params)
    
            for i in range(0, L - 1):
                for j in range(i + 1, L):
                    # When the weights are shared
                    if params[i] is params[j]:
                        grads[i] += grads[j]  # accumulate the gradient
                        find_flg = True
                        params.pop(j)
                        grads.pop(j)
                    # When the weights are shared as a transposed matrix (weight tying)
                    elif params[i].ndim == 2 and params[j].ndim == 2 and \
                         params[i].T.shape == params[j].shape and np.all(params[i].T == params[j]):
                        grads[i] += grads[j].T
                        find_flg = True
                        params.pop(j)
                        grads.pop(j)
    
                    if find_flg: break
                if find_flg: break
    
            if not find_flg: break
    
        return params, grads
    
    class Trainer:
        def __init__(self, model, optimizer):
            self.model = model
            self.optimizer = optimizer
            self.loss_list = []
            self.eval_interval = None
            self.current_epoch = 0
    
        def fit(self, x, t, max_epoch=10, batch_size=32, max_grad=None, eval_interval=20):
            data_size = len(x)
            max_iters = data_size // batch_size
            self.eval_interval = eval_interval
            model, optimizer = self.model, self.optimizer
            total_loss = 0
            loss_count = 0
    
            start_time = time.time()
            for epoch in range(max_epoch):
                # Shuffle
                idx = numpy.random.permutation(numpy.arange(data_size))
                x = x[idx]
                t = t[idx]
    
                for iters in range(max_iters):
                    batch_x = x[iters*batch_size:(iters+1)*batch_size]
                    batch_t = t[iters*batch_size:(iters+1)*batch_size]
    
                    # Compute the gradients and update the parameters
                    loss = model.forward(batch_x, batch_t)
                    model.backward()
                    params, grads = remove_duplicate(model.params, model.grads)  # merge shared weights into one
                    if max_grad is not None:
                        clip_grads(grads, max_grad)
                    optimizer.update(params, grads)
                    total_loss += loss
                    loss_count += 1
    
                    # Evaluation
                    if (eval_interval is not None) and (iters % eval_interval) == 0:
                        avg_loss = total_loss / loss_count
                        elapsed_time = time.time() - start_time
                        print('| epoch %d |  iter %d / %d | time %d[s] | loss %.2f'
                              % (self.current_epoch + 1, iters + 1, max_iters, elapsed_time, avg_loss))
                        self.loss_list.append(float(avg_loss))
                        total_loss, loss_count = 0, 0
    
                self.current_epoch += 1
    
        def plot(self, ylim=None):
            x = numpy.arange(len(self.loss_list))
            if ylim is not None:
                plt.ylim(*ylim)
            plt.plot(x, self.loss_list, label='train')
            plt.xlabel('iterations (x' + str(self.eval_interval) + ')')
            plt.ylabel('loss')
            plt.show()
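
The reason the Trainer merges shared weights is that SimpleCBOW passes the same W_in array to two input layers, so that array shows up twice in the parameter list, each time with its own gradient. The sketch below is a simplified stand-in for remove_duplicate, under my own assumptions: the helper name `merge_shared` is hypothetical, the gradient values are made up, and the transposed weight-tying case is left out.

```python
import numpy as np

W_in = np.ones((2, 2))             # one weight array shared by two layers
W_out = np.zeros((2, 2))           # a separate, unshared weight array

params = [W_in, W_in, W_out]
grads = [np.full((2, 2), 1.0), np.full((2, 2), 2.0), np.full((2, 2), 5.0)]

def merge_shared(params, grads):
    """Merge entries that are the same object, summing their gradients."""
    merged_params, merged_grads = [], []
    for p, g in zip(params, grads):
        for i, q in enumerate(merged_params):
            if q is p:                       # identity check, not value equality
                merged_grads[i] = merged_grads[i] + g
                break
        else:
            # First time this array is seen: keep it with a copy of its gradient.
            merged_params.append(p)
            merged_grads.append(g.copy())
    return merged_params, merged_grads

p, g = merge_shared(params, grads)
print(len(p))      # 2
print(g[0][0, 0])  # 3.0
```

Without this step the optimizer would update the shared array twice per iteration, once per gradient, instead of once with the summed gradient.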
    

    Optimizer

    %python
    class Adam:
        '''
        Adam (http://arxiv.org/abs/1412.6980v8)
        '''
        def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
            self.lr = lr
            self.beta1 = beta1
            self.beta2 = beta2
            self.iter = 0
            self.m = None
            self.v = None
            
        def update(self, params, grads):
            if self.m is None:
                self.m, self.v = [], []
                for param in params:
                    self.m.append(np.zeros_like(param))
                    self.v.append(np.zeros_like(param))
            
            self.iter += 1
            lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
    
            for i in range(len(params)):
                self.m[i] += (1 - self.beta1) * (grads[i] - self.m[i])
                self.v[i] += (1 - self.beta2) * (grads[i]**2 - self.v[i])
                
                params[i] -= lr_t * self.m[i] / (np.sqrt(self.v[i]) + 1e-7)
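
To get a feel for what the update rule above does, here is the very first Adam step worked out by hand with the same formulas. The gradient values are made up, and the variable names are mine; this is a sketch of one step, not a replacement for the class.

```python
import numpy as np

lr, beta1, beta2 = 0.001, 0.9, 0.999
grad = np.array([0.5, -20.0, 3.0])   # made-up gradients of very different sizes

# First update (iter = 1). m and v start at zero, so
# m += (1 - beta1) * (grad - m) leaves m = (1 - beta1) * grad, and likewise for v.
m = (1 - beta1) * grad
v = (1 - beta2) * grad ** 2
lr_t = lr * np.sqrt(1.0 - beta2 ** 1) / (1.0 - beta1 ** 1)
step = lr_t * m / (np.sqrt(v) + 1e-7)

print(step)  # each entry is roughly lr * sign(grad)
```

On the first step every parameter moves by about lr regardless of how large its gradient is, because the bias-corrected ratio of m to sqrt(v) is close to the sign of the gradient.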

    Running the training

    %python
    window_size = 1
    hidden_size = 5
    batch_size = 3
    max_epoch = 1000
    
    model = SimpleCBOW(vocab_size, hidden_size)
    optimizer = Adam()
    trainer = Trainer(model, optimizer)
    
    trainer.fit(contexts, target, max_epoch, batch_size)
    | epoch 1 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 2 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 3 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 4 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 5 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 6 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 7 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 8 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 9 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 10 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 11 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 12 |  iter 1 / 2 | time 0[s] | loss 1.95
    | epoch 13 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 14 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 15 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 16 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 17 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 18 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 19 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 20 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 21 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 22 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 23 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 24 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 25 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 26 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 27 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 28 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 29 |  iter 1 / 2 | time 0[s] | loss 1.94
    | epoch 30 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 31 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 32 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 33 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 34 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 35 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 36 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 37 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 38 |  iter 1 / 2 | time 0[s] | loss 1.93
    | epoch 39 |  iter 1 / 2 | time 0[s] | loss 1.92
    | epoch 40 |  iter 1 / 2 | time 0[s] | loss 1.92
    | epoch 41 |  iter 1 / 2 | time 0[s] | loss 1.92
    | epoch 42 |  iter 1 / 2 | time 0[s] | loss 1.92
    | epoch 43 |  iter 1 / 2 | time 0[s] | loss 1.91
    | epoch 44 |  iter 1 / 2 | time 0[s] | loss 1.92
    | epoch 45 |  iter 1 / 2 | time 0[s] | loss 1.91
    | epoch 46 |  iter 1 / 2 | time 0[s] | loss 1.92
    | epoch 47 |  iter 1 / 2 | time 0[s] | loss 1.91
    | epoch 48 |  iter 1 / 2 | time 0[s] | loss 1.90
    | epoch 49 |  iter 1 / 2 | time 0[s] | loss 1.91
    | epoch 50 |  iter 1 / 2 | time 0[s] | loss 1.90
    | epoch 51 |  iter 1 / 2 | time 0[s] | loss 1.90
    | epoch 52 |  iter 1 / 2 | time 0[s] | loss 1.91
    | epoch 53 |  iter 1 / 2 | time 0[s] | loss 1.89
    | epoch 54 |  iter 1 / 2 | time 0[s] | loss 1.90
    | epoch 55 |  iter 1 / 2 | time 0[s] | loss 1.89
    | epoch 56 |  iter 1 / 2 | time 0[s] | loss 1.88
    | epoch 57 |  iter 1 / 2 | time 0[s] | loss 1.89
    | epoch 58 |  iter 1 / 2 | time 0[s] | loss 1.88
    | epoch 59 |  iter 1 / 2 | time 0[s] | loss 1.89
    | epoch 60 |  iter 1 / 2 | time 0[s] | loss 1.88
    | epoch 61 |  iter 1 / 2 | time 0[s] | loss 1.87
    | epoch 62 |  iter 1 / 2 | time 0[s] | loss 1.88
    | epoch 63 |  iter 1 / 2 | time 0[s] | loss 1.86
    | epoch 64 |  iter 1 / 2 | time 0[s] | loss 1.87
    | epoch 65 |  iter 1 / 2 | time 0[s] | loss 1.85
    | epoch 66 |  iter 1 / 2 | time 0[s] | loss 1.86
    | epoch 67 |  iter 1 / 2 | time 0[s] | loss 1.86
    | epoch 68 |  iter 1 / 2 | time 0[s] | loss 1.85
    | epoch 69 |  iter 1 / 2 | time 0[s] | loss 1.85
    | epoch 70 |  iter 1 / 2 | time 0[s] | loss 1.85
    | epoch 71 |  iter 1 / 2 | time 0[s] | loss 1.83
    | epoch 72 |  iter 1 / 2 | time 0[s] | loss 1.84
    | epoch 73 |  iter 1 / 2 | time 0[s] | loss 1.83
    | epoch 74 |  iter 1 / 2 | time 0[s] | loss 1.83
    | epoch 75 |  iter 1 / 2 | time 0[s] | loss 1.82
    | epoch 76 |  iter 1 / 2 | time 0[s] | loss 1.83
    | epoch 77 |  iter 1 / 2 | time 0[s] | loss 1.80
    | epoch 78 |  iter 1 / 2 | time 0[s] | loss 1.83
    | epoch 79 |  iter 1 / 2 | time 0[s] | loss 1.80
    | epoch 80 |  iter 1 / 2 | time 0[s] | loss 1.82
    | epoch 81 |  iter 1 / 2 | time 0[s] | loss 1.80
    | epoch 82 |  iter 1 / 2 | time 0[s] | loss 1.79
    | epoch 83 |  iter 1 / 2 | time 0[s] | loss 1.79
    | epoch 84 |  iter 1 / 2 | time 0[s] | loss 1.80
    | epoch 85 |  iter 1 / 2 | time 0[s] | loss 1.78
    | epoch 86 |  iter 1 / 2 | time 0[s] | loss 1.80
    | epoch 87 |  iter 1 / 2 | time 0[s] | loss 1.79
    | epoch 88 |  iter 1 / 2 | time 0[s] | loss 1.76
    | epoch 89 |  iter 1 / 2 | time 0[s] | loss 1.78
    | epoch 90 |  iter 1 / 2 | time 0[s] | loss 1.76
    | epoch 91 |  iter 1 / 2 | time 0[s] | loss 1.76
    | epoch 92 |  iter 1 / 2 | time 0[s] | loss 1.76
    | epoch 93 |  iter 1 / 2 | time 0[s] | loss 1.76
    | epoch 94 |  iter 1 / 2 | time 0[s] | loss 1.75
    | epoch 95 |  iter 1 / 2 | time 0[s] | loss 1.73
    | epoch 96 |  iter 1 / 2 | time 0[s] | loss 1.74
    | epoch 97 |  iter 1 / 2 | time 0[s] | loss 1.73
    | epoch 98 |  iter 1 / 2 | time 0[s] | loss 1.75
    | epoch 99 |  iter 1 / 2 | time 0[s] | loss 1.71
    | epoch 100 |  iter 1 / 2 | time 0[s] | loss 1.75
    | epoch 101 |  iter 1 / 2 | time 0[s] | loss 1.70
    | epoch 102 |  iter 1 / 2 | time 0[s] | loss 1.71
    | epoch 103 |  iter 1 / 2 | time 0[s] | loss 1.71
    | epoch 104 |  iter 1 / 2 | time 0[s] | loss 1.70
    | epoch 105 |  iter 1 / 2 | time 0[s] | loss 1.70
    | epoch 106 |  iter 1 / 2 | time 0[s] | loss 1.68
    | epoch 107 |  iter 1 / 2 | time 0[s] | loss 1.70
    | epoch 108 |  iter 1 / 2 | time 0[s] | loss 1.69
    | epoch 109 |  iter 1 / 2 | time 0[s] | loss 1.68
    | epoch 110 |  iter 1 / 2 | time 0[s] | loss 1.67
    | epoch 111 |  iter 1 / 2 | time 0[s] | loss 1.70
    | epoch 112 |  iter 1 / 2 | time 0[s] | loss 1.63
    | epoch 113 |  iter 1 / 2 | time 0[s] | loss 1.67
    | epoch 114 |  iter 1 / 2 | time 0[s] | loss 1.68
    | epoch 115 |  iter 1 / 2 | time 0[s] | loss 1.64
    | epoch 116 |  iter 1 / 2 | time 0[s] | loss 1.64
    | epoch 117 |  iter 1 / 2 | time 0[s] | loss 1.65
    | epoch 118 |  iter 1 / 2 | time 0[s] | loss 1.61
    | epoch 119 |  iter 1 / 2 | time 0[s] | loss 1.63
    | epoch 120 |  iter 1 / 2 | time 0[s] | loss 1.65
    | epoch 121 |  iter 1 / 2 | time 0[s] | loss 1.62
    | epoch 122 |  iter 1 / 2 | time 0[s] | loss 1.62
    | epoch 123 |  iter 1 / 2 | time 0[s] | loss 1.61
    | epoch 124 |  iter 1 / 2 | time 0[s] | loss 1.58
    | epoch 125 |  iter 1 / 2 | time 0[s] | loss 1.63
    | epoch 126 |  iter 1 / 2 | time 0[s] | loss 1.60
    | epoch 127 |  iter 1 / 2 | time 0[s] | loss 1.58
    | epoch 128 |  iter 1 / 2 | time 0[s] | loss 1.59
    | epoch 129 |  iter 1 / 2 | time 0[s] | loss 1.59
    | epoch 130 |  iter 1 / 2 | time 0[s] | loss 1.58
    | epoch 131 |  iter 1 / 2 | time 0[s] | loss 1.57
    | epoch 132 |  iter 1 / 2 | time 0[s] | loss 1.58
    | epoch 133 |  iter 1 / 2 | time 0[s] | loss 1.55
    | epoch 134 |  iter 1 / 2 | time 0[s] | loss 1.57
    | epoch 135 |  iter 1 / 2 | time 0[s] | loss 1.51
    | epoch 136 |  iter 1 / 2 | time 0[s] | loss 1.60
    | epoch 137 |  iter 1 / 2 | time 0[s] | loss 1.51
    | epoch 138 |  iter 1 / 2 | time 0[s] | loss 1.53
    | epoch 139 |  iter 1 / 2 | time 0[s] | loss 1.54
    | epoch 140 |  iter 1 / 2 | time 0[s] | loss 1.52
    | epoch 141 |  iter 1 / 2 | time 0[s] | loss 1.53
    | epoch 142 |  iter 1 / 2 | time 0[s] | loss 1.54
    | epoch 143 |  iter 1 / 2 | time 0[s] | loss 1.52
    | epoch 144 |  iter 1 / 2 | time 0[s] | loss 1.46
    | epoch 145 |  iter 1 / 2 | time 0[s] | loss 1.51
    | epoch 146 |  iter 1 / 2 | time 0[s] | loss 1.50
    | epoch 147 |  iter 1 / 2 | time 0[s] | loss 1.49
    | epoch 148 |  iter 1 / 2 | time 0[s] | loss 1.49
    | epoch 149 |  iter 1 / 2 | time 0[s] | loss 1.45
    | epoch 150 |  iter 1 / 2 | time 0[s] | loss 1.51
    | epoch 151 |  iter 1 / 2 | time 0[s] | loss 1.48
    | epoch 152 |  iter 1 / 2 | time 0[s] | loss 1.46
    | epoch 153 |  iter 1 / 2 | time 0[s] | loss 1.45
    | epoch 154 |  iter 1 / 2 | time 0[s] | loss 1.46
    | epoch 155 |  iter 1 / 2 | time 0[s] | loss 1.46
    | epoch 156 |  iter 1 / 2 | time 0[s] | loss 1.46
    | epoch 157 |  iter 1 / 2 | time 0[s] | loss 1.44
    | epoch 158 |  iter 1 / 2 | time 0[s] | loss 1.43
    | epoch 159 |  iter 1 / 2 | time 0[s] | loss 1.43
    | epoch 160 |  iter 1 / 2 | time 0[s] | loss 1.43
    | epoch 161 |  iter 1 / 2 | time 0[s] | loss 1.39
    | epoch 162 |  iter 1 / 2 | time 0[s] | loss 1.48
    | epoch 163 |  iter 1 / 2 | time 0[s] | loss 1.34
    | epoch 164 |  iter 1 / 2 | time 0[s] | loss 1.41
    | epoch 165 |  iter 1 / 2 | time 0[s] | loss 1.41
    | epoch 166 |  iter 1 / 2 | time 0[s] | loss 1.49
    | epoch 167 |  iter 1 / 2 | time 0[s] | loss 1.30
    | epoch 168 |  iter 1 / 2 | time 0[s] | loss 1.47
    | epoch 169 |  iter 1 / 2 | time 0[s] | loss 1.40
    | epoch 170 |  iter 1 / 2 | time 0[s] | loss 1.33
    | epoch 171 |  iter 1 / 2 | time 0[s] | loss 1.43
    | epoch 172 |  iter 1 / 2 | time 0[s] | loss 1.28
    | epoch 173 |  iter 1 / 2 | time 0[s] | loss 1.46
    | epoch 174 |  iter 1 / 2 | time 0[s] | loss 1.31
    | epoch 175 |  iter 1 / 2 | time 0[s] | loss 1.36
    | epoch 176 |  iter 1 / 2 | time 0[s] | loss 1.36
    | epoch 177 |  iter 1 / 2 | time 0[s] | loss 1.41
    | epoch 178 |  iter 1 / 2 | time 0[s] | loss 1.29
    | epoch 179 |  iter 1 / 2 | time 0[s] | loss 1.35
    | epoch 180 |  iter 1 / 2 | time 0[s] | loss 1.33
    | epoch 181 |  iter 1 / 2 | time 0[s] | loss 1.35
    | epoch 182 |  iter 1 / 2 | time 0[s] | loss 1.33
    | epoch 183 |  iter 1 / 2 | time 0[s] | loss 1.36
    | epoch 184 |  iter 1 / 2 | time 0[s] | loss 1.23
    | epoch 185 |  iter 1 / 2 | time 0[s] | loss 1.38
    | epoch 186 |  iter 1 / 2 | time 0[s] | loss 1.30
    | epoch 187 |  iter 1 / 2 | time 0[s] | loss 1.33
    | epoch 188 |  iter 1 / 2 | time 0[s] | loss 1.35
    | epoch 189 |  iter 1 / 2 | time 0[s] | loss 1.30
    | epoch 190 |  iter 1 / 2 | time 0[s] | loss 1.19
    | epoch 191 |  iter 1 / 2 | time 0[s] | loss 1.34
    | epoch 192 |  iter 1 / 2 | time 0[s] | loss 1.35
    | epoch 193 |  iter 1 / 2 | time 0[s] | loss 1.17
    | epoch 194 |  iter 1 / 2 | time 0[s] | loss 1.34
    | epoch 195 |  iter 1 / 2 | time 0[s] | loss 1.28
    | epoch 196 |  iter 1 / 2 | time 0[s] | loss 1.27
    | epoch 197 |  iter 1 / 2 | time 0[s] | loss 1.27
    | epoch 198 |  iter 1 / 2 | time 0[s] | loss 1.21
    | epoch 199 |  iter 1 / 2 | time 0[s] | loss 1.32
    | epoch 200 |  iter 1 / 2 | time 0[s] | loss 1.20
    | epoch 201 |  iter 1 / 2 | time 0[s] | loss 1.32
    | epoch 202 |  iter 1 / 2 | time 0[s] | loss 1.24
    | epoch 203 |  iter 1 / 2 | time 0[s] | loss 1.31
    | epoch 204 |  iter 1 / 2 | time 0[s] | loss 1.24
    | epoch 205 |  iter 1 / 2 | time 0[s] | loss 1.13
    | epoch 206 |  iter 1 / 2 | time 0[s] | loss 1.28
    | epoch 207 |  iter 1 / 2 | time 0[s] | loss 1.18
    | epoch 208 |  iter 1 / 2 | time 0[s] | loss 1.24
    | epoch 209 |  iter 1 / 2 | time 0[s] | loss 1.27
    | epoch 210 |  iter 1 / 2 | time 0[s] | loss 1.16
    | epoch 211 |  iter 1 / 2 | time 0[s] | loss 1.28
    | epoch 212 |  iter 1 / 2 | time 0[s] | loss 1.22
    | epoch 213 |  iter 1 / 2 | time 0[s] | loss 1.21
    | epoch 214 |  iter 1 / 2 | time 0[s] | loss 1.21
    | epoch 215 |  iter 1 / 2 | time 0[s] | loss 1.15
    | epoch 216 |  iter 1 / 2 | time 0[s] | loss 1.26
    | epoch 217 |  iter 1 / 2 | time 0[s] | loss 1.19
    | epoch 218 |  iter 1 / 2 | time 0[s] | loss 1.20
    | epoch 219 |  iter 1 / 2 | time 0[s] | loss 1.19
    | epoch 220 |  iter 1 / 2 | time 0[s] | loss 1.12
    | epoch 221 |  iter 1 / 2 | time 0[s] | loss 1.26
    | epoch 222 |  iter 1 / 2 | time 0[s] | loss 1.18
    | epoch 223 |  iter 1 / 2 | time 0[s] | loss 1.18
    | epoch 224 |  iter 1 / 2 | time 0[s] | loss 1.24
    | epoch 225 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 226 |  iter 1 / 2 | time 0[s] | loss 1.23
    | epoch 227 |  iter 1 / 2 | time 0[s] | loss 1.10
    | epoch 228 |  iter 1 / 2 | time 0[s] | loss 1.30
    | epoch 229 |  iter 1 / 2 | time 0[s] | loss 1.09
    | epoch 230 |  iter 1 / 2 | time 0[s] | loss 1.16
    | epoch 231 |  iter 1 / 2 | time 0[s] | loss 1.15
    | epoch 232 |  iter 1 / 2 | time 0[s] | loss 1.08
    | epoch 233 |  iter 1 / 2 | time 0[s] | loss 1.22
    | epoch 234 |  iter 1 / 2 | time 0[s] | loss 1.14
    | epoch 235 |  iter 1 / 2 | time 0[s] | loss 1.08
    | epoch 236 |  iter 1 / 2 | time 0[s] | loss 1.14
    | epoch 237 |  iter 1 / 2 | time 0[s] | loss 1.21
    | epoch 238 |  iter 1 / 2 | time 0[s] | loss 1.05
    | epoch 239 |  iter 1 / 2 | time 0[s] | loss 1.27
    | epoch 240 |  iter 1 / 2 | time 0[s] | loss 1.06
    | epoch 241 |  iter 1 / 2 | time 0[s] | loss 1.05
    | epoch 242 |  iter 1 / 2 | time 0[s] | loss 1.27
    | epoch 243 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 244 |  iter 1 / 2 | time 0[s] | loss 1.12
    | epoch 245 |  iter 1 / 2 | time 0[s] | loss 1.11
    | epoch 246 |  iter 1 / 2 | time 0[s] | loss 1.12
    | epoch 247 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 248 |  iter 1 / 2 | time 0[s] | loss 1.18
    | epoch 249 |  iter 1 / 2 | time 0[s] | loss 1.11
    | epoch 250 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 251 |  iter 1 / 2 | time 0[s] | loss 1.17
    | epoch 252 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 253 |  iter 1 / 2 | time 0[s] | loss 1.24
    | epoch 254 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 255 |  iter 1 / 2 | time 0[s] | loss 1.09
    | epoch 256 |  iter 1 / 2 | time 0[s] | loss 1.16
    | epoch 257 |  iter 1 / 2 | time 0[s] | loss 0.99
    | epoch 258 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 259 |  iter 1 / 2 | time 0[s] | loss 1.17
    | epoch 260 |  iter 1 / 2 | time 0[s] | loss 1.07
    | epoch 261 |  iter 1 / 2 | time 0[s] | loss 1.07
    | epoch 262 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 263 |  iter 1 / 2 | time 0[s] | loss 1.21
    | epoch 264 |  iter 1 / 2 | time 0[s] | loss 0.97
    | epoch 265 |  iter 1 / 2 | time 0[s] | loss 1.08
    | epoch 266 |  iter 1 / 2 | time 0[s] | loss 0.98
    | epoch 267 |  iter 1 / 2 | time 0[s] | loss 1.11
    | epoch 268 |  iter 1 / 2 | time 0[s] | loss 1.15
    | epoch 269 |  iter 1 / 2 | time 0[s] | loss 0.98
    | epoch 270 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 271 |  iter 1 / 2 | time 0[s] | loss 1.05
    | epoch 272 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 273 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 274 |  iter 1 / 2 | time 0[s] | loss 1.06
    | epoch 275 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 276 |  iter 1 / 2 | time 0[s] | loss 1.09
    | epoch 277 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 278 |  iter 1 / 2 | time 0[s] | loss 0.96
    | epoch 279 |  iter 1 / 2 | time 0[s] | loss 1.10
    | epoch 280 |  iter 1 / 2 | time 0[s] | loss 1.05
    | epoch 281 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 282 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 283 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 284 |  iter 1 / 2 | time 0[s] | loss 1.13
    | epoch 285 |  iter 1 / 2 | time 0[s] | loss 0.98
    | epoch 286 |  iter 1 / 2 | time 0[s] | loss 1.01
    | epoch 287 |  iter 1 / 2 | time 0[s] | loss 0.93
    | epoch 288 |  iter 1 / 2 | time 0[s] | loss 1.12
    | epoch 289 |  iter 1 / 2 | time 0[s] | loss 0.97
    | epoch 290 |  iter 1 / 2 | time 0[s] | loss 1.03
    | epoch 291 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 292 |  iter 1 / 2 | time 0[s] | loss 0.98
    | epoch 293 |  iter 1 / 2 | time 0[s] | loss 1.12
    | epoch 294 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 295 |  iter 1 / 2 | time 0[s] | loss 1.11
    | epoch 296 |  iter 1 / 2 | time 0[s] | loss 0.89
    | epoch 297 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 298 |  iter 1 / 2 | time 0[s] | loss 1.07
    | epoch 299 |  iter 1 / 2 | time 0[s] | loss 1.09
    | epoch 300 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 301 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 302 |  iter 1 / 2 | time 0[s] | loss 0.98
    | epoch 303 |  iter 1 / 2 | time 0[s] | loss 1.07
    | epoch 304 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 305 |  iter 1 / 2 | time 0[s] | loss 0.92
    | epoch 306 |  iter 1 / 2 | time 0[s] | loss 0.99
    | epoch 307 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 308 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 309 |  iter 1 / 2 | time 0[s] | loss 1.08
    | epoch 310 |  iter 1 / 2 | time 0[s] | loss 1.03
    | epoch 311 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 312 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 313 |  iter 1 / 2 | time 0[s] | loss 1.13
    | epoch 314 |  iter 1 / 2 | time 0[s] | loss 0.90
    | epoch 315 |  iter 1 / 2 | time 0[s] | loss 0.93
    | epoch 316 |  iter 1 / 2 | time 0[s] | loss 1.01
    | epoch 317 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 318 |  iter 1 / 2 | time 0[s] | loss 1.01
    | epoch 319 |  iter 1 / 2 | time 0[s] | loss 0.97
    | epoch 320 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 321 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 322 |  iter 1 / 2 | time 0[s] | loss 1.08
    | epoch 323 |  iter 1 / 2 | time 0[s] | loss 0.92
    | epoch 324 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 325 |  iter 1 / 2 | time 0[s] | loss 0.96
    | epoch 326 |  iter 1 / 2 | time 0[s] | loss 1.11
    | epoch 327 |  iter 1 / 2 | time 0[s] | loss 0.94
    | epoch 328 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 329 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 330 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 331 |  iter 1 / 2 | time 0[s] | loss 0.93
    | epoch 332 |  iter 1 / 2 | time 0[s] | loss 0.90
    | epoch 333 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 334 |  iter 1 / 2 | time 0[s] | loss 0.99
    | epoch 335 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 336 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 337 |  iter 1 / 2 | time 0[s] | loss 0.77
    | epoch 338 |  iter 1 / 2 | time 0[s] | loss 1.04
    | epoch 339 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 340 |  iter 1 / 2 | time 0[s] | loss 0.92
    | epoch 341 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 342 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 343 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 344 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 345 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 346 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 347 |  iter 1 / 2 | time 0[s] | loss 0.94
    | epoch 348 |  iter 1 / 2 | time 0[s] | loss 0.99
    | epoch 349 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 350 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 351 |  iter 1 / 2 | time 0[s] | loss 1.05
    | epoch 352 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 353 |  iter 1 / 2 | time 0[s] | loss 0.93
    | epoch 354 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 355 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 356 |  iter 1 / 2 | time 0[s] | loss 1.05
    | epoch 357 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 358 |  iter 1 / 2 | time 0[s] | loss 0.94
    | epoch 359 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 360 |  iter 1 / 2 | time 0[s] | loss 0.92
    | epoch 361 |  iter 1 / 2 | time 0[s] | loss 0.97
    | epoch 362 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 363 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 364 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 365 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 366 |  iter 1 / 2 | time 0[s] | loss 1.09
    | epoch 367 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 368 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 369 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 370 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 371 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 372 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 373 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 374 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 375 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 376 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 377 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 378 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 379 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 380 |  iter 1 / 2 | time 0[s] | loss 0.94
    | epoch 381 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 382 |  iter 1 / 2 | time 0[s] | loss 0.94
    | epoch 383 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 384 |  iter 1 / 2 | time 0[s] | loss 1.03
    | epoch 385 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 386 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 387 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 388 |  iter 1 / 2 | time 0[s] | loss 0.98
    | epoch 389 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 390 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 391 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 392 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 393 |  iter 1 / 2 | time 0[s] | loss 0.97
    | epoch 394 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 395 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 396 |  iter 1 / 2 | time 0[s] | loss 1.02
    | epoch 397 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 398 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 399 |  iter 1 / 2 | time 0[s] | loss 1.00
    | epoch 400 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 401 |  iter 1 / 2 | time 0[s] | loss 0.96
    | epoch 402 |  iter 1 / 2 | time 0[s] | loss 0.62
    | epoch 403 |  iter 1 / 2 | time 0[s] | loss 1.01
    | epoch 404 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 405 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 406 |  iter 1 / 2 | time 0[s] | loss 0.92
    | epoch 407 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 408 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 409 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 410 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 411 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 412 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 413 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 414 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 415 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 416 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 417 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 418 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 419 |  iter 1 / 2 | time 0[s] | loss 0.89
    | epoch 420 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 421 |  iter 1 / 2 | time 0[s] | loss 0.94
    | epoch 422 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 423 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 424 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 425 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 426 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 427 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 428 |  iter 1 / 2 | time 0[s] | loss 0.89
    | epoch 429 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 430 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 431 |  iter 1 / 2 | time 0[s] | loss 0.93
    | epoch 432 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 433 |  iter 1 / 2 | time 0[s] | loss 0.93
    | epoch 434 |  iter 1 / 2 | time 0[s] | loss 0.77
    | epoch 435 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 436 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 437 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 438 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 439 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 440 |  iter 1 / 2 | time 0[s] | loss 1.01
    | epoch 441 |  iter 1 / 2 | time 0[s] | loss 0.57
    | epoch 442 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 443 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 444 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 445 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 446 |  iter 1 / 2 | time 0[s] | loss 0.92
    | epoch 447 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 448 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 449 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 450 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 451 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 452 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 453 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 454 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 455 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 456 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 457 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 458 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 459 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 460 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 461 |  iter 1 / 2 | time 0[s] | loss 0.90
    | epoch 462 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 463 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 464 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 465 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 466 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 467 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 468 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 469 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 470 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 471 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 472 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 473 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 474 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 475 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 476 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 477 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 478 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 479 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 480 |  iter 1 / 2 | time 0[s] | loss 0.96
    | epoch 481 |  iter 1 / 2 | time 0[s] | loss 0.55
    | epoch 482 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 483 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 484 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 485 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 486 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 487 |  iter 1 / 2 | time 0[s] | loss 0.62
    | epoch 488 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 489 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 490 |  iter 1 / 2 | time 0[s] | loss 0.95
    | epoch 491 |  iter 1 / 2 | time 0[s] | loss 0.45
    | epoch 492 |  iter 1 / 2 | time 0[s] | loss 0.96
    | epoch 493 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 494 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 495 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 496 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 497 |  iter 1 / 2 | time 0[s] | loss 0.96
    | epoch 498 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 499 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 500 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 501 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 502 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 503 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 504 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 505 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 506 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 507 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 508 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 509 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 510 |  iter 1 / 2 | time 0[s] | loss 0.52
    | epoch 511 |  iter 1 / 2 | time 0[s] | loss 0.85
    | epoch 512 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 513 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 514 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 515 |  iter 1 / 2 | time 0[s] | loss 0.77
    | epoch 516 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 517 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 518 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 519 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 520 |  iter 1 / 2 | time 0[s] | loss 0.60
    | epoch 521 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 522 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 523 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 524 |  iter 1 / 2 | time 0[s] | loss 0.88
    | epoch 525 |  iter 1 / 2 | time 0[s] | loss 0.51
    | epoch 526 |  iter 1 / 2 | time 0[s] | loss 0.84
    | epoch 527 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 528 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 529 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 530 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 531 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 532 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 533 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 534 |  iter 1 / 2 | time 0[s] | loss 0.71
    | epoch 535 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 536 |  iter 1 / 2 | time 0[s] | loss 0.58
    | epoch 537 |  iter 1 / 2 | time 0[s] | loss 0.83
    | epoch 538 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 539 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 540 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 541 |  iter 1 / 2 | time 0[s] | loss 0.62
    | epoch 542 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 543 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 544 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 545 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 546 |  iter 1 / 2 | time 0[s] | loss 0.87
    | epoch 547 |  iter 1 / 2 | time 0[s] | loss 0.49
    | epoch 548 |  iter 1 / 2 | time 0[s] | loss 0.91
    | epoch 549 |  iter 1 / 2 | time 0[s] | loss 0.49
    | epoch 550 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 551 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 552 |  iter 1 / 2 | time 0[s] | loss 0.82
    | epoch 553 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 554 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 555 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 556 |  iter 1 / 2 | time 0[s] | loss 0.48
    | epoch 557 |  iter 1 / 2 | time 0[s] | loss 0.99
    | epoch 558 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 559 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 560 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 561 |  iter 1 / 2 | time 0[s] | loss 0.60
    | epoch 562 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 563 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 564 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 565 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 566 |  iter 1 / 2 | time 0[s] | loss 0.60
    | epoch 567 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 568 |  iter 1 / 2 | time 0[s] | loss 0.77
    | epoch 569 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 570 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 571 |  iter 1 / 2 | time 0[s] | loss 0.55
    | epoch 572 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 573 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 574 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 575 |  iter 1 / 2 | time 0[s] | loss 0.59
    | epoch 576 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 577 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 578 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 579 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 580 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 581 |  iter 1 / 2 | time 0[s] | loss 0.55
    | epoch 582 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 583 |  iter 1 / 2 | time 0[s] | loss 0.59
    | epoch 584 |  iter 1 / 2 | time 0[s] | loss 0.80
    | epoch 585 |  iter 1 / 2 | time 0[s] | loss 0.55
    | epoch 586 |  iter 1 / 2 | time 0[s] | loss 0.89
    | epoch 587 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 588 |  iter 1 / 2 | time 0[s] | loss 0.46
    | epoch 589 |  iter 1 / 2 | time 0[s] | loss 0.79
    | epoch 590 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 591 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 592 |  iter 1 / 2 | time 0[s] | loss 0.57
    | epoch 593 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 594 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 595 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 596 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 597 |  iter 1 / 2 | time 0[s] | loss 0.49
    | epoch 598 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 599 |  iter 1 / 2 | time 0[s] | loss 0.70
    | epoch 600 |  iter 1 / 2 | time 0[s] | loss 0.59
    | epoch 601 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 602 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 603 |  iter 1 / 2 | time 0[s] | loss 0.54
    | epoch 604 |  iter 1 / 2 | time 0[s] | loss 0.69
    | epoch 605 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 606 |  iter 1 / 2 | time 0[s] | loss 0.56
    | epoch 607 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 608 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 609 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 610 |  iter 1 / 2 | time 0[s] | loss 0.49
    | epoch 611 |  iter 1 / 2 | time 0[s] | loss 0.86
    | epoch 612 |  iter 1 / 2 | time 0[s] | loss 0.53
    | epoch 613 |  iter 1 / 2 | time 0[s] | loss 0.78
    | epoch 614 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 615 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 616 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 617 |  iter 1 / 2 | time 0[s] | loss 0.56
    | epoch 618 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 619 |  iter 1 / 2 | time 0[s] | loss 0.52
    | epoch 620 |  iter 1 / 2 | time 0[s] | loss 0.56
    | epoch 621 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 622 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 623 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 624 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 625 |  iter 1 / 2 | time 0[s] | loss 0.60
    | epoch 626 |  iter 1 / 2 | time 0[s] | loss 0.68
    | epoch 627 |  iter 1 / 2 | time 0[s] | loss 0.57
    | epoch 628 |  iter 1 / 2 | time 0[s] | loss 0.61
    | epoch 629 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 630 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 631 |  iter 1 / 2 | time 0[s] | loss 0.57
    | epoch 632 |  iter 1 / 2 | time 0[s] | loss 0.59
    | epoch 633 |  iter 1 / 2 | time 0[s] | loss 0.74
    | epoch 634 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 635 |  iter 1 / 2 | time 0[s] | loss 0.34
    | epoch 636 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 637 |  iter 1 / 2 | time 0[s] | loss 0.64
    | epoch 638 |  iter 1 / 2 | time 0[s] | loss 0.65
    | epoch 639 |  iter 1 / 2 | time 0[s] | loss 0.72
    | epoch 640 |  iter 1 / 2 | time 0[s] | loss 0.57
    | epoch 641 |  iter 1 / 2 | time 0[s] | loss 0.73
    | epoch 642 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 643 |  iter 1 / 2 | time 0[s] | loss 0.52
    | epoch 644 |  iter 1 / 2 | time 0[s] | loss 0.75
    | epoch 645 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 646 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 647 |  iter 1 / 2 | time 0[s] | loss 0.62
    | epoch 648 |  iter 1 / 2 | time 0[s] | loss 0.60
    | epoch 649 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 650 |  iter 1 / 2 | time 0[s] | loss 0.55
    | epoch 651 |  iter 1 / 2 | time 0[s] | loss 0.63
    | epoch 652 |  iter 1 / 2 | time 0[s] | loss 0.76
    | epoch 653 |  iter 1 / 2 | time 0[s] | loss 0.41
    | epoch 654 |  iter 1 / 2 | time 0[s] | loss 0.81
    | epoch 655 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 656 |  iter 1 / 2 | time 0[s] | loss 0.62
    | epoch 657 |  iter 1 / 2 | time 0[s] | loss 0.49
    | epoch 658 |  iter 1 / 2 | time 0[s] | loss 0.66
    | epoch 659 |  iter 1 / 2 | time 0[s] | loss 0.67
    | epoch 660 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 661 |  iter 1 / 2 | time 1[s] | loss 0.65
    | epoch 662 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 663 |  iter 1 / 2 | time 1[s] | loss 0.69
    | epoch 664 |  iter 1 / 2 | time 1[s] | loss 0.65
    | epoch 665 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 666 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 667 |  iter 1 / 2 | time 1[s] | loss 0.65
    | epoch 668 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 669 |  iter 1 / 2 | time 1[s] | loss 0.73
    | epoch 670 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 671 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 672 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 673 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 674 |  iter 1 / 2 | time 1[s] | loss 0.71
    | epoch 675 |  iter 1 / 2 | time 1[s] | loss 0.40
    | epoch 676 |  iter 1 / 2 | time 1[s] | loss 0.69
    | epoch 677 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 678 |  iter 1 / 2 | time 1[s] | loss 0.82
    | epoch 679 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 680 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 681 |  iter 1 / 2 | time 1[s] | loss 0.72
    | epoch 682 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 683 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 684 |  iter 1 / 2 | time 1[s] | loss 0.68
    | epoch 685 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 686 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 687 |  iter 1 / 2 | time 1[s] | loss 0.70
    | epoch 688 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 689 |  iter 1 / 2 | time 1[s] | loss 0.49
    | epoch 690 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 691 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 692 |  iter 1 / 2 | time 1[s] | loss 0.71
    | epoch 693 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 694 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 695 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 696 |  iter 1 / 2 | time 1[s] | loss 0.73
    | epoch 697 |  iter 1 / 2 | time 1[s] | loss 0.68
    | epoch 698 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 699 |  iter 1 / 2 | time 1[s] | loss 0.78
    | epoch 700 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 701 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 702 |  iter 1 / 2 | time 1[s] | loss 0.68
    | epoch 703 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 704 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 705 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 706 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 707 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 708 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 709 |  iter 1 / 2 | time 1[s] | loss 0.66
    | epoch 710 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 711 |  iter 1 / 2 | time 1[s] | loss 0.72
    | epoch 712 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 713 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 714 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 715 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 716 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 717 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 718 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 719 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 720 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 721 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 722 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 723 |  iter 1 / 2 | time 1[s] | loss 0.69
    | epoch 724 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 725 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 726 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 727 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 728 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 729 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 730 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 731 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 732 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 733 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 734 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 735 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 736 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 737 |  iter 1 / 2 | time 1[s] | loss 0.68
    | epoch 738 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 739 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 740 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 741 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 742 |  iter 1 / 2 | time 1[s] | loss 0.65
    | epoch 743 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 744 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 745 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 746 |  iter 1 / 2 | time 1[s] | loss 0.36
    | epoch 747 |  iter 1 / 2 | time 1[s] | loss 0.77
    | epoch 748 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 749 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 750 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 751 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 752 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 753 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 754 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 755 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 756 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 757 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 758 |  iter 1 / 2 | time 1[s] | loss 0.68
    | epoch 759 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 760 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 761 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 762 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 763 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 764 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 765 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 766 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 767 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 768 |  iter 1 / 2 | time 1[s] | loss 0.74
    | epoch 769 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 770 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 771 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 772 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 773 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 774 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 775 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 776 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 777 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 778 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 779 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 780 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 781 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 782 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 783 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 784 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 785 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 786 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 787 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 788 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 789 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 790 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 791 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 792 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 793 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 794 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 795 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 796 |  iter 1 / 2 | time 1[s] | loss 0.65
    | epoch 797 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 798 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 799 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 800 |  iter 1 / 2 | time 1[s] | loss 0.34
    | epoch 801 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 802 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 803 |  iter 1 / 2 | time 1[s] | loss 0.71
    | epoch 804 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 805 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 806 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 807 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 808 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 809 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 810 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 811 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 812 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 813 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 814 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 815 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 816 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 817 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 818 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 819 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 820 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 821 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 822 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 823 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 824 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 825 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 826 |  iter 1 / 2 | time 1[s] | loss 0.63
    | epoch 827 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 828 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 829 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 830 |  iter 1 / 2 | time 1[s] | loss 0.35
    | epoch 831 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 832 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 833 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 834 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 835 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 836 |  iter 1 / 2 | time 1[s] | loss 0.68
    | epoch 837 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 838 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 839 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 840 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 841 |  iter 1 / 2 | time 1[s] | loss 0.62
    | epoch 842 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 843 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 844 |  iter 1 / 2 | time 1[s] | loss 0.32
    | epoch 845 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 846 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 847 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 848 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 849 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 850 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 851 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 852 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 853 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 854 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 855 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 856 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 857 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 858 |  iter 1 / 2 | time 1[s] | loss 0.23
    | epoch 859 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 860 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 861 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 862 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 863 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 864 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 865 |  iter 1 / 2 | time 1[s] | loss 0.60
    | epoch 866 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 867 |  iter 1 / 2 | time 1[s] | loss 0.69
    | epoch 868 |  iter 1 / 2 | time 1[s] | loss 0.31
    | epoch 869 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 870 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 871 |  iter 1 / 2 | time 1[s] | loss 0.31
    | epoch 872 |  iter 1 / 2 | time 1[s] | loss 0.66
    | epoch 873 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 874 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 875 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 876 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 877 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 878 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 879 |  iter 1 / 2 | time 1[s] | loss 0.40
    | epoch 880 |  iter 1 / 2 | time 1[s] | loss 0.67
    | epoch 881 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 882 |  iter 1 / 2 | time 1[s] | loss 0.51
    | epoch 883 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 884 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 885 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 886 |  iter 1 / 2 | time 1[s] | loss 0.40
    | epoch 887 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 888 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 889 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 890 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 891 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 892 |  iter 1 / 2 | time 1[s] | loss 0.40
    | epoch 893 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 894 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 895 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 896 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 897 |  iter 1 / 2 | time 1[s] | loss 0.61
    | epoch 898 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 899 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 900 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 901 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 902 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 903 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 904 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 905 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 906 |  iter 1 / 2 | time 1[s] | loss 0.58
    | epoch 907 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 908 |  iter 1 / 2 | time 1[s] | loss 0.32
    | epoch 909 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 910 |  iter 1 / 2 | time 1[s] | loss 0.38
    | epoch 911 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 912 |  iter 1 / 2 | time 1[s] | loss 0.38
    | epoch 913 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 914 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 915 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 916 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 917 |  iter 1 / 2 | time 1[s] | loss 0.29
    | epoch 918 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 919 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 920 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 921 |  iter 1 / 2 | time 1[s] | loss 0.38
    | epoch 922 |  iter 1 / 2 | time 1[s] | loss 0.38
    | epoch 923 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 924 |  iter 1 / 2 | time 1[s] | loss 0.40
    | epoch 925 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 926 |  iter 1 / 2 | time 1[s] | loss 0.64
    | epoch 927 |  iter 1 / 2 | time 1[s] | loss 0.26
    | epoch 928 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 929 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 930 |  iter 1 / 2 | time 1[s] | loss 0.57
    | epoch 931 |  iter 1 / 2 | time 1[s] | loss 0.38
    | epoch 932 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 933 |  iter 1 / 2 | time 1[s] | loss 0.55
    | epoch 934 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 935 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 936 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 937 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 938 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 939 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 940 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 941 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 942 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 943 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 944 |  iter 1 / 2 | time 1[s] | loss 0.37
    | epoch 945 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 946 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 947 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 948 |  iter 1 / 2 | time 1[s] | loss 0.34
    | epoch 949 |  iter 1 / 2 | time 1[s] | loss 0.56
    | epoch 950 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 951 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 952 |  iter 1 / 2 | time 1[s] | loss 0.35
    | epoch 953 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 954 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 955 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 956 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 957 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 958 |  iter 1 / 2 | time 1[s] | loss 0.59
    | epoch 959 |  iter 1 / 2 | time 1[s] | loss 0.33
    | epoch 960 |  iter 1 / 2 | time 1[s] | loss 0.54
    | epoch 961 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 962 |  iter 1 / 2 | time 1[s] | loss 0.35
    | epoch 963 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 964 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 965 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 966 |  iter 1 / 2 | time 1[s] | loss 0.35
    | epoch 967 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 968 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 969 |  iter 1 / 2 | time 1[s] | loss 0.38
    | epoch 970 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 971 |  iter 1 / 2 | time 1[s] | loss 0.27
    | epoch 972 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 973 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 974 |  iter 1 / 2 | time 1[s] | loss 0.35
    | epoch 975 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 976 |  iter 1 / 2 | time 1[s] | loss 0.44
    | epoch 977 |  iter 1 / 2 | time 1[s] | loss 0.46
    | epoch 978 |  iter 1 / 2 | time 1[s] | loss 0.42
    | epoch 979 |  iter 1 / 2 | time 1[s] | loss 0.40
    | epoch 980 |  iter 1 / 2 | time 1[s] | loss 0.49
    | epoch 981 |  iter 1 / 2 | time 1[s] | loss 0.53
    | epoch 982 |  iter 1 / 2 | time 1[s] | loss 0.32
    | epoch 983 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 984 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 985 |  iter 1 / 2 | time 1[s] | loss 0.50
    | epoch 986 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 987 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 988 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 989 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 990 |  iter 1 / 2 | time 1[s] | loss 0.36
    | epoch 991 |  iter 1 / 2 | time 1[s] | loss 0.39
    | epoch 992 |  iter 1 / 2 | time 1[s] | loss 0.47
    | epoch 993 |  iter 1 / 2 | time 1[s] | loss 0.45
    | epoch 994 |  iter 1 / 2 | time 1[s] | loss 0.41
    | epoch 995 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 996 |  iter 1 / 2 | time 1[s] | loss 0.43
    | epoch 997 |  iter 1 / 2 | time 1[s] | loss 0.48
    | epoch 998 |  iter 1 / 2 | time 1[s] | loss 0.35
    | epoch 999 |  iter 1 / 2 | time 1[s] | loss 0.52
    | epoch 1000 |  iter 1 / 2 | time 1[s] | loss 0.26
    

    Plot the loss values

    %python
    trainer.plot()
    z.show(plt, fmt='svg')

    Display the word vectors

    %python
    word_vecs = model.word_vecs
    for word_id, word in id_to_word.items():
        print(word, word_vecs[word_id])
    you [ 0.93710303  0.93910193  1.7272372  -0.89610606  1.0445951 ]
    say [-1.1644877  -1.2109934  -0.20577171  1.23597    -1.2464908 ]
    goodbye [ 1.1030452  1.0522411 -0.1555654 -1.0932515  0.8510445]
    and [-0.77724737 -1.0205745  -1.8217171   0.9609459  -1.050846  ]
    i [ 1.1081636   1.0668204  -0.13783155 -1.1415119   0.8612589 ]
    hello [ 0.9385245  0.9236376  1.7012237 -0.9081088  1.0232164]
    . [-1.1489888 -1.061469   1.6251746  1.10383   -1.1069291]
    

    • Words can now be represented as dense vectors, but a corpus this small does not yield good results
    • This implementation cannot handle a large corpus
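
    One way to inspect the learned vectors is to compare them with cosine similarity. The sketch below copies the values printed above (rounded to four decimals); `cos_similarity` is a helper defined here for illustration and is not taken from the code in this post.

```python
import numpy as np

# Word vectors copied from the printed output above, rounded to 4 decimals.
word_vecs = {
    "you":     np.array([ 0.9371,  0.9391,  1.7272, -0.8961,  1.0446]),
    "say":     np.array([-1.1645, -1.2110, -0.2058,  1.2360, -1.2465]),
    "goodbye": np.array([ 1.1030,  1.0522, -0.1556, -1.0933,  0.8510]),
    "and":     np.array([-0.7772, -1.0206, -1.8217,  0.9609, -1.0508]),
    "i":       np.array([ 1.1082,  1.0668, -0.1378, -1.1415,  0.8613]),
    "hello":   np.array([ 0.9385,  0.9236,  1.7012, -0.9081,  1.0232]),
    ".":       np.array([-1.1490, -1.0615,  1.6252,  1.1038, -1.1069]),
}


def cos_similarity(x, y, eps=1e-8):
    """Cosine of the angle between x and y (eps guards against zero vectors)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps))


for a, b in [("you", "hello"), ("goodbye", "i"), ("you", "say")]:
    print(a, b, round(cos_similarity(word_vecs[a], word_vecs[b]), 3))
```

    With these values, "you"/"hello" and "goodbye"/"i" come out close to 1, while "you"/"say" is negative. On a one-sentence corpus this mostly reflects which words share the same neighbours, so it should not be over-interpreted.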

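    CBOW model: supplementary notes

    Probability notation

    • \( P(A) \): the probability that event A occurs
    • \( P(A, B) \): the probability that A and B occur together (joint probability)
    • \( P(A|B) \): the probability that A occurs given that B has occurred (posterior probability)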
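
    As a small numeric illustration of the three notations, the sketch below uses a made-up joint distribution over two binary events. The numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical joint distribution over two binary events.
# Rows: A occurs / A does not occur. Columns: B occurs / B does not occur.
joint = np.array([
    [0.2, 0.1],
    [0.3, 0.4],
])

p_a = joint[0].sum()             # P(A): sum over both outcomes of B
p_b = joint[:, 0].sum()          # P(B): sum over both outcomes of A
p_a_and_b = joint[0, 0]          # P(A, B): A and B occur together
p_a_given_b = p_a_and_b / p_b    # P(A|B) = P(A, B) / P(B)

print(p_a, p_a_and_b, p_a_given_b)
```

    Here \( P(A|B) \) is larger than \( P(A) \): knowing that B occurred makes A more likely in this made-up table.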

    The CBOW model and probability

    $$
    P(w_t | w_{t-1},w_{t+1})
    $$

    The expression above is the probability that \( w_t \) occurs given that \( w_{t-1} \) and \( w_{t+1} \) have occurred.

    Cross-entropy error

    $$
    L = -\log P(w_t | w_{t-1}, w_{t+1})
    $$
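
    For a single position, this loss is the negative log of the probability the model assigns to the correct word. The sketch below assumes the model's output scores are turned into probabilities with softmax; the score values are made up for illustration and do not come from the trained model.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


vocab = ["you", "say", "goodbye", "and", "i", "hello", "."]

# Hypothetical output scores for the contexts "you" and "goodbye".
scores = np.array([0.5, 2.0, 0.1, -0.3, 0.2, 0.0, -1.0])
probs = softmax(scores)          # P(w | w_{t-1}, w_{t+1}) for every word w

target_id = vocab.index("say")   # the correct word w_t
loss = -np.log(probs[target_id])

print(probs[target_id], loss)
```

    The loss approaches 0 as the probability of the correct word approaches 1, and grows as that probability shrinks.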

    Extended to the whole corpus:

    $$
    L = -\frac{1}{T} \sum_{t=1}^{T} \log P(w_t | w_{t-1}, w_{t+1})
    $$

    Training aims to make this loss function as small as possible.
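
    The corpus-level loss is the average of the per-position losses. The sketch below takes, for each position, the probability the model gave to the correct word; `corpus_loss` and the probability values are hypothetical and used only for illustration.

```python
import numpy as np


def corpus_loss(target_probs):
    """Average negative log-likelihood over T positions.

    target_probs[t] is the probability assigned to the correct word w_t.
    """
    target_probs = np.asarray(target_probs, dtype=float)
    return float(-np.mean(np.log(target_probs)))


vocab_size = 7

# A model that guesses uniformly over the vocabulary scores log(vocab_size).
uniform = corpus_loss([1.0 / vocab_size] * 6)

# A model that puts most of the probability on the correct words scores lower.
confident = corpus_loss([0.9, 0.8, 0.7, 0.9, 0.6, 0.8])

print(uniform, confident)
```

    Minimizing the loss therefore means pushing the probability of each correct word towards 1. A model that always predicted correctly with probability 1 would reach a loss of 0.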