Deep Learning from Scratch 2 (ゼロから作る Deep Learning 2) / word2vec

Chapter 3 - word2vec

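These are my reading notes on Deep Learning from Scratch 2: Natural Language Processing (ゼロから作る Deep Learning (2) 自然言語処理編). This time I read through Chapter 3, "word2vec". The chapter looks at an inference-based method for learning distributed representations of words, using the CBOW model.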

Differences from count-based methods

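  • Count-based methods process the entire training data in one batch
  • Inference-based methods learn incrementally, using a portion of the training data at a time
    • The data can be split into small batches for training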

Inference-based methods

you ??? goodbye and I say hello

The task is to guess which word appears at the ??? position above (the model outputs a probability for each word in the vocabulary).

Representing words

  • To process words with a neural network, they must be converted to fixed-length vectors
  • One-hot representation: a vector in which exactly one element is 1 and all the others are 0
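As a minimal sketch of the one-hot representation described above (the helper name `one_hot` is hypothetical and not taken from the book):

```python
import numpy as np

def one_hot(word_id, vocab_size):
    """Return a vector of length vocab_size with a single 1 at index word_id."""
    vec = np.zeros(vocab_size, dtype=np.int32)
    vec[word_id] = 1
    return vec

# With a 7-word vocabulary, word ID 2 becomes a vector with a 1 at position 2.
v = one_hot(2, 7)
print(v)
```

Every word gets a vector of the same length (the vocabulary size), which is what lets a network with a fixed number of input nodes accept it.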

Fully connected layer

%sh
pip3 install numpy matplotlib
Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/local/lib/python3.5/dist-packages
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/7b/ca/8b55a66b7ce426329ab16419a7eee4eb35b5a3fbe0d002434b339a4a7b09/matplotlib-3.0.0-cp35-cp35m-manylinux1_x86_64.whl (12.8MB)
Collecting cycler>=0.10 (from matplotlib)
  Using cached https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting python-dateutil>=2.1 (from matplotlib)
  Using cached https://files.pythonhosted.org/packages/cf/f5/af2b09c957ace60dcfac112b669c45c8c97e32f94aa8b56da4c6d1682825/python_dateutil-2.7.3-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/42/47/e6d51aef3d0393f7d343592d63a73beee2a8d3d69c22b053e252c6cfacd5/pyparsing-2.2.1-py2.py3-none-any.whl (57kB)
Collecting kiwisolver>=1.0.1 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/7e/31/d6fedd4fb2c94755cd101191e581af30e1650ccce7a35bddb7930fed6574/kiwisolver-1.0.1-cp35-cp35m-manylinux1_x86_64.whl (949kB)
Collecting six (from cycler>=0.10->matplotlib)
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python3/dist-packages (from kiwisolver>=1.0.1->matplotlib)
Installing collected packages: six, cycler, python-dateutil, pyparsing, kiwisolver, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.0.1 matplotlib-3.0.0 pyparsing-2.2.1 python-dateutil-2.7.3 six-1.11.0
You are using pip version 8.1.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

%python
import numpy as np

c = np.array([[1, 0, 0, 0, 0, 0, 0]]) # input
W = np.random.randn(7, 3) # weights
h = np.dot(c, W) # hidden layer nodes

h
array([[ 0.51548717,  0.69375812, -0.52163008]])

Because c is a one-hot vector, the product extracts the corresponding row of the weight matrix W (row 0 here).
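The row-extraction behaviour can be checked directly. The sketch below assumes a fixed random seed (chosen only so the run is reproducible) and compares the matrix product with a plain row lookup:

```python
import numpy as np

np.random.seed(0)                # arbitrary seed, only for reproducibility
W = np.random.randn(7, 3)        # weights: 7 words x 3 hidden units

word_id = 0
c = np.zeros((1, 7))
c[0, word_id] = 1                # one-hot input for word 0

h = np.dot(c, W)                 # full matrix product
row = W[word_id]                 # direct row lookup

print(np.allclose(h[0], row))
```

Since the two results agree, the matrix product is doing more work than necessary for one-hot inputs; a direct lookup of the row gives the same hidden vector.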

CBOW model: inference

%python
class MatMul:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.x = None

    def forward(self, x):
        W, = self.params
        out = np.dot(x, W)
        self.x = x
        return out

    def backward(self, dout):
        W, = self.params
        dx = np.dot(dout, W.T)
        dW = np.dot(self.x.T, dout)
        self.grads[0][...] = dW
        return dx

%python
import numpy as np

# Context data
c0 = np.array([[1, 0, 0, 0, 0, 0, 0]])
c1 = np.array([[0, 0, 1, 0, 0, 0, 0]])

# Initialize the weights
W_in = np.random.randn(7, 3)
W_out = np.random.randn(3, 7)

# Create the layers
in_layer0 = MatMul(W_in)
in_layer1 = MatMul(W_in)
out_layer = MatMul(W_out)

# Forward propagation
h0 = in_layer0.forward(c0)
h1 = in_layer1.forward(c1)
h = 0.5 * (h0 + h1)
s = out_layer.forward(h)

s
array([[-2.89929998, -2.361709  ,  3.07450532,  1.05383403,  3.11066845,
        -0.50834708, -2.51717838]])

CBOW model: training

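  • The implementation above outputs a score for each word at the output layer
  • Applying the softmax function to these scores yields probabilities
  • The cross-entropy error between these probabilities and the correct label is used as the loss (TODO: I don't fully understand this part yet)
  • The CBOW model only learns the patterns of word occurrence in the corpus
    • A different corpus yields different distributed representations of words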
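To make the softmax and cross-entropy step more concrete, here is a small worked sketch. The scores and the correct word ID are made-up illustrative values, not output from the model above:

```python
import numpy as np

scores = np.array([1.0, 2.0, 3.0])   # illustrative scores for a 3-word vocabulary
target_id = 2                         # assume word 2 is the correct answer

shifted = scores - scores.max()       # subtract the max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()   # softmax: non-negative, sums to 1
loss = -np.log(probs[target_id])      # cross-entropy with a one-hot target

print(probs.round(3))
print(round(float(loss), 3))
```

With a one-hot target, the cross-entropy error reduces to the negative log of the probability assigned to the correct word. It approaches 0 as that probability approaches 1 and grows as the probability shrinks, which is why minimizing it pushes the model toward the correct word.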

%python
# Assign an ID to each word
def preprocess(text):
    text = text.lower()
    text = text.replace('.', ' .')
    words = text.split(' ')

    word_to_id = {}
    id_to_word = {}
    for word in words:
        if word not in word_to_id:
            new_id = len(word_to_id)
            word_to_id[word] = new_id
            id_to_word[new_id] = word

    corpus = np.array([word_to_id[w] for w in words])

    return corpus, word_to_id, id_to_word

%python
text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)

print('corpus: ', corpus)
print('id_to_word: ', id_to_word)
corpus:  [0 1 2 3 4 1 5 6]
id_to_word:  {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}

%python
def create_contexts_target(corpus, window_size=1):
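    '''Build the contexts and targets from a corpus
    :param corpus: NumPy array of word IDs
    :param window_size: window size (1 means one word on each side of the target)
    :return: tuple of NumPy arrays (contexts, target)
    '''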
    target = corpus[window_size:-window_size]
    contexts = []

    for idx in range(window_size, len(corpus)-window_size):
        cs = []
        for t in range(-window_size, window_size + 1):
            if t == 0:
                continue
            cs.append(corpus[idx + t])
        contexts.append(cs)

    return np.array(contexts), np.array(target)

%python
contexts, target = create_contexts_target(corpus, window_size=1)

print('contexts: ', contexts)

print('target: ', target)
contexts:  [[0 2]
 [1 3]
 [2 4]
 [3 1]
 [4 5]
 [1 6]]
target:  [1 2 3 4 1 5]
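The ID arrays are easier to read when mapped back to words. The sketch below copies `id_to_word`, `contexts` and `target` from the outputs above and prints each target between its two context words:

```python
import numpy as np

# Values copied from the outputs above.
id_to_word = {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}
contexts = np.array([[0, 2], [1, 3], [2, 4], [3, 1], [4, 5], [1, 6]])
target = np.array([1, 2, 3, 4, 1, 5])

pairs = []
for (left, right), t in zip(contexts, target):
    pairs.append((id_to_word[int(left)], id_to_word[int(t)], id_to_word[int(right)]))

for left, mid, right in pairs:
    print(left, '[' + mid + ']', right)
```

The first and last words of the corpus never appear as targets, because with `window_size=1` they lack a neighbour on one side.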

%python
def convert_one_hot(corpus, vocab_size):
    '''Convert to one-hot representation
    :param corpus: word IDs (1-D or 2-D NumPy array)
    :param vocab_size: vocabulary size
    :return: one-hot representation (2-D or 3-D NumPy array)
    '''
    N = corpus.shape[0]

    if corpus.ndim == 1:
        one_hot = np.zeros((N, vocab_size), dtype=np.int32)
        for idx, word_id in enumerate(corpus):
            one_hot[idx, word_id] = 1

    elif corpus.ndim == 2:
        C = corpus.shape[1]
        one_hot = np.zeros((N, C, vocab_size), dtype=np.int32)
        for idx_0, word_ids in enumerate(corpus):
            for idx_1, word_id in enumerate(word_ids):
                one_hot[idx_0, idx_1, word_id] = 1

    return one_hot

%python
vocab_size = len(word_to_id)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)
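As a sanity check on the resulting shapes, the same conversion can be sketched with an identity matrix and NumPy integer-array indexing. This is an alternative to the `convert_one_hot` loop above, not the book's implementation, and the arrays are copied from the earlier output:

```python
import numpy as np

vocab_size = 7
contexts = np.array([[0, 2], [1, 3], [2, 4], [3, 1], [4, 5], [1, 6]])
target = np.array([1, 2, 3, 4, 1, 5])

eye = np.eye(vocab_size, dtype=np.int32)  # row i is the one-hot vector for word i
contexts_oh = eye[contexts]               # (samples, context words, vocab) = (6, 2, 7)
target_oh = eye[target]                   # (samples, vocab) = (6, 7)

print(contexts_oh.shape, target_oh.shape)
```

The contexts become a 3-D array because each sample holds two context words, each of which is expanded into a 7-element vector.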

CBOW model: implementation

%python
def softmax(x):
    if x.ndim == 2:
        x = x - x.max(axis=1, keepdims=True)
        x = np.exp(x)
        x /= x.sum(axis=1, keepdims=True)
    elif x.ndim == 1:
        x = x - np.max(x)
        x = np.exp(x) / np.sum(np.exp(x))

    return x

def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
        
    # If the labels are one-hot vectors, convert them to the indices of the correct classes
    if t.size == y.size:
        t = t.argmax(axis=1)
             
    batch_size = y.shape[0]

    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

class SoftmaxWithLoss:
    def __init__(self):
        self.params, self.grads = [], []
        self.y = None  # output of softmax
        self.t = None  # target labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)

        # If the labels are one-hot vectors, convert them to the indices of the correct classes
        if self.t.size == self.y.size:
            self.t = self.t.argmax(axis=1)

        loss = cross_entropy_error(self.y, self.t)
        return loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]

        dx = self.y.copy()
        dx[np.arange(batch_size), self.t] -= 1
        dx *= dout
        dx = dx / batch_size

        return dx

%python
class SimpleCBOW:
    # Initialization
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size
        
        # Initialize the weights
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')
        
        # Create the layers
        self.in_layer0 = MatMul(W_in)
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()
        
        # Collect all weights and gradients into lists
        layers = [self.in_layer0, self.in_layer1, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads
        
        # Distributed representations of the words
        self.word_vecs = W_in
    
    # Forward propagation
    def forward(self, contexts, target):
        h0 = self.in_layer0.forward(contexts[:, 0])
        h1 = self.in_layer1.forward(contexts[:, 1])
        h = (h0 + h1) * 0.5
        
        score = self.out_layer.forward(h)
        loss = self.loss_layer.forward(score, target)
        return loss
    
    # Backward propagation
    def backward(self, dout=1):
        ds = self.loss_layer.backward(dout)
        da = self.out_layer.backward(ds)
        da *= 0.5
        self.in_layer0.backward(da)
        self.in_layer1.backward(da)
        return None
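In `SimpleCBOW`, `in_layer0` and `in_layer1` hold the same `W_in` array, so the gradient with respect to `W_in` is the sum of the gradients each layer computes. The sketch below checks this numerically on a toy loss (the sum of the averaged hidden vector, chosen only for illustration); the shapes and inputs are assumptions, not values from the book:

```python
import numpy as np

np.random.seed(0)                     # arbitrary seed, only for reproducibility
W = np.random.randn(4, 3)             # shared input weights
x0 = np.array([[1.0, 0.0, 0.0, 0.0]]) # one-hot context word 0
x1 = np.array([[0.0, 0.0, 1.0, 0.0]]) # one-hot context word 2

def loss(W):
    # Toy loss: sum of the averaged hidden vector, as in the CBOW forward pass
    h = 0.5 * (np.dot(x0, W) + np.dot(x1, W))
    return h.sum()

# Analytic gradients, computed per layer as two MatMul layers would
dh = np.ones((1, 3)) * 0.5            # gradient flowing back through the 0.5 factor
g0 = np.dot(x0.T, dh)
g1 = np.dot(x1.T, dh)
g_shared = g0 + g1                    # gradients of a shared weight are summed

# Numerical gradient by central differences
eps = 1e-5
g_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy()
        Wp[i, j] += eps
        Wm = W.copy()
        Wm[i, j] -= eps
        g_num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(g_shared, g_num, atol=1e-6))
```

Only the rows of `W` selected by the two context words receive a non-zero gradient, and the summed per-layer gradient matches the numerical one.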

Trainer

%python
import numpy
import time
import matplotlib.pyplot as plt

def clip_grads(grads, max_norm):
    total_norm = 0
    for grad in grads:
        total_norm += np.sum(grad ** 2)
    total_norm = np.sqrt(total_norm)

    rate = max_norm / (total_norm + 1e-6)
    if rate < 1:
        for grad in grads:
            grad *= rate

def remove_duplicate(params, grads):
    '''
    Merge duplicated weights in the parameter list into one
    and add up the gradients that correspond to that weight
    '''
    params, grads = params[:], grads[:]  # copy list

    while True:
        find_flg = False
        L = len(params)

        for i in range(0, L - 1):
            for j in range(i + 1, L):
                # When the weights are shared
                if params[i] is params[j]:
                    grads[i] += grads[j]  # add the gradients
                    find_flg = True
                    params.pop(j)
                    grads.pop(j)
                # When the weights are shared as a transposed matrix (weight tying)
                elif params[i].ndim == 2 and params[j].ndim == 2 and \
                     params[i].T.shape == params[j].shape and np.all(params[i].T == params[j]):
                    grads[i] += grads[j].T
                    find_flg = True
                    params.pop(j)
                    grads.pop(j)

                if find_flg: break
            if find_flg: break

        if not find_flg: break

    return params, grads

class Trainer:
    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.loss_list = []
        self.eval_interval = None
        self.current_epoch = 0

    def fit(self, x, t, max_epoch=10, batch_size=32, max_grad=None, eval_interval=20):
        data_size = len(x)
        max_iters = data_size // batch_size
        self.eval_interval = eval_interval
        model, optimizer = self.model, self.optimizer
        total_loss = 0
        loss_count = 0

        start_time = time.time()
        for epoch in range(max_epoch):
            # Shuffle
            idx = numpy.random.permutation(numpy.arange(data_size))
            x = x[idx]
            t = t[idx]

            for iters in range(max_iters):
                batch_x = x[iters*batch_size:(iters+1)*batch_size]
                batch_t = t[iters*batch_size:(iters+1)*batch_size]

                # Compute the gradients and update the parameters
                loss = model.forward(batch_x, batch_t)
                model.backward()
                params, grads = remove_duplicate(model.params, model.grads)  # merge shared weights into one
                if max_grad is not None:
                    clip_grads(grads, max_grad)
                optimizer.update(params, grads)
                total_loss += loss
                loss_count += 1

                # Evaluation
                if (eval_interval is not None) and (iters % eval_interval) == 0:
                    avg_loss = total_loss / loss_count
                    elapsed_time = time.time() - start_time
                    print('| epoch %d |  iter %d / %d | time %d[s] | loss %.2f'
                          % (self.current_epoch + 1, iters + 1, max_iters, elapsed_time, avg_loss))
                    self.loss_list.append(float(avg_loss))
                    total_loss, loss_count = 0, 0

            self.current_epoch += 1

    def plot(self, ylim=None):
        x = numpy.arange(len(self.loss_list))
        if ylim is not None:
            plt.ylim(*ylim)
        plt.plot(x, self.loss_list, label='train')
        plt.xlabel('iterations (x' + str(self.eval_interval) + ')')
        plt.ylabel('loss')
        plt.show()

Optimizer

%python
class Adam:
    '''
    Adam (http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None
        
    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = [], []
            for param in params:
                self.m.append(np.zeros_like(param))
                self.v.append(np.zeros_like(param))
        
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for i in range(len(params)):
            self.m[i] += (1 - self.beta1) * (grads[i] - self.m[i])
            self.v[i] += (1 - self.beta2) * (grads[i]**2 - self.v[i])
            
            params[i] -= lr_t * self.m[i] / (np.sqrt(self.v[i]) + 1e-7)
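To see the update rule in isolation, here is a minimal sketch that applies the same formulas as the `Adam` class above to a single parameter, minimizing f(x) = x². The starting point, learning rate and step count are arbitrary choices for illustration:

```python
import numpy as np

lr, beta1, beta2 = 0.1, 0.9, 0.999    # lr is larger than the class default, for a short demo
x = np.array([5.0])                   # parameter to optimize (arbitrary start)
m = np.zeros_like(x)                  # first-moment estimate
v = np.zeros_like(x)                  # second-moment estimate

for it in range(1, 201):
    grad = 2 * x                      # gradient of f(x) = x^2
    # Bias-corrected learning rate, as in Adam.update above
    lr_t = lr * np.sqrt(1.0 - beta2 ** it) / (1.0 - beta1 ** it)
    m += (1 - beta1) * (grad - m)
    v += (1 - beta2) * (grad ** 2 - v)
    x -= lr_t * m / (np.sqrt(v) + 1e-7)

print(float(x[0]))
```

Early on, each step has a magnitude close to `lr` regardless of the gradient's scale, because the update divides the smoothed gradient by the square root of the smoothed squared gradient.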

Running the training

%python
window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

model = SimpleCBOW(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

trainer.fit(contexts, target, max_epoch, batch_size)
| epoch 1 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 2 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 3 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 4 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 5 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 6 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 7 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 8 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 9 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 10 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 11 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 12 |  iter 1 / 2 | time 0[s] | loss 1.95
| epoch 13 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 14 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 15 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 16 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 17 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 18 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 19 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 20 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 21 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 22 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 23 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 24 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 25 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 26 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 27 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 28 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 29 |  iter 1 / 2 | time 0[s] | loss 1.94
| epoch 30 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 31 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 32 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 33 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 34 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 35 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 36 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 37 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 38 |  iter 1 / 2 | time 0[s] | loss 1.93
| epoch 39 |  iter 1 / 2 | time 0[s] | loss 1.92
| epoch 40 |  iter 1 / 2 | time 0[s] | loss 1.92
| epoch 41 |  iter 1 / 2 | time 0[s] | loss 1.92
| epoch 42 |  iter 1 / 2 | time 0[s] | loss 1.92
| epoch 43 |  iter 1 / 2 | time 0[s] | loss 1.91
| epoch 44 |  iter 1 / 2 | time 0[s] | loss 1.92
| epoch 45 |  iter 1 / 2 | time 0[s] | loss 1.91
| epoch 46 |  iter 1 / 2 | time 0[s] | loss 1.92
| epoch 47 |  iter 1 / 2 | time 0[s] | loss 1.91
| epoch 48 |  iter 1 / 2 | time 0[s] | loss 1.90
| epoch 49 |  iter 1 / 2 | time 0[s] | loss 1.91
| epoch 50 |  iter 1 / 2 | time 0[s] | loss 1.90
| epoch 51 |  iter 1 / 2 | time 0[s] | loss 1.90
| epoch 52 |  iter 1 / 2 | time 0[s] | loss 1.91
| epoch 53 |  iter 1 / 2 | time 0[s] | loss 1.89
| epoch 54 |  iter 1 / 2 | time 0[s] | loss 1.90
| epoch 55 |  iter 1 / 2 | time 0[s] | loss 1.89
| epoch 56 |  iter 1 / 2 | time 0[s] | loss 1.88
| epoch 57 |  iter 1 / 2 | time 0[s] | loss 1.89
| epoch 58 |  iter 1 / 2 | time 0[s] | loss 1.88
| epoch 59 |  iter 1 / 2 | time 0[s] | loss 1.89
| epoch 60 |  iter 1 / 2 | time 0[s] | loss 1.88
| epoch 61 |  iter 1 / 2 | time 0[s] | loss 1.87
| epoch 62 |  iter 1 / 2 | time 0[s] | loss 1.88
| epoch 63 |  iter 1 / 2 | time 0[s] | loss 1.86
| epoch 64 |  iter 1 / 2 | time 0[s] | loss 1.87
| epoch 65 |  iter 1 / 2 | time 0[s] | loss 1.85
| epoch 66 |  iter 1 / 2 | time 0[s] | loss 1.86
| epoch 67 |  iter 1 / 2 | time 0[s] | loss 1.86
| epoch 68 |  iter 1 / 2 | time 0[s] | loss 1.85
| epoch 69 |  iter 1 / 2 | time 0[s] | loss 1.85
| epoch 70 |  iter 1 / 2 | time 0[s] | loss 1.85
| epoch 71 |  iter 1 / 2 | time 0[s] | loss 1.83
| epoch 72 |  iter 1 / 2 | time 0[s] | loss 1.84
| epoch 73 |  iter 1 / 2 | time 0[s] | loss 1.83
| epoch 74 |  iter 1 / 2 | time 0[s] | loss 1.83
| epoch 75 |  iter 1 / 2 | time 0[s] | loss 1.82
| epoch 76 |  iter 1 / 2 | time 0[s] | loss 1.83
| epoch 77 |  iter 1 / 2 | time 0[s] | loss 1.80
| epoch 78 |  iter 1 / 2 | time 0[s] | loss 1.83
| epoch 79 |  iter 1 / 2 | time 0[s] | loss 1.80
| epoch 80 |  iter 1 / 2 | time 0[s] | loss 1.82
| epoch 81 |  iter 1 / 2 | time 0[s] | loss 1.80
| epoch 82 |  iter 1 / 2 | time 0[s] | loss 1.79
| epoch 83 |  iter 1 / 2 | time 0[s] | loss 1.79
| epoch 84 |  iter 1 / 2 | time 0[s] | loss 1.80
| epoch 85 |  iter 1 / 2 | time 0[s] | loss 1.78
| epoch 86 |  iter 1 / 2 | time 0[s] | loss 1.80
| epoch 87 |  iter 1 / 2 | time 0[s] | loss 1.79
| epoch 88 |  iter 1 / 2 | time 0[s] | loss 1.76
| epoch 89 |  iter 1 / 2 | time 0[s] | loss 1.78
| epoch 90 |  iter 1 / 2 | time 0[s] | loss 1.76
| epoch 91 |  iter 1 / 2 | time 0[s] | loss 1.76
| epoch 92 |  iter 1 / 2 | time 0[s] | loss 1.76
| epoch 93 |  iter 1 / 2 | time 0[s] | loss 1.76
| epoch 94 |  iter 1 / 2 | time 0[s] | loss 1.75
| epoch 95 |  iter 1 / 2 | time 0[s] | loss 1.73
| epoch 96 |  iter 1 / 2 | time 0[s] | loss 1.74
| epoch 97 |  iter 1 / 2 | time 0[s] | loss 1.73
| epoch 98 |  iter 1 / 2 | time 0[s] | loss 1.75
| epoch 99 |  iter 1 / 2 | time 0[s] | loss 1.71
| epoch 100 |  iter 1 / 2 | time 0[s] | loss 1.75
| epoch 101 |  iter 1 / 2 | time 0[s] | loss 1.70
| epoch 102 |  iter 1 / 2 | time 0[s] | loss 1.71
| epoch 103 |  iter 1 / 2 | time 0[s] | loss 1.71
| epoch 104 |  iter 1 / 2 | time 0[s] | loss 1.70
| epoch 105 |  iter 1 / 2 | time 0[s] | loss 1.70
| epoch 106 |  iter 1 / 2 | time 0[s] | loss 1.68
| epoch 107 |  iter 1 / 2 | time 0[s] | loss 1.70
| epoch 108 |  iter 1 / 2 | time 0[s] | loss 1.69
| epoch 109 |  iter 1 / 2 | time 0[s] | loss 1.68
| epoch 110 |  iter 1 / 2 | time 0[s] | loss 1.67
| epoch 111 |  iter 1 / 2 | time 0[s] | loss 1.70
| epoch 112 |  iter 1 / 2 | time 0[s] | loss 1.63
| epoch 113 |  iter 1 / 2 | time 0[s] | loss 1.67
| epoch 114 |  iter 1 / 2 | time 0[s] | loss 1.68
| epoch 115 |  iter 1 / 2 | time 0[s] | loss 1.64
| epoch 116 |  iter 1 / 2 | time 0[s] | loss 1.64
| epoch 117 |  iter 1 / 2 | time 0[s] | loss 1.65
| epoch 118 |  iter 1 / 2 | time 0[s] | loss 1.61
| epoch 119 |  iter 1 / 2 | time 0[s] | loss 1.63
| epoch 120 |  iter 1 / 2 | time 0[s] | loss 1.65
| epoch 121 |  iter 1 / 2 | time 0[s] | loss 1.62
| epoch 122 |  iter 1 / 2 | time 0[s] | loss 1.62
| epoch 123 |  iter 1 / 2 | time 0[s] | loss 1.61
| epoch 124 |  iter 1 / 2 | time 0[s] | loss 1.58
| epoch 125 |  iter 1 / 2 | time 0[s] | loss 1.63
| epoch 126 |  iter 1 / 2 | time 0[s] | loss 1.60
| epoch 127 |  iter 1 / 2 | time 0[s] | loss 1.58
| epoch 128 |  iter 1 / 2 | time 0[s] | loss 1.59
| epoch 129 |  iter 1 / 2 | time 0[s] | loss 1.59
| epoch 130 |  iter 1 / 2 | time 0[s] | loss 1.58
| epoch 131 |  iter 1 / 2 | time 0[s] | loss 1.57
| epoch 132 |  iter 1 / 2 | time 0[s] | loss 1.58
| epoch 133 |  iter 1 / 2 | time 0[s] | loss 1.55
| epoch 134 |  iter 1 / 2 | time 0[s] | loss 1.57
| epoch 135 |  iter 1 / 2 | time 0[s] | loss 1.51
| epoch 136 |  iter 1 / 2 | time 0[s] | loss 1.60
| epoch 137 |  iter 1 / 2 | time 0[s] | loss 1.51
| epoch 138 |  iter 1 / 2 | time 0[s] | loss 1.53
| epoch 139 |  iter 1 / 2 | time 0[s] | loss 1.54
| epoch 140 |  iter 1 / 2 | time 0[s] | loss 1.52
| epoch 141 |  iter 1 / 2 | time 0[s] | loss 1.53
| epoch 142 |  iter 1 / 2 | time 0[s] | loss 1.54
| epoch 143 |  iter 1 / 2 | time 0[s] | loss 1.52
| epoch 144 |  iter 1 / 2 | time 0[s] | loss 1.46
| epoch 145 |  iter 1 / 2 | time 0[s] | loss 1.51
| epoch 146 |  iter 1 / 2 | time 0[s] | loss 1.50
| epoch 147 |  iter 1 / 2 | time 0[s] | loss 1.49
| epoch 148 |  iter 1 / 2 | time 0[s] | loss 1.49
| epoch 149 |  iter 1 / 2 | time 0[s] | loss 1.45
| epoch 150 |  iter 1 / 2 | time 0[s] | loss 1.51
| epoch 151 |  iter 1 / 2 | time 0[s] | loss 1.48
| epoch 152 |  iter 1 / 2 | time 0[s] | loss 1.46
| epoch 153 |  iter 1 / 2 | time 0[s] | loss 1.45
| epoch 154 |  iter 1 / 2 | time 0[s] | loss 1.46
| epoch 155 |  iter 1 / 2 | time 0[s] | loss 1.46
| epoch 156 |  iter 1 / 2 | time 0[s] | loss 1.46
| epoch 157 |  iter 1 / 2 | time 0[s] | loss 1.44
| epoch 158 |  iter 1 / 2 | time 0[s] | loss 1.43
| epoch 159 |  iter 1 / 2 | time 0[s] | loss 1.43
| epoch 160 |  iter 1 / 2 | time 0[s] | loss 1.43
| epoch 161 |  iter 1 / 2 | time 0[s] | loss 1.39
| epoch 162 |  iter 1 / 2 | time 0[s] | loss 1.48
| epoch 163 |  iter 1 / 2 | time 0[s] | loss 1.34
| epoch 164 |  iter 1 / 2 | time 0[s] | loss 1.41
| epoch 165 |  iter 1 / 2 | time 0[s] | loss 1.41
| epoch 166 |  iter 1 / 2 | time 0[s] | loss 1.49
| epoch 167 |  iter 1 / 2 | time 0[s] | loss 1.30
| epoch 168 |  iter 1 / 2 | time 0[s] | loss 1.47
| epoch 169 |  iter 1 / 2 | time 0[s] | loss 1.40
| epoch 170 |  iter 1 / 2 | time 0[s] | loss 1.33
| epoch 171 |  iter 1 / 2 | time 0[s] | loss 1.43
| epoch 172 |  iter 1 / 2 | time 0[s] | loss 1.28
| epoch 173 |  iter 1 / 2 | time 0[s] | loss 1.46
| epoch 174 |  iter 1 / 2 | time 0[s] | loss 1.31
| epoch 175 |  iter 1 / 2 | time 0[s] | loss 1.36
| epoch 176 |  iter 1 / 2 | time 0[s] | loss 1.36
| epoch 177 |  iter 1 / 2 | time 0[s] | loss 1.41
| epoch 178 |  iter 1 / 2 | time 0[s] | loss 1.29
| epoch 179 |  iter 1 / 2 | time 0[s] | loss 1.35
| epoch 180 |  iter 1 / 2 | time 0[s] | loss 1.33
| epoch 181 |  iter 1 / 2 | time 0[s] | loss 1.35
| epoch 182 |  iter 1 / 2 | time 0[s] | loss 1.33
| epoch 183 |  iter 1 / 2 | time 0[s] | loss 1.36
| epoch 184 |  iter 1 / 2 | time 0[s] | loss 1.23
| epoch 185 |  iter 1 / 2 | time 0[s] | loss 1.38
| epoch 186 |  iter 1 / 2 | time 0[s] | loss 1.30
| epoch 187 |  iter 1 / 2 | time 0[s] | loss 1.33
| epoch 188 |  iter 1 / 2 | time 0[s] | loss 1.35
| epoch 189 |  iter 1 / 2 | time 0[s] | loss 1.30
| epoch 190 |  iter 1 / 2 | time 0[s] | loss 1.19
| epoch 191 |  iter 1 / 2 | time 0[s] | loss 1.34
| epoch 192 |  iter 1 / 2 | time 0[s] | loss 1.35
| epoch 193 |  iter 1 / 2 | time 0[s] | loss 1.17
| epoch 194 |  iter 1 / 2 | time 0[s] | loss 1.34
| epoch 195 |  iter 1 / 2 | time 0[s] | loss 1.28
| epoch 196 |  iter 1 / 2 | time 0[s] | loss 1.27
| epoch 197 |  iter 1 / 2 | time 0[s] | loss 1.27
| epoch 198 |  iter 1 / 2 | time 0[s] | loss 1.21
| epoch 199 |  iter 1 / 2 | time 0[s] | loss 1.32
| epoch 200 |  iter 1 / 2 | time 0[s] | loss 1.20
| epoch 201 |  iter 1 / 2 | time 0[s] | loss 1.32
| epoch 202 |  iter 1 / 2 | time 0[s] | loss 1.24
| epoch 203 |  iter 1 / 2 | time 0[s] | loss 1.31
| epoch 204 |  iter 1 / 2 | time 0[s] | loss 1.24
| epoch 205 |  iter 1 / 2 | time 0[s] | loss 1.13
| epoch 206 |  iter 1 / 2 | time 0[s] | loss 1.28
| epoch 207 |  iter 1 / 2 | time 0[s] | loss 1.18
| epoch 208 |  iter 1 / 2 | time 0[s] | loss 1.24
| epoch 209 |  iter 1 / 2 | time 0[s] | loss 1.27
| epoch 210 |  iter 1 / 2 | time 0[s] | loss 1.16
| epoch 211 |  iter 1 / 2 | time 0[s] | loss 1.28
| epoch 212 |  iter 1 / 2 | time 0[s] | loss 1.22
| epoch 213 |  iter 1 / 2 | time 0[s] | loss 1.21
| epoch 214 |  iter 1 / 2 | time 0[s] | loss 1.21
| epoch 215 |  iter 1 / 2 | time 0[s] | loss 1.15
| epoch 216 |  iter 1 / 2 | time 0[s] | loss 1.26
| epoch 217 |  iter 1 / 2 | time 0[s] | loss 1.19
| epoch 218 |  iter 1 / 2 | time 0[s] | loss 1.20
| epoch 219 |  iter 1 / 2 | time 0[s] | loss 1.19
| epoch 220 |  iter 1 / 2 | time 0[s] | loss 1.12
| epoch 221 |  iter 1 / 2 | time 0[s] | loss 1.26
| epoch 222 |  iter 1 / 2 | time 0[s] | loss 1.18
| epoch 223 |  iter 1 / 2 | time 0[s] | loss 1.18
| epoch 224 |  iter 1 / 2 | time 0[s] | loss 1.24
| epoch 225 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 226 |  iter 1 / 2 | time 0[s] | loss 1.23
| epoch 227 |  iter 1 / 2 | time 0[s] | loss 1.10
| epoch 228 |  iter 1 / 2 | time 0[s] | loss 1.30
| epoch 229 |  iter 1 / 2 | time 0[s] | loss 1.09
| epoch 230 |  iter 1 / 2 | time 0[s] | loss 1.16
| epoch 231 |  iter 1 / 2 | time 0[s] | loss 1.15
| epoch 232 |  iter 1 / 2 | time 0[s] | loss 1.08
| epoch 233 |  iter 1 / 2 | time 0[s] | loss 1.22
| epoch 234 |  iter 1 / 2 | time 0[s] | loss 1.14
| epoch 235 |  iter 1 / 2 | time 0[s] | loss 1.08
| epoch 236 |  iter 1 / 2 | time 0[s] | loss 1.14
| epoch 237 |  iter 1 / 2 | time 0[s] | loss 1.21
| epoch 238 |  iter 1 / 2 | time 0[s] | loss 1.05
| epoch 239 |  iter 1 / 2 | time 0[s] | loss 1.27
| epoch 240 |  iter 1 / 2 | time 0[s] | loss 1.06
| epoch 241 |  iter 1 / 2 | time 0[s] | loss 1.05
| epoch 242 |  iter 1 / 2 | time 0[s] | loss 1.27
| epoch 243 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 244 |  iter 1 / 2 | time 0[s] | loss 1.12
| epoch 245 |  iter 1 / 2 | time 0[s] | loss 1.11
| epoch 246 |  iter 1 / 2 | time 0[s] | loss 1.12
| epoch 247 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 248 |  iter 1 / 2 | time 0[s] | loss 1.18
| epoch 249 |  iter 1 / 2 | time 0[s] | loss 1.11
| epoch 250 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 251 |  iter 1 / 2 | time 0[s] | loss 1.17
| epoch 252 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 253 |  iter 1 / 2 | time 0[s] | loss 1.24
| epoch 254 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 255 |  iter 1 / 2 | time 0[s] | loss 1.09
| epoch 256 |  iter 1 / 2 | time 0[s] | loss 1.16
| epoch 257 |  iter 1 / 2 | time 0[s] | loss 0.99
| epoch 258 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 259 |  iter 1 / 2 | time 0[s] | loss 1.17
| epoch 260 |  iter 1 / 2 | time 0[s] | loss 1.07
| epoch 261 |  iter 1 / 2 | time 0[s] | loss 1.07
| epoch 262 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 263 |  iter 1 / 2 | time 0[s] | loss 1.21
| epoch 264 |  iter 1 / 2 | time 0[s] | loss 0.97
| epoch 265 |  iter 1 / 2 | time 0[s] | loss 1.08
| epoch 266 |  iter 1 / 2 | time 0[s] | loss 0.98
| epoch 267 |  iter 1 / 2 | time 0[s] | loss 1.11
| epoch 268 |  iter 1 / 2 | time 0[s] | loss 1.15
| epoch 269 |  iter 1 / 2 | time 0[s] | loss 0.98
| epoch 270 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 271 |  iter 1 / 2 | time 0[s] | loss 1.05
| epoch 272 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 273 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 274 |  iter 1 / 2 | time 0[s] | loss 1.06
| epoch 275 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 276 |  iter 1 / 2 | time 0[s] | loss 1.09
| epoch 277 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 278 |  iter 1 / 2 | time 0[s] | loss 0.96
| epoch 279 |  iter 1 / 2 | time 0[s] | loss 1.10
| epoch 280 |  iter 1 / 2 | time 0[s] | loss 1.05
| epoch 281 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 282 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 283 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 284 |  iter 1 / 2 | time 0[s] | loss 1.13
| epoch 285 |  iter 1 / 2 | time 0[s] | loss 0.98
| epoch 286 |  iter 1 / 2 | time 0[s] | loss 1.01
| epoch 287 |  iter 1 / 2 | time 0[s] | loss 0.93
| epoch 288 |  iter 1 / 2 | time 0[s] | loss 1.12
| epoch 289 |  iter 1 / 2 | time 0[s] | loss 0.97
| epoch 290 |  iter 1 / 2 | time 0[s] | loss 1.03
| epoch 291 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 292 |  iter 1 / 2 | time 0[s] | loss 0.98
| epoch 293 |  iter 1 / 2 | time 0[s] | loss 1.12
| epoch 294 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 295 |  iter 1 / 2 | time 0[s] | loss 1.11
| epoch 296 |  iter 1 / 2 | time 0[s] | loss 0.89
| epoch 297 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 298 |  iter 1 / 2 | time 0[s] | loss 1.07
| epoch 299 |  iter 1 / 2 | time 0[s] | loss 1.09
| epoch 300 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 301 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 302 |  iter 1 / 2 | time 0[s] | loss 0.98
| epoch 303 |  iter 1 / 2 | time 0[s] | loss 1.07
| epoch 304 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 305 |  iter 1 / 2 | time 0[s] | loss 0.92
| epoch 306 |  iter 1 / 2 | time 0[s] | loss 0.99
| epoch 307 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 308 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 309 |  iter 1 / 2 | time 0[s] | loss 1.08
| epoch 310 |  iter 1 / 2 | time 0[s] | loss 1.03
| epoch 311 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 312 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 313 |  iter 1 / 2 | time 0[s] | loss 1.13
| epoch 314 |  iter 1 / 2 | time 0[s] | loss 0.90
| epoch 315 |  iter 1 / 2 | time 0[s] | loss 0.93
| epoch 316 |  iter 1 / 2 | time 0[s] | loss 1.01
| epoch 317 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 318 |  iter 1 / 2 | time 0[s] | loss 1.01
| epoch 319 |  iter 1 / 2 | time 0[s] | loss 0.97
| epoch 320 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 321 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 322 |  iter 1 / 2 | time 0[s] | loss 1.08
| epoch 323 |  iter 1 / 2 | time 0[s] | loss 0.92
| epoch 324 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 325 |  iter 1 / 2 | time 0[s] | loss 0.96
| epoch 326 |  iter 1 / 2 | time 0[s] | loss 1.11
| epoch 327 |  iter 1 / 2 | time 0[s] | loss 0.94
| epoch 328 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 329 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 330 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 331 |  iter 1 / 2 | time 0[s] | loss 0.93
| epoch 332 |  iter 1 / 2 | time 0[s] | loss 0.90
| epoch 333 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 334 |  iter 1 / 2 | time 0[s] | loss 0.99
| epoch 335 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 336 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 337 |  iter 1 / 2 | time 0[s] | loss 0.77
| epoch 338 |  iter 1 / 2 | time 0[s] | loss 1.04
| epoch 339 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 340 |  iter 1 / 2 | time 0[s] | loss 0.92
| epoch 341 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 342 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 343 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 344 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 345 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 346 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 347 |  iter 1 / 2 | time 0[s] | loss 0.94
| epoch 348 |  iter 1 / 2 | time 0[s] | loss 0.99
| epoch 349 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 350 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 351 |  iter 1 / 2 | time 0[s] | loss 1.05
| epoch 352 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 353 |  iter 1 / 2 | time 0[s] | loss 0.93
| epoch 354 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 355 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 356 |  iter 1 / 2 | time 0[s] | loss 1.05
| epoch 357 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 358 |  iter 1 / 2 | time 0[s] | loss 0.94
| epoch 359 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 360 |  iter 1 / 2 | time 0[s] | loss 0.92
| epoch 361 |  iter 1 / 2 | time 0[s] | loss 0.97
| epoch 362 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 363 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 364 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 365 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 366 |  iter 1 / 2 | time 0[s] | loss 1.09
| epoch 367 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 368 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 369 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 370 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 371 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 372 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 373 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 374 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 375 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 376 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 377 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 378 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 379 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 380 |  iter 1 / 2 | time 0[s] | loss 0.94
| epoch 381 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 382 |  iter 1 / 2 | time 0[s] | loss 0.94
| epoch 383 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 384 |  iter 1 / 2 | time 0[s] | loss 1.03
| epoch 385 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 386 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 387 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 388 |  iter 1 / 2 | time 0[s] | loss 0.98
| epoch 389 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 390 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 391 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 392 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 393 |  iter 1 / 2 | time 0[s] | loss 0.97
| epoch 394 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 395 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 396 |  iter 1 / 2 | time 0[s] | loss 1.02
| epoch 397 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 398 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 399 |  iter 1 / 2 | time 0[s] | loss 1.00
| epoch 400 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 401 |  iter 1 / 2 | time 0[s] | loss 0.96
| epoch 402 |  iter 1 / 2 | time 0[s] | loss 0.62
| epoch 403 |  iter 1 / 2 | time 0[s] | loss 1.01
| epoch 404 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 405 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 406 |  iter 1 / 2 | time 0[s] | loss 0.92
| epoch 407 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 408 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 409 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 410 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 411 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 412 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 413 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 414 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 415 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 416 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 417 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 418 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 419 |  iter 1 / 2 | time 0[s] | loss 0.89
| epoch 420 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 421 |  iter 1 / 2 | time 0[s] | loss 0.94
| epoch 422 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 423 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 424 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 425 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 426 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 427 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 428 |  iter 1 / 2 | time 0[s] | loss 0.89
| epoch 429 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 430 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 431 |  iter 1 / 2 | time 0[s] | loss 0.93
| epoch 432 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 433 |  iter 1 / 2 | time 0[s] | loss 0.93
| epoch 434 |  iter 1 / 2 | time 0[s] | loss 0.77
| epoch 435 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 436 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 437 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 438 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 439 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 440 |  iter 1 / 2 | time 0[s] | loss 1.01
| epoch 441 |  iter 1 / 2 | time 0[s] | loss 0.57
| epoch 442 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 443 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 444 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 445 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 446 |  iter 1 / 2 | time 0[s] | loss 0.92
| epoch 447 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 448 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 449 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 450 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 451 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 452 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 453 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 454 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 455 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 456 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 457 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 458 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 459 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 460 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 461 |  iter 1 / 2 | time 0[s] | loss 0.90
| epoch 462 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 463 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 464 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 465 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 466 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 467 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 468 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 469 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 470 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 471 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 472 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 473 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 474 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 475 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 476 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 477 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 478 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 479 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 480 |  iter 1 / 2 | time 0[s] | loss 0.96
| epoch 481 |  iter 1 / 2 | time 0[s] | loss 0.55
| epoch 482 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 483 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 484 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 485 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 486 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 487 |  iter 1 / 2 | time 0[s] | loss 0.62
| epoch 488 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 489 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 490 |  iter 1 / 2 | time 0[s] | loss 0.95
| epoch 491 |  iter 1 / 2 | time 0[s] | loss 0.45
| epoch 492 |  iter 1 / 2 | time 0[s] | loss 0.96
| epoch 493 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 494 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 495 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 496 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 497 |  iter 1 / 2 | time 0[s] | loss 0.96
| epoch 498 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 499 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 500 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 501 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 502 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 503 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 504 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 505 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 506 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 507 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 508 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 509 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 510 |  iter 1 / 2 | time 0[s] | loss 0.52
| epoch 511 |  iter 1 / 2 | time 0[s] | loss 0.85
| epoch 512 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 513 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 514 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 515 |  iter 1 / 2 | time 0[s] | loss 0.77
| epoch 516 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 517 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 518 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 519 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 520 |  iter 1 / 2 | time 0[s] | loss 0.60
| epoch 521 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 522 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 523 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 524 |  iter 1 / 2 | time 0[s] | loss 0.88
| epoch 525 |  iter 1 / 2 | time 0[s] | loss 0.51
| epoch 526 |  iter 1 / 2 | time 0[s] | loss 0.84
| epoch 527 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 528 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 529 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 530 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 531 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 532 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 533 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 534 |  iter 1 / 2 | time 0[s] | loss 0.71
| epoch 535 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 536 |  iter 1 / 2 | time 0[s] | loss 0.58
| epoch 537 |  iter 1 / 2 | time 0[s] | loss 0.83
| epoch 538 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 539 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 540 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 541 |  iter 1 / 2 | time 0[s] | loss 0.62
| epoch 542 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 543 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 544 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 545 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 546 |  iter 1 / 2 | time 0[s] | loss 0.87
| epoch 547 |  iter 1 / 2 | time 0[s] | loss 0.49
| epoch 548 |  iter 1 / 2 | time 0[s] | loss 0.91
| epoch 549 |  iter 1 / 2 | time 0[s] | loss 0.49
| epoch 550 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 551 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 552 |  iter 1 / 2 | time 0[s] | loss 0.82
| epoch 553 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 554 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 555 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 556 |  iter 1 / 2 | time 0[s] | loss 0.48
| epoch 557 |  iter 1 / 2 | time 0[s] | loss 0.99
| epoch 558 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 559 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 560 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 561 |  iter 1 / 2 | time 0[s] | loss 0.60
| epoch 562 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 563 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 564 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 565 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 566 |  iter 1 / 2 | time 0[s] | loss 0.60
| epoch 567 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 568 |  iter 1 / 2 | time 0[s] | loss 0.77
| epoch 569 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 570 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 571 |  iter 1 / 2 | time 0[s] | loss 0.55
| epoch 572 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 573 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 574 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 575 |  iter 1 / 2 | time 0[s] | loss 0.59
| epoch 576 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 577 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 578 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 579 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 580 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 581 |  iter 1 / 2 | time 0[s] | loss 0.55
| epoch 582 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 583 |  iter 1 / 2 | time 0[s] | loss 0.59
| epoch 584 |  iter 1 / 2 | time 0[s] | loss 0.80
| epoch 585 |  iter 1 / 2 | time 0[s] | loss 0.55
| epoch 586 |  iter 1 / 2 | time 0[s] | loss 0.89
| epoch 587 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 588 |  iter 1 / 2 | time 0[s] | loss 0.46
| epoch 589 |  iter 1 / 2 | time 0[s] | loss 0.79
| epoch 590 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 591 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 592 |  iter 1 / 2 | time 0[s] | loss 0.57
| epoch 593 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 594 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 595 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 596 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 597 |  iter 1 / 2 | time 0[s] | loss 0.49
| epoch 598 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 599 |  iter 1 / 2 | time 0[s] | loss 0.70
| epoch 600 |  iter 1 / 2 | time 0[s] | loss 0.59
| epoch 601 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 602 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 603 |  iter 1 / 2 | time 0[s] | loss 0.54
| epoch 604 |  iter 1 / 2 | time 0[s] | loss 0.69
| epoch 605 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 606 |  iter 1 / 2 | time 0[s] | loss 0.56
| epoch 607 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 608 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 609 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 610 |  iter 1 / 2 | time 0[s] | loss 0.49
| epoch 611 |  iter 1 / 2 | time 0[s] | loss 0.86
| epoch 612 |  iter 1 / 2 | time 0[s] | loss 0.53
| epoch 613 |  iter 1 / 2 | time 0[s] | loss 0.78
| epoch 614 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 615 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 616 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 617 |  iter 1 / 2 | time 0[s] | loss 0.56
| epoch 618 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 619 |  iter 1 / 2 | time 0[s] | loss 0.52
| epoch 620 |  iter 1 / 2 | time 0[s] | loss 0.56
| epoch 621 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 622 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 623 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 624 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 625 |  iter 1 / 2 | time 0[s] | loss 0.60
| epoch 626 |  iter 1 / 2 | time 0[s] | loss 0.68
| epoch 627 |  iter 1 / 2 | time 0[s] | loss 0.57
| epoch 628 |  iter 1 / 2 | time 0[s] | loss 0.61
| epoch 629 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 630 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 631 |  iter 1 / 2 | time 0[s] | loss 0.57
| epoch 632 |  iter 1 / 2 | time 0[s] | loss 0.59
| epoch 633 |  iter 1 / 2 | time 0[s] | loss 0.74
| epoch 634 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 635 |  iter 1 / 2 | time 0[s] | loss 0.34
| epoch 636 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 637 |  iter 1 / 2 | time 0[s] | loss 0.64
| epoch 638 |  iter 1 / 2 | time 0[s] | loss 0.65
| epoch 639 |  iter 1 / 2 | time 0[s] | loss 0.72
| epoch 640 |  iter 1 / 2 | time 0[s] | loss 0.57
| epoch 641 |  iter 1 / 2 | time 0[s] | loss 0.73
| epoch 642 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 643 |  iter 1 / 2 | time 0[s] | loss 0.52
| epoch 644 |  iter 1 / 2 | time 0[s] | loss 0.75
| epoch 645 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 646 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 647 |  iter 1 / 2 | time 0[s] | loss 0.62
| epoch 648 |  iter 1 / 2 | time 0[s] | loss 0.60
| epoch 649 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 650 |  iter 1 / 2 | time 0[s] | loss 0.55
| epoch 651 |  iter 1 / 2 | time 0[s] | loss 0.63
| epoch 652 |  iter 1 / 2 | time 0[s] | loss 0.76
| epoch 653 |  iter 1 / 2 | time 0[s] | loss 0.41
| epoch 654 |  iter 1 / 2 | time 0[s] | loss 0.81
| epoch 655 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 656 |  iter 1 / 2 | time 0[s] | loss 0.62
| epoch 657 |  iter 1 / 2 | time 0[s] | loss 0.49
| epoch 658 |  iter 1 / 2 | time 0[s] | loss 0.66
| epoch 659 |  iter 1 / 2 | time 0[s] | loss 0.67
| epoch 660 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 661 |  iter 1 / 2 | time 1[s] | loss 0.65
| epoch 662 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 663 |  iter 1 / 2 | time 1[s] | loss 0.69
| epoch 664 |  iter 1 / 2 | time 1[s] | loss 0.65
| epoch 665 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 666 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 667 |  iter 1 / 2 | time 1[s] | loss 0.65
| epoch 668 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 669 |  iter 1 / 2 | time 1[s] | loss 0.73
| epoch 670 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 671 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 672 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 673 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 674 |  iter 1 / 2 | time 1[s] | loss 0.71
| epoch 675 |  iter 1 / 2 | time 1[s] | loss 0.40
| epoch 676 |  iter 1 / 2 | time 1[s] | loss 0.69
| epoch 677 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 678 |  iter 1 / 2 | time 1[s] | loss 0.82
| epoch 679 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 680 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 681 |  iter 1 / 2 | time 1[s] | loss 0.72
| epoch 682 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 683 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 684 |  iter 1 / 2 | time 1[s] | loss 0.68
| epoch 685 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 686 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 687 |  iter 1 / 2 | time 1[s] | loss 0.70
| epoch 688 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 689 |  iter 1 / 2 | time 1[s] | loss 0.49
| epoch 690 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 691 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 692 |  iter 1 / 2 | time 1[s] | loss 0.71
| epoch 693 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 694 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 695 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 696 |  iter 1 / 2 | time 1[s] | loss 0.73
| epoch 697 |  iter 1 / 2 | time 1[s] | loss 0.68
| epoch 698 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 699 |  iter 1 / 2 | time 1[s] | loss 0.78
| epoch 700 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 701 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 702 |  iter 1 / 2 | time 1[s] | loss 0.68
| epoch 703 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 704 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 705 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 706 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 707 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 708 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 709 |  iter 1 / 2 | time 1[s] | loss 0.66
| epoch 710 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 711 |  iter 1 / 2 | time 1[s] | loss 0.72
| epoch 712 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 713 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 714 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 715 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 716 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 717 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 718 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 719 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 720 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 721 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 722 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 723 |  iter 1 / 2 | time 1[s] | loss 0.69
| epoch 724 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 725 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 726 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 727 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 728 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 729 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 730 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 731 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 732 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 733 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 734 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 735 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 736 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 737 |  iter 1 / 2 | time 1[s] | loss 0.68
| epoch 738 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 739 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 740 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 741 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 742 |  iter 1 / 2 | time 1[s] | loss 0.65
| epoch 743 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 744 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 745 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 746 |  iter 1 / 2 | time 1[s] | loss 0.36
| epoch 747 |  iter 1 / 2 | time 1[s] | loss 0.77
| epoch 748 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 749 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 750 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 751 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 752 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 753 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 754 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 755 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 756 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 757 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 758 |  iter 1 / 2 | time 1[s] | loss 0.68
| epoch 759 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 760 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 761 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 762 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 763 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 764 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 765 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 766 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 767 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 768 |  iter 1 / 2 | time 1[s] | loss 0.74
| epoch 769 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 770 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 771 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 772 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 773 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 774 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 775 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 776 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 777 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 778 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 779 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 780 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 781 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 782 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 783 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 784 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 785 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 786 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 787 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 788 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 789 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 790 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 791 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 792 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 793 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 794 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 795 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 796 |  iter 1 / 2 | time 1[s] | loss 0.65
| epoch 797 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 798 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 799 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 800 |  iter 1 / 2 | time 1[s] | loss 0.34
| epoch 801 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 802 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 803 |  iter 1 / 2 | time 1[s] | loss 0.71
| epoch 804 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 805 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 806 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 807 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 808 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 809 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 810 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 811 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 812 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 813 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 814 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 815 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 816 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 817 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 818 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 819 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 820 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 821 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 822 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 823 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 824 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 825 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 826 |  iter 1 / 2 | time 1[s] | loss 0.63
| epoch 827 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 828 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 829 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 830 |  iter 1 / 2 | time 1[s] | loss 0.35
| epoch 831 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 832 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 833 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 834 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 835 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 836 |  iter 1 / 2 | time 1[s] | loss 0.68
| epoch 837 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 838 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 839 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 840 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 841 |  iter 1 / 2 | time 1[s] | loss 0.62
| epoch 842 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 843 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 844 |  iter 1 / 2 | time 1[s] | loss 0.32
| epoch 845 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 846 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 847 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 848 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 849 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 850 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 851 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 852 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 853 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 854 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 855 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 856 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 857 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 858 |  iter 1 / 2 | time 1[s] | loss 0.23
| epoch 859 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 860 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 861 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 862 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 863 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 864 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 865 |  iter 1 / 2 | time 1[s] | loss 0.60
| epoch 866 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 867 |  iter 1 / 2 | time 1[s] | loss 0.69
| epoch 868 |  iter 1 / 2 | time 1[s] | loss 0.31
| epoch 869 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 870 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 871 |  iter 1 / 2 | time 1[s] | loss 0.31
| epoch 872 |  iter 1 / 2 | time 1[s] | loss 0.66
| epoch 873 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 874 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 875 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 876 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 877 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 878 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 879 |  iter 1 / 2 | time 1[s] | loss 0.40
| epoch 880 |  iter 1 / 2 | time 1[s] | loss 0.67
| epoch 881 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 882 |  iter 1 / 2 | time 1[s] | loss 0.51
| epoch 883 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 884 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 885 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 886 |  iter 1 / 2 | time 1[s] | loss 0.40
| epoch 887 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 888 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 889 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 890 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 891 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 892 |  iter 1 / 2 | time 1[s] | loss 0.40
| epoch 893 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 894 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 895 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 896 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 897 |  iter 1 / 2 | time 1[s] | loss 0.61
| epoch 898 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 899 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 900 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 901 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 902 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 903 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 904 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 905 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 906 |  iter 1 / 2 | time 1[s] | loss 0.58
| epoch 907 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 908 |  iter 1 / 2 | time 1[s] | loss 0.32
| epoch 909 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 910 |  iter 1 / 2 | time 1[s] | loss 0.38
| epoch 911 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 912 |  iter 1 / 2 | time 1[s] | loss 0.38
| epoch 913 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 914 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 915 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 916 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 917 |  iter 1 / 2 | time 1[s] | loss 0.29
| epoch 918 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 919 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 920 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 921 |  iter 1 / 2 | time 1[s] | loss 0.38
| epoch 922 |  iter 1 / 2 | time 1[s] | loss 0.38
| epoch 923 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 924 |  iter 1 / 2 | time 1[s] | loss 0.40
| epoch 925 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 926 |  iter 1 / 2 | time 1[s] | loss 0.64
| epoch 927 |  iter 1 / 2 | time 1[s] | loss 0.26
| epoch 928 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 929 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 930 |  iter 1 / 2 | time 1[s] | loss 0.57
| epoch 931 |  iter 1 / 2 | time 1[s] | loss 0.38
| epoch 932 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 933 |  iter 1 / 2 | time 1[s] | loss 0.55
| epoch 934 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 935 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 936 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 937 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 938 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 939 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 940 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 941 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 942 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 943 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 944 |  iter 1 / 2 | time 1[s] | loss 0.37
| epoch 945 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 946 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 947 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 948 |  iter 1 / 2 | time 1[s] | loss 0.34
| epoch 949 |  iter 1 / 2 | time 1[s] | loss 0.56
| epoch 950 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 951 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 952 |  iter 1 / 2 | time 1[s] | loss 0.35
| epoch 953 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 954 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 955 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 956 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 957 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 958 |  iter 1 / 2 | time 1[s] | loss 0.59
| epoch 959 |  iter 1 / 2 | time 1[s] | loss 0.33
| epoch 960 |  iter 1 / 2 | time 1[s] | loss 0.54
| epoch 961 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 962 |  iter 1 / 2 | time 1[s] | loss 0.35
| epoch 963 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 964 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 965 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 966 |  iter 1 / 2 | time 1[s] | loss 0.35
| epoch 967 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 968 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 969 |  iter 1 / 2 | time 1[s] | loss 0.38
| epoch 970 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 971 |  iter 1 / 2 | time 1[s] | loss 0.27
| epoch 972 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 973 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 974 |  iter 1 / 2 | time 1[s] | loss 0.35
| epoch 975 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 976 |  iter 1 / 2 | time 1[s] | loss 0.44
| epoch 977 |  iter 1 / 2 | time 1[s] | loss 0.46
| epoch 978 |  iter 1 / 2 | time 1[s] | loss 0.42
| epoch 979 |  iter 1 / 2 | time 1[s] | loss 0.40
| epoch 980 |  iter 1 / 2 | time 1[s] | loss 0.49
| epoch 981 |  iter 1 / 2 | time 1[s] | loss 0.53
| epoch 982 |  iter 1 / 2 | time 1[s] | loss 0.32
| epoch 983 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 984 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 985 |  iter 1 / 2 | time 1[s] | loss 0.50
| epoch 986 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 987 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 988 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 989 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 990 |  iter 1 / 2 | time 1[s] | loss 0.36
| epoch 991 |  iter 1 / 2 | time 1[s] | loss 0.39
| epoch 992 |  iter 1 / 2 | time 1[s] | loss 0.47
| epoch 993 |  iter 1 / 2 | time 1[s] | loss 0.45
| epoch 994 |  iter 1 / 2 | time 1[s] | loss 0.41
| epoch 995 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 996 |  iter 1 / 2 | time 1[s] | loss 0.43
| epoch 997 |  iter 1 / 2 | time 1[s] | loss 0.48
| epoch 998 |  iter 1 / 2 | time 1[s] | loss 0.35
| epoch 999 |  iter 1 / 2 | time 1[s] | loss 0.52
| epoch 1000 |  iter 1 / 2 | time 1[s] | loss 0.26

Plot the loss values

%python
# Plot the loss recorded during training and render the figure in the notebook as SVG
trainer.plot()
z.show(plt, fmt='svg')

Print the word vectors

%python
# Print each word next to its learned vector
word_vecs = model.word_vecs
for word_id, word in id_to_word.items():
    print(word, word_vecs[word_id])
you [ 0.93710303  0.93910193  1.7272372  -0.89610606  1.0445951 ]
say [-1.1644877  -1.2109934  -0.20577171  1.23597    -1.2464908 ]
goodbye [ 1.1030452  1.0522411 -0.1555654 -1.0932515  0.8510445]
and [-0.77724737 -1.0205745  -1.8217171   0.9609459  -1.050846  ]
i [ 1.1081636   1.0668204  -0.13783155 -1.1415119   0.8612589 ]
hello [ 0.9385245  0.9236376  1.7012237 -0.9081088  1.0232164]
. [-1.1489888 -1.061469   1.6251746  1.10383   -1.1069291]

  • Each word is now represented as a dense vector, but a corpus this small does not give good results
  • This implementation cannot handle a large corpus
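
As a rough check of the vectors printed above, the sketch below compares a few of them with cosine similarity. The values are copied from that output and rounded to four decimals. The `cos_similarity` helper and the choice of word pairs are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

# Vectors copied from the printed output above, rounded to 4 decimals.
word_vecs = {
    "you":     np.array([ 0.9371,  0.9391,  1.7272, -0.8961,  1.0446]),
    "say":     np.array([-1.1645, -1.2110, -0.2058,  1.2360, -1.2465]),
    "goodbye": np.array([ 1.1030,  1.0522, -0.1556, -1.0933,  0.8510]),
    "i":       np.array([ 1.1082,  1.0668, -0.1378, -1.1415,  0.8613]),
    "hello":   np.array([ 0.9385,  0.9236,  1.7012, -0.9081,  1.0232]),
}


def cos_similarity(x, y, eps=1e-8):
    """Cosine of the angle between x and y (eps guards against zero vectors)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)


for a, b in [("you", "hello"), ("goodbye", "i"), ("you", "say")]:
    print(a, b, cos_similarity(word_vecs[a], word_vecs[b]))
```

With these values, `you` and `hello` point in almost the same direction, while `you` and `say` point in roughly opposite directions. On a corpus this small, that should not be read as strong evidence of anything.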

CBOW model: supplementary notes

Probability notation

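  • \( P(A) \): the probability that event A occurs
  • \( P(A, B) \): the probability that A and B occur together
  • \( P(A|B) \): the probability that A occurs given that B has occurred (the conditional, or posterior, probability)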
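
To make the three notations concrete, here is a small sketch that counts adjacent word pairs in the toy corpus. The events A ("the current word is say") and B ("the previous word is you") and the `prob` helper are assumptions chosen only for illustration.

```python
from fractions import Fraction

corpus = "you say goodbye and i say hello .".split()
# Every (previous word, current word) pair in the corpus: 7 pairs in total.
pairs = list(zip(corpus[:-1], corpus[1:]))


def prob(predicate):
    """Fraction of pairs for which the predicate holds."""
    return Fraction(sum(1 for pair in pairs if predicate(pair)), len(pairs))


p_a = prob(lambda p: p[1] == "say")                      # P(A)
p_b = prob(lambda p: p[0] == "you")                      # P(B)
p_ab = prob(lambda p: p[0] == "you" and p[1] == "say")   # P(A, B)
p_a_given_b = p_ab / p_b                                 # P(A|B) = P(A, B) / P(B)

print(p_a, p_b, p_ab, p_a_given_b)
```

Here "say" follows "you" every time "you" appears, so \( P(A|B) \) is 1 even though \( P(A) \) is only 2/7.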

The CBOW model and probability

$$
P(w_t | w_{t-1},w_{t+1})
$$

The expression above is the probability that \( w_t \) occurs given that \( w_{t-1} \) and \( w_{t+1} \) have occurred.

Cross-entropy error

$$
L = -\log P(w_t | w_{t-1}, w_{t+1})
$$
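
The single-sample loss above can be sketched in NumPy as follows. The score values are made up for illustration (they are not output from the model in this post), and `softmax` is a helper written here for the example.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / np.sum(exp)


vocab = ["you", "say", "goodbye", "and", "i", "hello", "."]

# Hypothetical scores for the contexts "you" and "goodbye" (made-up numbers).
scores = np.array([0.5, 2.0, 0.1, -0.3, 0.2, 0.4, -1.0])
target_id = vocab.index("say")  # the correct word w_t

probs = softmax(scores)             # P(w | w_{t-1}, w_{t+1}) for every word w
loss = -np.log(probs[target_id])    # L = -log P(w_t | w_{t-1}, w_{t+1})

print(probs[target_id], loss)
```

The loss depends only on the probability assigned to the correct word: the closer that probability is to 1, the closer the loss is to 0.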

Extended to the whole corpus:

$$
L = -\frac{1}{T} \sum_{t=1}^{T} \log P(w_t | w_{t-1}, w_{t+1})
$$

Training makes this loss function as small as possible.
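
As a minimal sketch of the averaged loss, the function below takes one row of predicted probabilities per target word and averages the negative log-probability of each correct word. The name `cbow_loss`, the small `eps` term, and the two hand-made prediction matrices are assumptions for illustration. Here T is the number of target words in the toy corpus with a window size of 1.

```python
import numpy as np


def cbow_loss(probs, targets, eps=1e-7):
    """Mean of -log P(w_t | context) over all T targets.

    probs:   array of shape (T, V), one probability distribution per target
    targets: array of shape (T,), the ID of each correct word
    eps:     small constant so that log(0) never occurs
    """
    T = len(targets)
    picked = probs[np.arange(T), targets]  # probability given to each correct word
    return -np.sum(np.log(picked + eps)) / T


V = 7  # vocabulary size: you, say, goodbye, and, i, hello, .
# Target IDs for "say goodbye and i say hello" (every word that has both neighbors).
targets = np.array([1, 2, 3, 4, 1, 5])

uniform = np.full((len(targets), V), 1.0 / V)  # a model that knows nothing
perfect = np.eye(V)[targets]                   # a model that is always right

loss_uniform = cbow_loss(uniform, targets)
loss_perfect = cbow_loss(perfect, targets)
print(loss_uniform, loss_perfect)
```

A model that spreads probability evenly over the vocabulary gets a loss of about \( \log 7 \), and a model that always predicts the correct word gets a loss of about 0, so a decreasing loss means more probability is being placed on the correct words.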

%md