ゼロから作る Deep Learning 2 (Deep Learning from Scratch 2) / word2vec
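Chapter 3 - word2vec
Reading notes on ゼロから作る Deep Learning (2) 自然言語処理編 (Deep Learning from Scratch 2: Natural Language Processing). This time I read through Chapter 3, "word2vec". These notes look at an inference-based approach to distributed word representations, using the CBOW model.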
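Differences from count-based methods
- Count-based methods process the whole training data in a single batch
- Inference-based methods learn incrementally, using part of the training data at a time
- The data can be split into small batches for training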
Inference-based methods
you ??? goodbye and I say hello
The task is to guess which word appears in the position marked ??? above (the model outputs an occurrence probability for each word).
Handling words
- To process words with a neural network, they must be converted into fixed-length vectors
- one-hot representation: a vector in which exactly one element is 1 and all the others are 0
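As a small illustration of the one-hot representation, the sketch below converts a single word ID into a one-hot vector. The helper name `one_hot` is hypothetical (it is not from the book), and the vocabulary is the seven-word vocabulary of the example sentence used in these notes.

```python
import numpy as np

def one_hot(word_id, vocab_size):
    """Return a vector of length vocab_size with a single 1 at word_id (hypothetical helper)."""
    vec = np.zeros(vocab_size, dtype=np.int32)
    vec[word_id] = 1
    return vec

# Vocabulary of "You say goodbye and I say hello."
word_to_id = {'you': 0, 'say': 1, 'goodbye': 2, 'and': 3, 'i': 4, 'hello': 5, '.': 6}

vec = one_hot(word_to_id['goodbye'], len(word_to_id))
print(vec)  # [0 0 1 0 0 0 0]
```

The vector length always equals the vocabulary size, which is what makes the input fixed-length.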
Fully connected layer
%sh pip3 install numpy matplotlib
%python
import numpy as np

c = np.array([[1, 0, 0, 0, 0, 0, 0]])  # input (one-hot vector for word ID 0)
W = np.random.randn(7, 3)               # weights
h = np.dot(c, W)                        # hidden-layer nodes
h
array([[ 0.51548717, 0.69375812, -0.52163008]])
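Because the input is one-hot, the product simply extracts the row of the weight matrix W that corresponds to the word ID (row 0 here).
CBOW model: inference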
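The row-extraction claim can be checked numerically. The following is a minimal sketch with randomly generated weights; the seed and the chosen `word_id` are arbitrary assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, only for reproducibility
W = rng.standard_normal((7, 3))     # weights: 7 words, 3 hidden units

word_id = 2                         # arbitrary word ID for the check
c = np.zeros((1, 7))
c[0, word_id] = 1                   # one-hot input of shape (1, 7)

h = c @ W                           # same operation as np.dot(c, W)

# The matrix product equals row `word_id` of W
print(np.allclose(h[0], W[word_id]))  # True
```

Since the product only picks out one row, indexing `W[word_id]` directly gives the same result without multiplying by all the zeros.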
%python
class MatMul:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.x = None

    def forward(self, x):
        W, = self.params
        out = np.dot(x, W)
        self.x = x  # keep the input for the backward pass
        return out

    def backward(self, dout):
        W, = self.params
        dx = np.dot(dout, W.T)
        dW = np.dot(self.x.T, dout)
        self.grads[0][...] = dW  # overwrite the gradient array in place
        return dx
%python
import numpy as np

# Context data
c0 = np.array([[1, 0, 0, 0, 0, 0, 0]])
c1 = np.array([[0, 0, 1, 0, 0, 0, 0]])

# Initialize the weights
W_in = np.random.randn(7, 3)
W_out = np.random.randn(3, 7)

# Create the layers (both input layers share W_in)
in_layer0 = MatMul(W_in)
in_layer1 = MatMul(W_in)
out_layer = MatMul(W_out)

# Forward pass
h0 = in_layer0.forward(c0)
h1 = in_layer1.forward(c1)
h = 0.5 * (h0 + h1)
s = out_layer.forward(h)
s
array([[-2.89929998, -2.361709 , 3.07450532, 1.05383403, 3.11066845, -0.50834708, -2.51717838]])
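CBOW model: training
- The implementation above outputs a score for each word at the output layer
- Applying the Softmax function to these scores turns them into probabilities
- The cross-entropy error between these probabilities and the correct label is used as the loss (TODO: I don't fully understand this part yet)
- The CBOW model only learns the word occurrence patterns in the corpus
- A different corpus therefore yields different distributed word representations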
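To make the Softmax and cross-entropy steps concrete, here is a minimal worked sketch. It reuses the scores printed by the inference example above (contexts "you" and "goodbye") and assumes the correct target is word ID 1 ("say"), as in the example sentence. The variable names are illustrative only.

```python
import numpy as np

# Scores copied from the inference output above
s = np.array([-2.89929998, -2.361709, 3.07450532, 1.05383403,
              3.11066845, -0.50834708, -2.51717838])
target_id = 1  # "say": assumed correct word between "you" and "goodbye"

# Softmax: subtract the max for numerical stability, then normalize
exp_s = np.exp(s - s.max())
probs = exp_s / exp_s.sum()

# Cross-entropy with a one-hot label reduces to -log(probability of the correct word)
loss = -np.log(probs[target_id])
print('probabilities:', np.round(probs, 3))
print('loss:', loss)

# Reference point: a model that assigns equal probability to all 7 words
uniform_loss = -np.log(1.0 / 7)
print('uniform loss:', round(uniform_loss, 2))  # 1.95
```

The loss is small when the model gives the correct word a high probability and large otherwise. The uniform value of about 1.95 matches the loss reported at the start of the training run in these notes, which is what one would expect from weights initialized close to zero.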
%python
# Assign an ID to each word
def preprocess(text):
    text = text.lower()
    text = text.replace('.', ' .')
    words = text.split(' ')

    word_to_id = {}
    id_to_word = {}
    for word in words:
        if word not in word_to_id:
            new_id = len(word_to_id)
            word_to_id[word] = new_id
            id_to_word[new_id] = word

    corpus = np.array([word_to_id[w] for w in words])
    return corpus, word_to_id, id_to_word
%python
text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)
print('corpus: ', corpus)
print('id_to_word: ', id_to_word)
corpus: [0 1 2 3 4 1 5 6]
id_to_word: {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}
%python
def create_contexts_target(corpus, window_size=1):
    '''Build contexts and targets from a corpus

    :param corpus: corpus (NumPy array of word IDs)
    :param window_size: window size (number of context words on each side of the target)
    :return: NumPy arrays of contexts and targets
    '''
    target = corpus[window_size:-window_size]
    contexts = []

    for idx in range(window_size, len(corpus)-window_size):
        cs = []
        for t in range(-window_size, window_size + 1):
            if t == 0:
                continue  # skip the target itself
            cs.append(corpus[idx + t])
        contexts.append(cs)

    return np.array(contexts), np.array(target)
%python
contexts, target = create_contexts_target(corpus, window_size=1)
print('contexts: ', contexts)
print('target: ', target)
contexts: [[0 2]
 [1 3]
 [2 4]
 [3 1]
 [4 5]
 [1 6]]
target: [1 2 3 4 1 5]
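The ID arrays are easier to read when decoded back into words. The sketch below rebuilds the same context/target pairs for a window size of 1 and prints them as words. The corpus and `id_to_word` values are copied from the output above; the `pairs` list is a name introduced only for this illustration.

```python
import numpy as np

corpus = np.array([0, 1, 2, 3, 4, 1, 5, 6])
id_to_word = {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}
window_size = 1

pairs = []
for idx in range(window_size, len(corpus) - window_size):
    # Words to the left and right of the target position
    left = corpus[idx - window_size:idx]
    right = corpus[idx + 1:idx + 1 + window_size]
    context_words = [id_to_word[int(i)] for i in np.concatenate([left, right])]
    pairs.append((context_words, id_to_word[int(corpus[idx])]))

for context_words, target_word in pairs:
    print(context_words, '->', target_word)
```

The first pair is `['you', 'goodbye'] -> say`, which corresponds to the "you ??? goodbye" guessing task described earlier. The first and last words of the corpus never become targets because they lack a neighbor on one side.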
%python
def convert_one_hot(corpus, vocab_size):
    '''Convert to one-hot representation

    :param corpus: word IDs (1-D or 2-D NumPy array)
    :param vocab_size: vocabulary size
    :return: one-hot representation (2-D or 3-D NumPy array)
    '''
    N = corpus.shape[0]

    if corpus.ndim == 1:
        one_hot = np.zeros((N, vocab_size), dtype=np.int32)
        for idx, word_id in enumerate(corpus):
            one_hot[idx, word_id] = 1

    elif corpus.ndim == 2:
        C = corpus.shape[1]
        one_hot = np.zeros((N, C, vocab_size), dtype=np.int32)
        for idx_0, word_ids in enumerate(corpus):
            for idx_1, word_id in enumerate(word_ids):
                one_hot[idx_0, idx_1, word_id] = 1

    return one_hot
%python
vocab_size = len(word_to_id)
target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)
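It helps to check the shapes after the conversion. The sketch below is not the book's `convert_one_hot`; it is an alternative that indexes an identity matrix, used here only to show the resulting shapes for the example data.

```python
import numpy as np

vocab_size = 7
contexts_ids = np.array([[0, 2], [1, 3], [2, 4], [3, 1], [4, 5], [1, 6]])
target_ids = np.array([1, 2, 3, 4, 1, 5])

# Row i of the identity matrix is the one-hot vector for word ID i,
# so integer-array indexing converts every ID at once.
eye = np.eye(vocab_size, dtype=np.int32)
contexts_oh = eye[contexts_ids]
target_oh = eye[target_ids]

print(contexts_oh.shape)  # (6, 2, 7): samples x context words x vocabulary
print(target_oh.shape)    # (6, 7): samples x vocabulary
```

Taking `argmax` over the last axis recovers the original word IDs, so no information is lost by the conversion.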
CBOW model: implementation
%python
def softmax(x):
    if x.ndim == 2:
        x = x - x.max(axis=1, keepdims=True)
        x = np.exp(x)
        x /= x.sum(axis=1, keepdims=True)
    elif x.ndim == 1:
        x = x - np.max(x)
        x = np.exp(x) / np.sum(np.exp(x))

    return x

def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # If the teacher data is one-hot, convert it to the index of the correct label
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]

    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

class SoftmaxWithLoss:
    def __init__(self):
        self.params, self.grads = [], []
        self.y = None  # output of softmax
        self.t = None  # teacher labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)

        # If the teacher labels are one-hot, convert them to the index of the correct label
        if self.t.size == self.y.size:
            self.t = self.t.argmax(axis=1)

        loss = cross_entropy_error(self.y, self.t)
        return loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]

        dx = self.y.copy()
        dx[np.arange(batch_size), self.t] -= 1
        dx *= dout
        dx = dx / batch_size

        return dx
%python
class SimpleCBOW:
    # Initialization
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size

        # Initialize the weights
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')

        # Create the layers (both input layers share W_in)
        self.in_layer0 = MatMul(W_in)
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()

        # Collect all weights and gradients into lists
        layers = [self.in_layer0, self.in_layer1, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        # Distributed representation of the words
        self.word_vecs = W_in

    # Forward pass
    def forward(self, contexts, target):
        h0 = self.in_layer0.forward(contexts[:, 0])
        h1 = self.in_layer1.forward(contexts[:, 1])
        h = (h0 + h1) * 0.5
        score = self.out_layer.forward(h)
        loss = self.loss_layer.forward(score, target)
        return loss

    # Backward pass
    def backward(self, dout=1):
        ds = self.loss_layer.backward(dout)
        da = self.out_layer.backward(ds)
        da *= 0.5
        self.in_layer0.backward(da)
        self.in_layer1.backward(da)
        return None
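In `SimpleCBOW` the two input layers share the same `W_in`, and the backward pass multiplies the gradient by 0.5 because the hidden layer is an average. The sketch below works through that gradient flow by hand for one sample, without the layer classes. The upstream gradient `dh` is a hypothetical random value, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
W_in = rng.standard_normal((7, 3))

# One-hot contexts for word IDs 0 and 2, each of shape (1, 7)
c0 = np.eye(7)[[0]]
c1 = np.eye(7)[[2]]

# Forward pass of the averaged hidden layer: h = 0.5 * (c0 W + c1 W)
h = 0.5 * (c0 @ W_in + c1 @ W_in)

# Hypothetical upstream gradient arriving at h
dh = rng.standard_normal((1, 3))

# Backward pass: the averaging scales the gradient by 0.5, and each
# input layer computes its own gradient with respect to the shared W_in
da = 0.5 * dh
dW0 = c0.T @ da
dW1 = c1.T @ da

# Because the weight is shared, its total gradient is the sum of both contributions
dW_total = dW0 + dW1
print(np.nonzero(dW_total.any(axis=1))[0])  # [0 2]
```

Only the rows of the two context words receive a gradient. Summing the contributions of layers that share a weight is the same aggregation that `remove_duplicate` performs in the Trainer code in these notes.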
Trainer
%python
import numpy
import time
import matplotlib.pyplot as plt

def clip_grads(grads, max_norm):
    total_norm = 0
    for grad in grads:
        total_norm += np.sum(grad ** 2)
    total_norm = np.sqrt(total_norm)

    rate = max_norm / (total_norm + 1e-6)
    if rate < 1:
        for grad in grads:
            grad *= rate

def remove_duplicate(params, grads):
    '''
    Merge duplicated weights in the parameter list into one
    and add up the gradients that correspond to that weight
    '''
    params, grads = params[:], grads[:]  # copy list

    while True:
        find_flg = False
        L = len(params)

        for i in range(0, L - 1):
            for j in range(i + 1, L):
                # The weight is shared
                if params[i] is params[j]:
                    grads[i] += grads[j]  # add the gradients
                    find_flg = True
                    params.pop(j)
                    grads.pop(j)
                # The weight is shared as a transposed matrix (weight tying)
                elif params[i].ndim == 2 and params[j].ndim == 2 and \
                     params[i].T.shape == params[j].shape and np.all(params[i].T == params[j]):
                    grads[i] += grads[j].T
                    find_flg = True
                    params.pop(j)
                    grads.pop(j)

                if find_flg: break
            if find_flg: break

        if not find_flg: break

    return params, grads

class Trainer:
    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer
        self.loss_list = []
        self.eval_interval = None
        self.current_epoch = 0

    def fit(self, x, t, max_epoch=10, batch_size=32, max_grad=None, eval_interval=20):
        data_size = len(x)
        max_iters = data_size // batch_size
        self.eval_interval = eval_interval
        model, optimizer = self.model, self.optimizer
        total_loss = 0
        loss_count = 0

        start_time = time.time()
        for epoch in range(max_epoch):
            # Shuffle
            idx = numpy.random.permutation(numpy.arange(data_size))
            x = x[idx]
            t = t[idx]

            for iters in range(max_iters):
                batch_x = x[iters*batch_size:(iters+1)*batch_size]
                batch_t = t[iters*batch_size:(iters+1)*batch_size]

                # Compute the gradients and update the parameters
                loss = model.forward(batch_x, batch_t)
                model.backward()
                params, grads = remove_duplicate(model.params, model.grads)  # merge shared weights into one
                if max_grad is not None:
                    clip_grads(grads, max_grad)
                optimizer.update(params, grads)
                total_loss += loss
                loss_count += 1

                # Evaluation
                if (eval_interval is not None) and (iters % eval_interval) == 0:
                    avg_loss = total_loss / loss_count
                    elapsed_time = time.time() - start_time
                    print('| epoch %d | iter %d / %d | time %d[s] | loss %.2f'
                          % (self.current_epoch + 1, iters + 1, max_iters, elapsed_time, avg_loss))
                    self.loss_list.append(float(avg_loss))
                    total_loss, loss_count = 0, 0

            self.current_epoch += 1

    def plot(self, ylim=None):
        x = numpy.arange(len(self.loss_list))
        if ylim is not None:
            plt.ylim(*ylim)
        plt.plot(x, self.loss_list, label='train')
        plt.xlabel('iterations (x' + str(self.eval_interval) + ')')
        plt.ylabel('loss')
        plt.show()
Optimizer
%python
class Adam:
    '''
    Adam (http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None

    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = [], []
            for param in params:
                self.m.append(np.zeros_like(param))
                self.v.append(np.zeros_like(param))

        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for i in range(len(params)):
            self.m[i] += (1 - self.beta1) * (grads[i] - self.m[i])
            self.v[i] += (1 - self.beta2) * (grads[i]**2 - self.v[i])

            params[i] -= lr_t * self.m[i] / (np.sqrt(self.v[i]) + 1e-7)
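To see the update rule in action outside the CBOW model, the sketch below applies the same three update lines to a toy problem: minimizing f(w) = (w - 3)^2. The function `adam_step`, the toy objective, the learning rate of 0.01, and the number of steps are all assumptions made for this illustration and are not part of the notes' training setup.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999):
    """One in-place Adam update, written the same way as Adam.update (hypothetical helper)."""
    # Bias-corrected learning rate for step t (t starts at 1)
    lr_t = lr * np.sqrt(1.0 - beta2**t) / (1.0 - beta1**t)
    m += (1 - beta1) * (grad - m)       # moving average of the gradient
    v += (1 - beta2) * (grad**2 - v)    # moving average of the squared gradient
    param -= lr_t * m / (np.sqrt(v) + 1e-7)

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    adam_step(w, grad, m, v, t, lr=0.01)

print(w)  # should end up close to 3
```

The arrays are updated in place, which is also why `Adam.update` can modify the model's weights through the `params` list without returning anything.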
Running the training
%python
window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

model = SimpleCBOW(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)
trainer.fit(contexts, target, max_epoch, batch_size)
| epoch 1 | iter 1 / 2 | time 0[s] | loss 1.95
| epoch 2 | iter 1 / 2 | time 0[s] | loss 1.95
| epoch 3 | iter 1 / 2 | time 0[s] | loss 1.95
...
| epoch 100 | iter 1 / 2 | time 0[s] | loss 1.75
...
| epoch 200 | iter 1 / 2 | time 0[s] | loss 1.20
...
| epoch 300 | iter 1 / 2 | time 0[s] | loss 0.82
...
| epoch 400 | iter 1 / 2 | time 0[s] | loss 0.71
...
| epoch 500 | iter 1 / 2 | time 0[s] | loss 0.73
...
| epoch 532 | iter 1 / 2 | time 0[s] | loss 0.63
...
epoch 533 | iter 1 / 2 | time 0[s] | loss 0.71 | epoch 534 | iter 1 / 2 | time 0[s] | loss 0.71 | epoch 535 | iter 1 / 2 | time 0[s] | loss 0.83 | epoch 536 | iter 1 / 2 | time 0[s] | loss 0.58 | epoch 537 | iter 1 / 2 | time 0[s] | loss 0.83 | epoch 538 | iter 1 / 2 | time 0[s] | loss 0.66 | epoch 539 | iter 1 / 2 | time 0[s] | loss 0.75 | epoch 540 | iter 1 / 2 | time 0[s] | loss 0.70 | epoch 541 | iter 1 / 2 | time 0[s] | loss 0.62 | epoch 542 | iter 1 / 2 | time 0[s] | loss 0.70 | epoch 543 | iter 1 / 2 | time 0[s] | loss 0.66 | epoch 544 | iter 1 / 2 | time 0[s] | loss 0.70 | epoch 545 | iter 1 / 2 | time 0[s] | loss 0.74 | epoch 546 | iter 1 / 2 | time 0[s] | loss 0.87 | epoch 547 | iter 1 / 2 | time 0[s] | loss 0.49 | epoch 548 | iter 1 / 2 | time 0[s] | loss 0.91 | epoch 549 | iter 1 / 2 | time 0[s] | loss 0.49 | epoch 550 | iter 1 / 2 | time 0[s] | loss 0.61 | epoch 551 | iter 1 / 2 | time 0[s] | loss 0.78 | epoch 552 | iter 1 / 2 | time 0[s] | loss 0.82 | epoch 553 | iter 1 / 2 | time 0[s] | loss 0.61 | epoch 554 | iter 1 / 2 | time 0[s] | loss 0.69 | epoch 555 | iter 1 / 2 | time 0[s] | loss 0.78 | epoch 556 | iter 1 / 2 | time 0[s] | loss 0.48 | epoch 557 | iter 1 / 2 | time 0[s] | loss 0.99 | epoch 558 | iter 1 / 2 | time 0[s] | loss 0.61 | epoch 559 | iter 1 / 2 | time 0[s] | loss 0.65 | epoch 560 | iter 1 / 2 | time 0[s] | loss 0.69 | epoch 561 | iter 1 / 2 | time 0[s] | loss 0.60 | epoch 562 | iter 1 / 2 | time 0[s] | loss 0.68 | epoch 563 | iter 1 / 2 | time 0[s] | loss 0.73 | epoch 564 | iter 1 / 2 | time 0[s] | loss 0.78 | epoch 565 | iter 1 / 2 | time 0[s] | loss 0.69 | epoch 566 | iter 1 / 2 | time 0[s] | loss 0.60 | epoch 567 | iter 1 / 2 | time 0[s] | loss 0.68 | epoch 568 | iter 1 / 2 | time 0[s] | loss 0.77 | epoch 569 | iter 1 / 2 | time 0[s] | loss 0.64 | epoch 570 | iter 1 / 2 | time 0[s] | loss 0.81 | epoch 571 | iter 1 / 2 | time 0[s] | loss 0.55 | epoch 572 | iter 1 / 2 | time 0[s] | loss 0.64 | epoch 573 | iter 1 / 2 | time 0[s] | 
loss 0.64 | epoch 574 | iter 1 / 2 | time 0[s] | loss 0.80 | epoch 575 | iter 1 / 2 | time 0[s] | loss 0.59 | epoch 576 | iter 1 / 2 | time 0[s] | loss 0.68 | epoch 577 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 578 | iter 1 / 2 | time 0[s] | loss 0.68 | epoch 579 | iter 1 / 2 | time 0[s] | loss 0.68 | epoch 580 | iter 1 / 2 | time 0[s] | loss 0.81 | epoch 581 | iter 1 / 2 | time 0[s] | loss 0.55 | epoch 582 | iter 1 / 2 | time 0[s] | loss 0.67 | epoch 583 | iter 1 / 2 | time 0[s] | loss 0.59 | epoch 584 | iter 1 / 2 | time 0[s] | loss 0.80 | epoch 585 | iter 1 / 2 | time 0[s] | loss 0.55 | epoch 586 | iter 1 / 2 | time 0[s] | loss 0.89 | epoch 587 | iter 1 / 2 | time 0[s] | loss 0.75 | epoch 588 | iter 1 / 2 | time 0[s] | loss 0.46 | epoch 589 | iter 1 / 2 | time 0[s] | loss 0.79 | epoch 590 | iter 1 / 2 | time 0[s] | loss 0.64 | epoch 591 | iter 1 / 2 | time 0[s] | loss 0.67 | epoch 592 | iter 1 / 2 | time 0[s] | loss 0.57 | epoch 593 | iter 1 / 2 | time 0[s] | loss 0.76 | epoch 594 | iter 1 / 2 | time 0[s] | loss 0.70 | epoch 595 | iter 1 / 2 | time 0[s] | loss 0.75 | epoch 596 | iter 1 / 2 | time 0[s] | loss 0.66 | epoch 597 | iter 1 / 2 | time 0[s] | loss 0.49 | epoch 598 | iter 1 / 2 | time 0[s] | loss 0.67 | epoch 599 | iter 1 / 2 | time 0[s] | loss 0.70 | epoch 600 | iter 1 / 2 | time 0[s] | loss 0.59 | epoch 601 | iter 1 / 2 | time 0[s] | loss 0.69 | epoch 602 | iter 1 / 2 | time 0[s] | loss 0.75 | epoch 603 | iter 1 / 2 | time 0[s] | loss 0.54 | epoch 604 | iter 1 / 2 | time 0[s] | loss 0.69 | epoch 605 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 606 | iter 1 / 2 | time 0[s] | loss 0.56 | epoch 607 | iter 1 / 2 | time 0[s] | loss 0.75 | epoch 608 | iter 1 / 2 | time 0[s] | loss 0.66 | epoch 609 | iter 1 / 2 | time 0[s] | loss 0.73 | epoch 610 | iter 1 / 2 | time 0[s] | loss 0.49 | epoch 611 | iter 1 / 2 | time 0[s] | loss 0.86 | epoch 612 | iter 1 / 2 | time 0[s] | loss 0.53 | epoch 613 | iter 1 / 2 | time 0[s] | loss 0.78 | epoch 614 | iter 1 / 2 | 
time 0[s] | loss 0.65 | epoch 615 | iter 1 / 2 | time 0[s] | loss 0.65 | epoch 616 | iter 1 / 2 | time 0[s] | loss 0.65 | epoch 617 | iter 1 / 2 | time 0[s] | loss 0.56 | epoch 618 | iter 1 / 2 | time 0[s] | loss 0.74 | epoch 619 | iter 1 / 2 | time 0[s] | loss 0.52 | epoch 620 | iter 1 / 2 | time 0[s] | loss 0.56 | epoch 621 | iter 1 / 2 | time 0[s] | loss 0.74 | epoch 622 | iter 1 / 2 | time 0[s] | loss 0.67 | epoch 623 | iter 1 / 2 | time 0[s] | loss 0.65 | epoch 624 | iter 1 / 2 | time 0[s] | loss 0.74 | epoch 625 | iter 1 / 2 | time 0[s] | loss 0.60 | epoch 626 | iter 1 / 2 | time 0[s] | loss 0.68 | epoch 627 | iter 1 / 2 | time 0[s] | loss 0.57 | epoch 628 | iter 1 / 2 | time 0[s] | loss 0.61 | epoch 629 | iter 1 / 2 | time 0[s] | loss 0.67 | epoch 630 | iter 1 / 2 | time 0[s] | loss 0.72 | epoch 631 | iter 1 / 2 | time 0[s] | loss 0.57 | epoch 632 | iter 1 / 2 | time 0[s] | loss 0.59 | epoch 633 | iter 1 / 2 | time 0[s] | loss 0.74 | epoch 634 | iter 1 / 2 | time 0[s] | loss 0.76 | epoch 635 | iter 1 / 2 | time 0[s] | loss 0.34 | epoch 636 | iter 1 / 2 | time 0[s] | loss 0.72 | epoch 637 | iter 1 / 2 | time 0[s] | loss 0.64 | epoch 638 | iter 1 / 2 | time 0[s] | loss 0.65 | epoch 639 | iter 1 / 2 | time 0[s] | loss 0.72 | epoch 640 | iter 1 / 2 | time 0[s] | loss 0.57 | epoch 641 | iter 1 / 2 | time 0[s] | loss 0.73 | epoch 642 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 643 | iter 1 / 2 | time 0[s] | loss 0.52 | epoch 644 | iter 1 / 2 | time 0[s] | loss 0.75 | epoch 645 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 646 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 647 | iter 1 / 2 | time 0[s] | loss 0.62 | epoch 648 | iter 1 / 2 | time 0[s] | loss 0.60 | epoch 649 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 650 | iter 1 / 2 | time 0[s] | loss 0.55 | epoch 651 | iter 1 / 2 | time 0[s] | loss 0.63 | epoch 652 | iter 1 / 2 | time 0[s] | loss 0.76 | epoch 653 | iter 1 / 2 | time 0[s] | loss 0.41 | epoch 654 | iter 1 / 2 | time 0[s] | loss 0.81 | epoch 655 | 
iter 1 / 2 | time 0[s] | loss 0.66 | epoch 656 | iter 1 / 2 | time 0[s] | loss 0.62 | epoch 657 | iter 1 / 2 | time 0[s] | loss 0.49 | epoch 658 | iter 1 / 2 | time 0[s] | loss 0.66 | epoch 659 | iter 1 / 2 | time 0[s] | loss 0.67 | epoch 660 | iter 1 / 2 | time 1[s] | loss 0.54 | epoch 661 | iter 1 / 2 | time 1[s] | loss 0.65 | epoch 662 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 663 | iter 1 / 2 | time 1[s] | loss 0.69 | epoch 664 | iter 1 / 2 | time 1[s] | loss 0.65 | epoch 665 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 666 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 667 | iter 1 / 2 | time 1[s] | loss 0.65 | epoch 668 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 669 | iter 1 / 2 | time 1[s] | loss 0.73 | epoch 670 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 671 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 672 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 673 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 674 | iter 1 / 2 | time 1[s] | loss 0.71 | epoch 675 | iter 1 / 2 | time 1[s] | loss 0.40 | epoch 676 | iter 1 / 2 | time 1[s] | loss 0.69 | epoch 677 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 678 | iter 1 / 2 | time 1[s] | loss 0.82 | epoch 679 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 680 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 681 | iter 1 / 2 | time 1[s] | loss 0.72 | epoch 682 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 683 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 684 | iter 1 / 2 | time 1[s] | loss 0.68 | epoch 685 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 686 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 687 | iter 1 / 2 | time 1[s] | loss 0.70 | epoch 688 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 689 | iter 1 / 2 | time 1[s] | loss 0.49 | epoch 690 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 691 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 692 | iter 1 / 2 | time 1[s] | loss 0.71 | epoch 693 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 694 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 695 | iter 1 / 2 | time 1[s] | loss 0.42 | 
epoch 696 | iter 1 / 2 | time 1[s] | loss 0.73 | epoch 697 | iter 1 / 2 | time 1[s] | loss 0.68 | epoch 698 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 699 | iter 1 / 2 | time 1[s] | loss 0.78 | epoch 700 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 701 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 702 | iter 1 / 2 | time 1[s] | loss 0.68 | epoch 703 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 704 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 705 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 706 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 707 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 708 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 709 | iter 1 / 2 | time 1[s] | loss 0.66 | epoch 710 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 711 | iter 1 / 2 | time 1[s] | loss 0.72 | epoch 712 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 713 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 714 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 715 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 716 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 717 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 718 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 719 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 720 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 721 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 722 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 723 | iter 1 / 2 | time 1[s] | loss 0.69 | epoch 724 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 725 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 726 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 727 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 728 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 729 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 730 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 731 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 732 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 733 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 734 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 735 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 736 | iter 1 / 2 | time 1[s] | 
loss 0.55 | epoch 737 | iter 1 / 2 | time 1[s] | loss 0.68 | epoch 738 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 739 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 740 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 741 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 742 | iter 1 / 2 | time 1[s] | loss 0.65 | epoch 743 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 744 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 745 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 746 | iter 1 / 2 | time 1[s] | loss 0.36 | epoch 747 | iter 1 / 2 | time 1[s] | loss 0.77 | epoch 748 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 749 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 750 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 751 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 752 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 753 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 754 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 755 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 756 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 757 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 758 | iter 1 / 2 | time 1[s] | loss 0.68 | epoch 759 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 760 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 761 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 762 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 763 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 764 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 765 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 766 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 767 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 768 | iter 1 / 2 | time 1[s] | loss 0.74 | epoch 769 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 770 | iter 1 / 2 | time 1[s] | loss 0.54 | epoch 771 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 772 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 773 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 774 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 775 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 776 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 777 | iter 1 / 2 | 
time 1[s] | loss 0.55 | epoch 778 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 779 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 780 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 781 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 782 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 783 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 784 | iter 1 / 2 | time 1[s] | loss 0.54 | epoch 785 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 786 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 787 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 788 | iter 1 / 2 | time 1[s] | loss 0.54 | epoch 789 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 790 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 791 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 792 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 793 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 794 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 795 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 796 | iter 1 / 2 | time 1[s] | loss 0.65 | epoch 797 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 798 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 799 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 800 | iter 1 / 2 | time 1[s] | loss 0.34 | epoch 801 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 802 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 803 | iter 1 / 2 | time 1[s] | loss 0.71 | epoch 804 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 805 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 806 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 807 | iter 1 / 2 | time 1[s] | loss 0.42 | epoch 808 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 809 | iter 1 / 2 | time 1[s] | loss 0.42 | epoch 810 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 811 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 812 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 813 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 814 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 815 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 816 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 817 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 818 | 
iter 1 / 2 | time 1[s] | loss 0.50 | epoch 819 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 820 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 821 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 822 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 823 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 824 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 825 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 826 | iter 1 / 2 | time 1[s] | loss 0.63 | epoch 827 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 828 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 829 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 830 | iter 1 / 2 | time 1[s] | loss 0.35 | epoch 831 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 832 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 833 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 834 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 835 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 836 | iter 1 / 2 | time 1[s] | loss 0.68 | epoch 837 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 838 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 839 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 840 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 841 | iter 1 / 2 | time 1[s] | loss 0.62 | epoch 842 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 843 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 844 | iter 1 / 2 | time 1[s] | loss 0.32 | epoch 845 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 846 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 847 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 848 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 849 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 850 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 851 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 852 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 853 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 854 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 855 | iter 1 / 2 | time 1[s] | loss 0.42 | epoch 856 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 857 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 858 | iter 1 / 2 | time 1[s] | loss 0.23 | 
epoch 859 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 860 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 861 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 862 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 863 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 864 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 865 | iter 1 / 2 | time 1[s] | loss 0.60 | epoch 866 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 867 | iter 1 / 2 | time 1[s] | loss 0.69 | epoch 868 | iter 1 / 2 | time 1[s] | loss 0.31 | epoch 869 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 870 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 871 | iter 1 / 2 | time 1[s] | loss 0.31 | epoch 872 | iter 1 / 2 | time 1[s] | loss 0.66 | epoch 873 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 874 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 875 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 876 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 877 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 878 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 879 | iter 1 / 2 | time 1[s] | loss 0.40 | epoch 880 | iter 1 / 2 | time 1[s] | loss 0.67 | epoch 881 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 882 | iter 1 / 2 | time 1[s] | loss 0.51 | epoch 883 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 884 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 885 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 886 | iter 1 / 2 | time 1[s] | loss 0.40 | epoch 887 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 888 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 889 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 890 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 891 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 892 | iter 1 / 2 | time 1[s] | loss 0.40 | epoch 893 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 894 | iter 1 / 2 | time 1[s] | loss 0.42 | epoch 895 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 896 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 897 | iter 1 / 2 | time 1[s] | loss 0.61 | epoch 898 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 899 | iter 1 / 2 | time 1[s] | 
loss 0.48 | epoch 900 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 901 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 902 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 903 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 904 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 905 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 906 | iter 1 / 2 | time 1[s] | loss 0.58 | epoch 907 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 908 | iter 1 / 2 | time 1[s] | loss 0.32 | epoch 909 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 910 | iter 1 / 2 | time 1[s] | loss 0.38 | epoch 911 | iter 1 / 2 | time 1[s] | loss 0.54 | epoch 912 | iter 1 / 2 | time 1[s] | loss 0.38 | epoch 913 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 914 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 915 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 916 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 917 | iter 1 / 2 | time 1[s] | loss 0.29 | epoch 918 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 919 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 920 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 921 | iter 1 / 2 | time 1[s] | loss 0.38 | epoch 922 | iter 1 / 2 | time 1[s] | loss 0.38 | epoch 923 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 924 | iter 1 / 2 | time 1[s] | loss 0.40 | epoch 925 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 926 | iter 1 / 2 | time 1[s] | loss 0.64 | epoch 927 | iter 1 / 2 | time 1[s] | loss 0.26 | epoch 928 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 929 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 930 | iter 1 / 2 | time 1[s] | loss 0.57 | epoch 931 | iter 1 / 2 | time 1[s] | loss 0.38 | epoch 932 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 933 | iter 1 / 2 | time 1[s] | loss 0.55 | epoch 934 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 935 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 936 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 937 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 938 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 939 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 940 | iter 1 / 2 | 
time 1[s] | loss 0.54 | epoch 941 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 942 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 943 | iter 1 / 2 | time 1[s] | loss 0.42 | epoch 944 | iter 1 / 2 | time 1[s] | loss 0.37 | epoch 945 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 946 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 947 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 948 | iter 1 / 2 | time 1[s] | loss 0.34 | epoch 949 | iter 1 / 2 | time 1[s] | loss 0.56 | epoch 950 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 951 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 952 | iter 1 / 2 | time 1[s] | loss 0.35 | epoch 953 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 954 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 955 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 956 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 957 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 958 | iter 1 / 2 | time 1[s] | loss 0.59 | epoch 959 | iter 1 / 2 | time 1[s] | loss 0.33 | epoch 960 | iter 1 / 2 | time 1[s] | loss 0.54 | epoch 961 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 962 | iter 1 / 2 | time 1[s] | loss 0.35 | epoch 963 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 964 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 965 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 966 | iter 1 / 2 | time 1[s] | loss 0.35 | epoch 967 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 968 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 969 | iter 1 / 2 | time 1[s] | loss 0.38 | epoch 970 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 971 | iter 1 / 2 | time 1[s] | loss 0.27 | epoch 972 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 973 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 974 | iter 1 / 2 | time 1[s] | loss 0.35 | epoch 975 | iter 1 / 2 | time 1[s] | loss 0.53 | epoch 976 | iter 1 / 2 | time 1[s] | loss 0.44 | epoch 977 | iter 1 / 2 | time 1[s] | loss 0.46 | epoch 978 | iter 1 / 2 | time 1[s] | loss 0.42 | epoch 979 | iter 1 / 2 | time 1[s] | loss 0.40 | epoch 980 | iter 1 / 2 | time 1[s] | loss 0.49 | epoch 981 | 
iter 1 / 2 | time 1[s] | loss 0.53 | epoch 982 | iter 1 / 2 | time 1[s] | loss 0.32 | epoch 983 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 984 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 985 | iter 1 / 2 | time 1[s] | loss 0.50 | epoch 986 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 987 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 988 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 989 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 990 | iter 1 / 2 | time 1[s] | loss 0.36 | epoch 991 | iter 1 / 2 | time 1[s] | loss 0.39 | epoch 992 | iter 1 / 2 | time 1[s] | loss 0.47 | epoch 993 | iter 1 / 2 | time 1[s] | loss 0.45 | epoch 994 | iter 1 / 2 | time 1[s] | loss 0.41 | epoch 995 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 996 | iter 1 / 2 | time 1[s] | loss 0.43 | epoch 997 | iter 1 / 2 | time 1[s] | loss 0.48 | epoch 998 | iter 1 / 2 | time 1[s] | loss 0.35 | epoch 999 | iter 1 / 2 | time 1[s] | loss 0.52 | epoch 1000 | iter 1 / 2 | time 1[s] | loss 0.26
Plotting the loss
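%python
import matplotlib.pyplot as plt  # needed for z.show below

trainer.plot()          # plot the loss values collected during training
z.show(plt, fmt='svg')  # render the current figure in the Zeppelin notebook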
Displaying the word vectors
%python
# Print each word alongside its learned vector.
word_vecs = model.word_vecs
for word_id, word in id_to_word.items():
    print(word, word_vecs[word_id])
you [ 0.93710303 0.93910193 1.7272372 -0.89610606 1.0445951 ]
say [-1.1644877 -1.2109934 -0.20577171 1.23597 -1.2464908 ]
goodbye [ 1.1030452 1.0522411 -0.1555654 -1.0932515 0.8510445]
and [-0.77724737 -1.0205745 -1.8217171 0.9609459 -1.050846 ]
i [ 1.1081636 1.0668204 -0.13783155 -1.1415119 0.8612589 ]
hello [ 0.9385245 0.9236376 1.7012237 -0.9081088 1.0232164]
. [-1.1489888 -1.061469 1.6251746 1.10383 -1.1069291]
- Words can now be represented as dense vectors, but a corpus this small does not give good results
- The implementation used here cannot handle a large corpus
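One way to inspect vectors like the ones printed above is to compare them by cosine similarity. The following is a minimal sketch that assumes NumPy and copies three of the printed vectors by hand. `cos_similarity` is a helper defined here for illustration, and the numbers come from a single training run, so they will differ from run to run.

```python
import numpy as np

# Vectors copied from the output above (one training run; values vary between runs).
vecs = {
    "you":   np.array([0.93710303, 0.93910193, 1.7272372, -0.89610606, 1.0445951]),
    "say":   np.array([-1.1644877, -1.2109934, -0.20577171, 1.23597, -1.2464908]),
    "hello": np.array([0.9385245, 0.9236376, 1.7012237, -0.9081088, 1.0232164]),
}

def cos_similarity(x, y, eps=1e-8):
    """Cosine similarity of two vectors; eps guards against division by zero."""
    return float(np.dot(x, y) / ((np.linalg.norm(x) + eps) * (np.linalg.norm(y) + eps)))

sim_you_hello = cos_similarity(vecs["you"], vecs["hello"])
sim_you_say = cos_similarity(vecs["you"], vecs["say"])
print(f"you / hello: {sim_you_hello:.3f}")
print(f"you / say:   {sim_you_say:.3f}")
```

With these numbers the first similarity is close to 1 and the second is negative. On a corpus this small, that pattern probably says more about the toy data than about word meaning.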
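CBOW model: supplementary notes
Probability notation
- \( P(A) \): the probability that event A occurs
- \( P(A, B) \): the probability that A and B occur together
- \( P(A|B) \): the probability that A occurs given that B has occurred (the posterior probability)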
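The three notations can be made concrete by estimating them from counts. The sketch below assumes the toy corpus "you say goodbye and i say hello ." and treats each pair of adjacent words as one sample. The events A and B, and names such as `pairs` and `p_a_given_b`, are chosen here purely for illustration.

```python
from collections import Counter

# Toy corpus (assumed), lower-cased and split on spaces.
words = "you say goodbye and i say hello .".split()

# Each sample is a (previous word, next word) pair of adjacent words.
pairs = list(zip(words[:-1], words[1:]))
n = len(pairs)
pair_counts = Counter(pairs)

# Event B: the previous word is "say". Event A: the next word is "goodbye".
p_b = sum(c for (prev, _), c in pair_counts.items() if prev == "say") / n
p_a = sum(c for (_, nxt), c in pair_counts.items() if nxt == "goodbye") / n
p_a_and_b = pair_counts[("say", "goodbye")] / n

# Conditional probability: P(A|B) = P(A, B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(f"P(A)   = {p_a:.3f}")
print(f"P(A,B) = {p_a_and_b:.3f}")
print(f"P(A|B) = {p_a_given_b:.3f}")
```

"say" is followed once by "goodbye" and once by "hello", so conditioning on B raises the estimate for A from 1/7 to 1/2.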
The CBOW model and probability
$$
P(w_t | w_{t-1},w_{t+1})
$$
The expression above is the probability that \( w_t \) occurs given that \( w_{t-1} \) and \( w_{t+1} \) have occurred.
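To see how a network can output this probability, here is a minimal NumPy sketch of a CBOW forward pass with one context word on each side. The weights `W_in` and `W_out` are random and untrained, the hidden size of 5 is chosen to match the vectors printed earlier, and `cbow_probs` and `softmax` are helper names introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["you", "say", "goodbye", "and", "i", "hello", "."]
word_to_id = {w: i for i, w in enumerate(vocab)}
vocab_size, hidden_size = len(vocab), 5

# Randomly initialised weights (untrained, for illustration only).
W_in = 0.01 * rng.standard_normal((vocab_size, hidden_size))
W_out = 0.01 * rng.standard_normal((hidden_size, vocab_size))

def softmax(x):
    x = x - np.max(x)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum()

def cbow_probs(prev_word, next_word):
    """Return P(w_t | w_{t-1}, w_{t+1}) for every candidate w_t."""
    # Selecting a row of W_in is equivalent to multiplying by a one-hot vector.
    h = 0.5 * (W_in[word_to_id[prev_word]] + W_in[word_to_id[next_word]])
    scores = h @ W_out
    return softmax(scores)

probs = cbow_probs("you", "goodbye")
for w, p in zip(vocab, probs):
    print(f"P({w} | you, goodbye) = {p:.3f}")
```

Because the weights are untrained, the probabilities come out close to uniform. Training is what should shift probability mass toward the word that actually appears between the two context words.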
Cross-entropy error
$$
L = -\log P(w_t | w_{t-1},w_{t+1})
$$
Extended to the whole corpus, this becomes
$$
L = - \frac{1}{T} \sum^{T}_{t=1} \log P(w_t | w_{t-1},w_{t+1})
$$
Training aims to make this loss function as small as possible.
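The corpus-level loss is just the average negative log of the probability assigned to each correct word. The sketch below evaluates it for made-up probabilities. The lists `confident` and `uniform` are hypothetical inputs rather than values from the trained model, and `corpus_loss` is a helper name introduced here.

```python
import math

def corpus_loss(target_probs):
    """Average negative log-likelihood: L = -(1/T) * sum_t log P(w_t | context)."""
    T = len(target_probs)
    return -sum(math.log(p) for p in target_probs) / T

# Hypothetical probabilities assigned to the correct word at each position.
confident = [0.9, 0.8, 0.95]
uniform = [1 / 7] * 3  # a model that guesses uniformly over 7 words

loss_confident = corpus_loss(confident)
loss_uniform = corpus_loss(uniform)
print(f"confident: {loss_confident:.3f}")
print(f"uniform:   {loss_uniform:.3f}")
```

A model that guesses uniformly over a 7-word vocabulary has a loss of \( \log 7 \), and the loss approaches 0 as the probability of each correct word approaches 1.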
- “Efficient Estimation of Word Representations in Vector Space”
- “Distributed Representations of Words and Phrases and their Compositionality”