Summary in progress
\(u = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + b \\ z = f(u) \)
\(\left( \begin{array}{c} u_{1} \\ u_{2} \\ u_{3} \end{array} \right) = \left( \begin{array}{cccc} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \end{array} \right) \left( \begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{array} \right) + \left( \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right) \\ \left( \begin{array}{c} z_{1} \\ z_{2} \\ z_{3} \end{array} \right) = \left( \begin{array}{c} f(u_{1}) \\ f(u_{2}) \\ f(u_{3}) \end{array} \right) \)
\(\mathsf{u} = \mathsf{W} \mathsf{x} + \mathsf{b} \\ \mathsf{z} = f(\mathsf{u}) \)
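As a minimal sketch of the layer computation u = Wx + b, z = f(u) above, the following plain-Python code evaluates one layer with 4 inputs and 3 units. The function names, the sample numbers, and the choice of the logistic function for f are illustrative assumptions, not part of the original notes.

```python
import math

def logistic(u):
    """Logistic (sigmoid) activation, used here only as an example of f."""
    return 1.0 / (1.0 + math.exp(-u))

def layer_forward(W, x, b, f=logistic):
    """Compute u = W x + b and z = f(u) for one layer.

    W is a list of rows (one row per unit), x is the input vector,
    and b holds one bias per unit.
    """
    u = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
         for row, b_j in zip(W, b)]
    z = [f(u_j) for u_j in u]
    return u, z

# Hypothetical 3-unit layer with 4 inputs.
W = [[0.1, 0.2, 0.3, 0.4],
     [0.0, -0.1, 0.0, 0.1],
     [1.0, 0.0, -1.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.0, -0.5]
u, z = layer_forward(W, x, b)
```

Each row of W produces one component of u, which is exactly the row-by-row reading of the matrix form.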
l=1 | input layer |
l=2 | internal layer, hidden layer |
l=3 | output layer |
\(\mathsf{u}^{(l+1)} = \mathsf{W}^{(l+1)} \mathsf{z}^{(l)} + \mathsf{b}^{(l+1)} \\ \mathsf{z}^{(l+1)} = f(\mathsf{u}^{(l+1)}) \\ \\ (\text{where } \mathsf{z}^{(1)} = \mathsf{x}) \)
\(\mathsf{y} = y(\mathsf{x}; \mathsf{W}^{(2)}, ... , \mathsf{W}^{(L)}, \mathsf{b}^{(2)}, ... , \mathsf{b}^{(L)}) \)
Collecting all parameters into \(\mathsf{W}\), this is also written as
\(\mathsf{y} = y(\mathsf{x}; \mathsf{W}) \)
\(f(u) = \frac{1}{1+e^{-u}} \)
\(f(u) = tanh(u) \\ \)
\(f(u) =\begin{cases}-1 & u < -1\\u & -1 \leq u < 1\\ 1 & 1 \leq u\end{cases} \)
\(f(u) = max(0,u) = \begin{cases}0 & u < 0\\u & 0 \leq u\end{cases} \)
\(f'(u) = \begin{cases}1 & u \geq 0\\0 & u < 0\end{cases} \)
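The activation functions listed above can be written down directly. This is a small sketch in plain Python; the function names are hypothetical, and `math.tanh` from the standard library stands in for the hyperbolic tangent.

```python
import math

def logistic(u):
    """f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def clipped_linear(u):
    """Piecewise-linear unit: -1 below -1, u in between, 1 from 1 upward."""
    return max(-1.0, min(1.0, u))

def relu(u):
    """Rectified linear unit f(u) = max(0, u)."""
    return max(0.0, u)

def relu_grad(u):
    """Derivative of the ReLU, taking the value 1 at u = 0 as in the text."""
    return 1.0 if u >= 0 else 0.0

# Evaluate every activation at the same (arbitrary) point.
values = [f(0.5) for f in (logistic, math.tanh, clipped_linear, relu)]
```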
\(u_{jk} = \sum_i w_{jik} x_{i} + b_{jk} \\ z_{j} = f(u_{j}) = \max_{k=1,...,K} u_{jk} \)
\(\mathsf{y} = y(\mathsf{x}; \mathsf{W}) \)
We want to learn the optimal "W" in this model, so that an appropriate output y is obtained for a given input x.
\((x,d) \)
\(D = \big\{(x_1,d_1),(x_2,d_2),...,(x_N,d_N)\big\} \)
\(d_n \approx y(x_n;W) \)
\(E(W) = \frac{1}{2} \sum_{n=1}^{N} \| d_n - y(x_n;W) \|^2 \)
Note: the factor 1/2 is there so that the factor 2 produced by differentiating the square cancels.
\(p(d=1|x) = y(x;W) \\ p(d=0|x) = 1 - y(x;W) \\ \therefore p(d|x) = p(d=1|x)^{d}p(d=0|x)^{1-d} \)
\(L(W) = \prod_{n=1}^{N} p(d_{n}|x_{n}) = \prod_{n=1}^{N} p(d=1|x_{n})^{d_{n}}p(d=0|x_{n})^{1-d_{n}} = \prod_{n=1}^{N} y(x_{n};W)^{d_{n}}(1-y(x_{n};W))^{1-d_{n}} \)
\(E(W) = -log(L(W)) = - \sum_{n=1}^N \big\{ d_{n} log(y(x_{n};W)) + (1 - d_{n}) log(1-y(x_{n};W)) \big\} \)
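The negative log-likelihood above is straightforward to evaluate. The sketch below assumes the network outputs y_n have already been computed and lie strictly between 0 and 1; the function name and the sample values are made up for illustration.

```python
import math

def binary_cross_entropy(d, y):
    """E = -sum_n { d_n log y_n + (1 - d_n) log(1 - y_n) }."""
    return -sum(dn * math.log(yn) + (1 - dn) * math.log(1 - yn)
                for dn, yn in zip(d, y))

d = [1, 0, 1]                 # target labels
y_good = [0.9, 0.1, 0.8]      # outputs that agree with the labels
y_bad = [0.4, 0.6, 0.3]       # outputs that disagree with the labels
e_good = binary_cross_entropy(d, y_good)
e_bad = binary_cross_entropy(d, y_bad)
```

Outputs that agree with the labels give the smaller error, which is what minimizing E(W) rewards.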
\(y=\frac{1}{1+e^{-u}} \\ d=\begin{cases}0 & y < 0.5\\1 & 0.5 \leq y\end{cases} \)
\(y_{k} = z_{k}^{(L)} = \frac{exp(u_{k}^{(L)})}{ \sum_{j=1}^{K} exp(u_{j}^{(L)}) } \)
The outputs of the softmax function sum to 1 ⇒ they are regarded as the probability distribution over y0 to y9 given the input image x.
Ideally the output equals the target vector, for example
\(d_{n} = [0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^{T} \)
\(L(W) = \prod_{n=1}^N p(d_{n}|x_{n}) = \prod_{n=1}^N \prod_{k=1}^K (y_{k}(x_{n};W))^{d_{nk}} \)
\(E(W) = -log(L(W)) = - \sum_{n=1}^N \sum_{k=1}^K d_{nk} log(y_{k}(x_{n};W)) \)
This function is called the cross entropy.
\(\nabla E \equiv \frac{\partial E}{\partial w} = \begin{bmatrix} \frac{\partial E}{\partial w_{1}} & ... & \frac{\partial E}{\partial w_{m}} \end{bmatrix}^T \\ w^{(t+1)} = w^{(t)} - \varepsilon \nabla E \)
\(E_{t}(w) = \frac{1}{N_{t}} \sum_{n \in D_{t}} E_{n}(w) \\ \\ D_1 = \big\{(x_{931},d_{931}),(x_{81},d_{81}),...(x_{233},d_{233})\big\} \\ D_2 = \big\{(x_{5},d_{5}),(x_{534},d_{534}),...(x_{111},d_{111})\big\} \\ D_3 = \big\{(x_{432},d_{432}),(x_{68},d_{68}),...(x_{134},d_{134})\big\} \\ ... \)
\(E_{t}(w) = \frac{1}{N_{t}} \sum_{n \in D_{t}} E_{n}(w) + \frac{ \lambda }{2} \|w\|^{2} \\ \lambda = 0.01 \sim 0.00001 \)
A term with the sum of squared weights is added to the error function.
→ Because training proceeds so as to make the error function smaller, smaller values of w are preferred.
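In a gradient-descent update, the penalty term contributes λw to the gradient, so every step also shrinks the weights slightly. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def sgd_step_with_weight_decay(w, grad, lr, lam):
    """One update w <- w - lr * (grad + lam * w).

    grad is the gradient of the unregularized minibatch error E_t;
    lam * w is the gradient of the penalty (lam / 2) * ||w||^2.
    """
    return [w_i - lr * (g_i + lam * w_i) for w_i, g_i in zip(w, grad)]

# With a zero data gradient, only the shrinkage toward 0 remains.
w = [1.0, -2.0]
w_next = sgd_step_with_weight_decay(w, grad=[0.0, 0.0], lr=0.1, lam=0.01)
```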
\(\sum_i w_{ji}^2 < c \)
For each j, the sum of squared weights is kept smaller than a constant c.
// Scale down the weights of j until their sum of squares no longer exceeds c
while ( square_sum(w[j]) > c ) {
  for (i = 0; i < size; i++) {
    w[j][i] = w[j][i] * 0.99;
  }
}
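If the raw data are X(1)=(temperature 1, humidity 1, pressure 1), X(2)=(temperature 2, humidity 2, pressure 2), X(3)=(temperature 3, humidity 3, pressure 3), ... then the data used for training are
I(1)=(standardized score of temperature 1, standardized score of humidity 1, standardized score of pressure 1), I(2)=(standardized score of temperature 2, standardized score of humidity 2, standardized score of pressure 2), I(3)=(standardized score of temperature 3, standardized score of humidity 3, standardized score of pressure 3), ...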
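A minimal sketch of computing such standardized scores: subtract the per-component mean and divide by the per-component standard deviation. The function name, the sample readings, and the small floor `eps` on the standard deviation (to avoid dividing by zero for a constant component) are assumptions made for illustration.

```python
import math

def standardize(X, eps=1e-8):
    """Standardize each column of X (rows are samples) to mean 0, variance 1.

    The standard deviation is floored at eps so that a constant column
    does not cause a division by zero.
    """
    n = len(X)
    dims = len(X[0])
    means = [sum(row[i] for row in X) / n for i in range(dims)]
    sigmas = [math.sqrt(sum((row[i] - means[i]) ** 2 for row in X) / n)
              for i in range(dims)]
    return [[(row[i] - means[i]) / max(sigmas[i], eps) for i in range(dims)]
            for row in X]

# Hypothetical (temperature, humidity, pressure) readings.
X = [[20.0, 40.0, 1000.0],
     [22.0, 50.0, 1010.0],
     [24.0, 60.0, 1020.0]]
scores = standardize(X)
```

After this step the three components are on the same scale, even though the raw pressure values are far larger than the temperatures.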
\(x_{ni} \leftarrow \frac{x_{ni} - \overline{x_{i}} }{ \sigma_{i} } \\ \text{where } \sigma_{i} = \sqrt{\frac{1}{N} \sum_{n=1}^N (x_{ni} - \overline{x_{i}})^2 } \)
→ Each component is transformed so that it has mean 0 and variance 1.
In practice the following variant, where ε is a small value, is often used because it avoids dividing by a very small \(\sigma_{i}\):
\(x_{ni} \leftarrow \frac{x_{ni} - \overline{x_{i}} }{ \max(\sigma_{i},\epsilon) } \)
\(w^{(t+1)} = w^{(t)} - \varepsilon \nabla E \)
\(\epsilon_{t} = \epsilon_{0} - \alpha t \)
\(\epsilon \leftarrow \epsilon / 10 \)
\(w^{(t+1)} = w^{(t)} - \varepsilon \nabla E \)
Let \(g_{ti}\) be the i-th component of \(\nabla E\) at step t. Then
\(w^{(t+1)}_{i} = w^{(t)}_{i} - \frac{\epsilon}{ \sqrt{ \sum_{t'=1}^t g_{t'i}^2 } } g_{ti} \)
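→ The update emphasizes those \(w_{i}\) whose current gradient stands out, in either the good or the bad direction, relative to their accumulated past gradients.
\(w^{(t+1)} = w^{(t)} - \varepsilon \nabla E + \mu \Delta w^{(t-1)} \\ \Delta w^{(t-1)} \equiv w^{(t)} - w^{(t-1)} \)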
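The momentum update can be sketched as below, taking Δw to be the change applied to w in the previous step. The function name and the constant gradient are assumptions chosen to make the effect visible: with a steady gradient g, the per-step change approaches −εg/(1−μ).

```python
def momentum_step(w, grad, prev_delta, lr, mu):
    """Return (new_w, delta) for w <- w - lr * grad + mu * prev_delta,
    where delta = new_w - w is the change applied in this step."""
    delta = [-lr * g + mu * d for g, d in zip(grad, prev_delta)]
    new_w = [w_i + d_i for w_i, d_i in zip(w, delta)]
    return new_w, delta

# Repeated steps under a constant gradient of 1.
w, delta = [0.0], [0.0]
for _ in range(200):
    w, delta = momentum_step(w, grad=[1.0], prev_delta=delta, lr=0.01, mu=0.9)
```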
With \(\mu = 0.9\), the effective step becomes up to about \(1/(1-\mu) = 10\) times larger while successive gradients point in the same direction, so learning proceeds faster. It may, however, overshoot the optimum.
\(u^{(l+1)} = W^{(l+1)} z^{(l)} + b^{(l+1)} \\ z^{(l+1)} = f(u^{(l+1)}) \)
\(\begin{bmatrix}u^{(l+1)}_1\\ \vdots \\ u^{(l+1)}_J \end{bmatrix} = \begin{bmatrix}w^{(l+1)}_{11} & \cdots & w^{(l+1)}_{1I} \\ \vdots & \ddots & \vdots \\ w^{(l+1)}_{J1} & \cdots & w^{(l+1)}_{JI} \end{bmatrix} \begin{bmatrix}z^{(l)}_1\\ \vdots \\ z^{(l)}_I \end{bmatrix} + \begin{bmatrix}b^{(l+1)}_1\\ \vdots \\ b^{(l+1)}_J \end{bmatrix} \\ \begin{bmatrix}z^{(l+1)}_1\\ \vdots \\ z^{(l+1)}_J \end{bmatrix} = \begin{bmatrix}f(u^{(l+1)}_1)\\ \vdots \\ f(u^{(l+1)}_J) \end{bmatrix} \)
Absorbing the bias into the weight matrix as an extra column, paired with a constant input +1:
\(u^{(l+1)} = W^{(l+1)} z^{(l)} \\ z^{(l+1)} = f(u^{(l+1)}) \)
\(\begin{bmatrix}u^{(l+1)}_1\\ \vdots \\ u^{(l+1)}_J \end{bmatrix} = \begin{bmatrix}b^{(l+1)}_{1} & w^{(l+1)}_{11} & \cdots & w^{(l+1)}_{1I} \\ \vdots & \vdots & \ddots & \vdots \\ b^{(l+1)}_{J} & w^{(l+1)}_{J1} & \cdots & w^{(l+1)}_{JI} \end{bmatrix} \begin{bmatrix}+1 \\ z^{(l)}_1\\ \vdots \\ z^{(l)}_I \end{bmatrix} \\ \begin{bmatrix}z^{(l+1)}_1\\ \vdots \\ z^{(l+1)}_J \end{bmatrix} = \begin{bmatrix}f(u^{(l+1)}_1)\\ \vdots \\ f(u^{(l+1)}_J) \end{bmatrix} \)
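The two forms above, an explicit bias vector versus a bias column multiplied by a constant +1 input, give the same u. A small check in plain Python, with hypothetical names and numbers:

```python
def affine(W, b, z):
    """u = W z + b with an explicit bias vector."""
    return [sum(w * zi for w, zi in zip(row, z)) + bj
            for row, bj in zip(W, b)]

def affine_augmented(W_aug, z):
    """u = W_aug [1, z], where column 0 of W_aug holds the biases."""
    z_aug = [1.0] + list(z)
    return [sum(w * zi for w, zi in zip(row, z_aug)) for row in W_aug]

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [0.5, -0.5, 1.0]
z = [2.0, -1.0]
# Prepend each bias to its row to build the augmented matrix.
W_aug = [[bj] + row for row, bj in zip(W, b)]
u_explicit = affine(W, b, z)
u_augmented = affine_augmented(W_aug, z)
```

Treating the bias as one more weight means the later gradient formulas only need to handle a single matrix per layer.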
\(\frac{\partial E}{\partial U^{(l)}} = \frac{\partial E}{\partial Z^{(l)}} \frac{\partial Z^{(l)}}{\partial U^{(l)}} \\ = \frac{\partial E}{\partial U^{(l+1)}} \frac{\partial U^{(l+1)}}{\partial Z^{(l)}} \frac{\partial Z^{(l)}}{\partial U^{(l)}} \\ = \frac{\partial E}{\partial U^{(l+1)}} \frac{\partial (W^{(l+1)} Z^{(l)})}{\partial Z^{(l)}} \frac{\partial f(U^{(l)})}{\partial U^{(l)}} \\ = \frac{\partial E}{\partial U^{(l+1)}} W^{(l+1)} f'(U^{(l)}) \)
Here, letting \(\delta_{l} = \frac{\partial E}{\partial U^{(l)}}\), we obtain
\(\delta_{l} = \delta_{l+1} W^{(l+1)} f'(U^{(l)}) \)
That is, once \(\delta_{L}\) of the network's output layer is known, \(\delta_{l}\) can be computed layer by layer, from the output layer back toward the input layer. ⇒ This procedure is error "back" propagation.
\(\frac{\partial E}{\partial W^{(l)}} = \frac{\partial E}{\partial U^{(l)}} \frac{\partial U^{(l)}}{\partial W^{(l)}} \\ = \frac{\partial E}{\partial U^{(l)}} \frac{\partial (W^{(l)} Z^{(l-1)})}{\partial W^{(l)}} \\ = \frac{\partial E}{\partial U^{(l)}} Z^{(l-1)} \\ = \delta_{l} Z^{(l-1)} \)
\(u^{(l+1)} = W^{(l+1)} z^{(l)} \\ z^{(l+1)} = f(u^{(l+1)}) \)
\(\begin{bmatrix}u^{(l+1)}_1\\ \vdots \\ u^{(l+1)}_J \end{bmatrix} = \begin{bmatrix}b^{(l+1)}_{1} & w^{(l+1)}_{11} & \cdots & w^{(l+1)}_{1I} \\ \vdots & \vdots & \ddots & \vdots \\ b^{(l+1)}_{J} & w^{(l+1)}_{J1} & \cdots & w^{(l+1)}_{JI} \end{bmatrix} \begin{bmatrix}+1 \\ z^{(l)}_1\\ \vdots \\ z^{(l)}_I \end{bmatrix} \\ \begin{bmatrix}z^{(l+1)}_1\\ \vdots \\ z^{(l+1)}_J \end{bmatrix} = \begin{bmatrix}f(u^{(l+1)}_1)\\ \vdots \\ f(u^{(l+1)}_J) \end{bmatrix} \)
\(\delta^{(l)} = f'(u^{(l)}) \odot (W^{(l+1)T} \delta^{(l+1)}) \)
\(\begin{bmatrix}\delta^{(l)}_0 \\ \delta^{(l)}_1 \\ \vdots \\ \delta^{(l)}_I \end{bmatrix} = \begin{bmatrix}+1 \\ f'(u^{(l)}_1) \\ \vdots \\ f'(u^{(l)}_I) \end{bmatrix} \odot \big( \begin{bmatrix}b^{(l+1)}_1 & \cdots & b^{(l+1)}_J \\ w^{(l+1)}_{11} & \cdots & w^{(l+1)}_{J1} \\ \vdots & \ddots & \vdots \\ w^{(l+1)}_{1I} & \cdots & w^{(l+1)}_{JI} \end{bmatrix} \begin{bmatrix}\delta^{(l+1)}_1 \\ \vdots \\ \delta^{(l+1)}_J \end{bmatrix} \big) \)
Note: \(C \equiv A \odot B \Leftrightarrow c_{ij} = a_{ij} b_{ij} \) (element-wise product)
\(\frac{\partial E_n}{\partial W^{(l)}} = \delta^{(l)} z^{(l-1)T} \)
\(\begin{bmatrix}\frac{\partial E_n}{\partial b^{(l)}_{1}} & \frac{\partial E_n}{\partial w^{(l)}_{11}} & \cdots & \frac{\partial E_n}{\partial w^{(l)}_{1I}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial E_n}{\partial b^{(l)}_{J}} & \frac{\partial E_n}{\partial w^{(l)}_{J1}} & \cdots & \frac{\partial E_n}{\partial w^{(l)}_{JI}} \end{bmatrix} = \begin{bmatrix}\delta^{(l)}_1 \\ \vdots \\ \delta^{(l)}_J \end{bmatrix} \begin{bmatrix}z^{(l-1)}_0 & z^{(l-1)}_1 & \cdots & z^{(l-1)}_I \end{bmatrix} = \begin{bmatrix}\delta^{(l)}_1 \\ \vdots \\ \delta^{(l)}_J \end{bmatrix} \begin{bmatrix}+1 & z^{(l-1)}_1 & \cdots & z^{(l-1)}_I \end{bmatrix} \)
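The single-sample backward pass above can be sketched for a tiny network. The assumptions here are mine, for illustration only: one hidden layer of logistic units, a linear output layer, squared error, and no bias terms to keep the code short. All names and numbers are hypothetical.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def matvec(W, v):
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]

def forward(W2, W3, x):
    """Hidden layer with logistic units, linear output layer, no biases."""
    u2 = matvec(W2, x)
    z2 = [logistic(u) for u in u2]
    y = matvec(W3, z2)
    return u2, z2, y

def error(W2, W3, x, d):
    """E_n = 0.5 * ||d - y||^2 for one sample."""
    _, _, y = forward(W2, W3, x)
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

def backprop(W2, W3, x, d):
    """Gradients of E_n with respect to W2 and W3."""
    _, z2, y = forward(W2, W3, x)
    # Output layer: delta^(L) = y - d (linear output, squared error).
    delta3 = [yk - dk for yk, dk in zip(y, d)]
    # Hidden layer: delta^(l) = f'(u^(l)) * (W^(l+1)^T delta^(l+1)), element-wise.
    back = [sum(W3[k][j] * delta3[k] for k in range(len(W3)))
            for j in range(len(z2))]
    delta2 = [zj * (1.0 - zj) * bj for zj, bj in zip(z2, back)]
    # dE_n/dW^(l) = delta^(l) z^(l-1)^T (outer product).
    grad_W3 = [[dk * zj for zj in z2] for dk in delta3]
    grad_W2 = [[dj * xi for xi in x] for dj in delta2]
    return grad_W2, grad_W3

W2 = [[0.1, -0.2], [0.4, 0.3]]
W3 = [[0.5, -0.6]]
x, d = [1.0, 2.0], [1.0]
grad_W2, grad_W3 = backprop(W2, W3, x, d)
```

Perturbing a single weight by a small amount and comparing the change in `error` with the matching entry of `grad_W2` or `grad_W3` is a quick way to check the derivation.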
\(\Delta^{(l)} = f'(U^{(l)}) \odot (W^{(l+1)T} \Delta^{(l+1)}) \)
\(\begin{bmatrix}\delta^{(l)}_{01} & \cdots & \delta^{(l)}_{0N} \\ \delta^{(l)}_{11} & \cdots & \delta^{(l)}_{1N} \\ \vdots & \ddots & \vdots \\ \delta^{(l)}_{I1} & \cdots & \delta^{(l)}_{IN} \end{bmatrix} = \begin{bmatrix}+1 & \cdots & +1 \\ f'(u^{(l)}_{11}) & \cdots & f'(u^{(l)}_{1N}) \\ \vdots & \ddots & \vdots \\ f'(u^{(l)}_{I1}) & \cdots & f'(u^{(l)}_{IN}) \end{bmatrix} \odot \big( \begin{bmatrix}b^{(l+1)}_1 & \cdots & b^{(l+1)}_J \\ w^{(l+1)}_{11} & \cdots & w^{(l+1)}_{J1} \\ \vdots & \ddots & \vdots \\ w^{(l+1)}_{1I} & \cdots & w^{(l+1)}_{JI} \end{bmatrix} \begin{bmatrix}\delta^{(l+1)}_{11} & \cdots & \delta^{(l+1)}_{1N} \\ \vdots & \ddots & \vdots \\ \delta^{(l+1)}_{J1} & \cdots & \delta^{(l+1)}_{JN} \end{bmatrix} \big) \)
\(\frac{\partial E}{\partial W^{(l)}} = \frac{1}{N} \Delta^{(l)} Z^{(l-1)T} \)
\(\begin{bmatrix}\frac{\partial E}{\partial b^{(l)}_{1}} & \frac{\partial E}{\partial w^{(l)}_{11}} & \cdots & \frac{\partial E}{\partial w^{(l)}_{1I}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial E}{\partial b^{(l)}_{J}} & \frac{\partial E}{\partial w^{(l)}_{J1}} & \cdots & \frac{\partial E}{\partial w^{(l)}_{JI}} \end{bmatrix} = \frac{1}{N} \begin{bmatrix}\delta^{(l)}_{11} & \cdots & \delta^{(l)}_{1N} \\ \vdots & \ddots & \vdots \\ \delta^{(l)}_{J1} & \cdots & \delta^{(l)}_{JN} \end{bmatrix} \begin{bmatrix} +1&z^{(l-1)}_{11} & \cdots & z^{(l-1)}_{I1}\\ +1&z^{(l-1)}_{12} & \cdots & z^{(l-1)}_{I2}\\ \vdots & \vdots & \ddots & \vdots \\ +1&z^{(l-1)}_{1N} & \cdots & z^{(l-1)}_{IN}\\ \end{bmatrix} \)
\(f(u) = \frac{1}{1+e^{-u}} \\ f'(u) = f(u)(1-f(u)) \)
\(f(u) = tanh(u) \\ f'(u) = 1-tanh^2(u) \)
\(f(u) = max(u,0) \\ f'(u) = \begin{cases}1 & u \geq 0\\0 & u < 0\end{cases} \)
\(Y = U = Z \)
\(E(W) = \frac{1}{2} \sum_{n=1}^{N} \| d_n - y(x_n;W) \|^2 \)
Note: the factor 1/2 is there so that the factor 2 produced by differentiating the square cancels.
\(\delta^{(L)} = Y - D \)
\(y=\frac{1}{1+e^{-u}} \\ d=\begin{cases}0 & y < 0.5\\1 & 0.5 \leq y\end{cases} \)
\(E(W) = -log(L(W)) = - \sum_{n=1}^N \big\{ d_{n} log(y(x_{n};W)) + (1 - d_{n}) log(1-y(x_{n};W)) \big\} \)
\(\delta^{(L)} = Y - D \)
Note: the blue book (青本) has D-Y here, but that is probably a misprint.
\(y_{k} = z_{k}^{(L)} = \frac{exp(u_{k}^{(L)})}{ \sum_{j=1}^{K} exp(u_{j}^{(L)}) } \)
The outputs of the softmax function sum to 1 ⇒ they are regarded as the probability distribution over y0 to y9 given the input image x.
\(E(W) = -log(L(W)) = - \sum_{n=1}^N \sum_{k=1}^K d_{nk} log(y_{k}(x_{n};W)) \)
\(\delta^{(L)} = Y - D \)
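Here \(W'\) denotes \(W\) with only \(w_{ji}\) increased by \(\epsilon\).
\(\frac{\partial E}{\partial w_{ji}} \approx \frac{E(W') - E(W)}{\epsilon}\\ \epsilon = \begin{cases} \epsilon_c & \text{if } \epsilon_c |w_{ji}| \text{ evaluates to 0 in the floating-point format used for the computation} \\ \epsilon_c |w_{ji}| & \text{otherwise} \end{cases} \)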
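The finite-difference check can be tried on the softmax output layer, where the analytic result is δ = Y − D. This sketch perturbs the output-layer inputs u rather than the weights, which is a simplification of mine. The relative step rule follows the formula above, and `eps_c = 1e-6` is an assumed value.

```python
import math

def softmax(u):
    m = max(u)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in u]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(u, d):
    """E = -sum_k d_k log y_k with y = softmax(u)."""
    y = softmax(u)
    return -sum(dk * math.log(yk) for dk, yk in zip(d, y))

def numerical_grad(f, u, eps_c=1e-6):
    """Forward-difference estimate (f(u') - f(u)) / eps, one component at a time."""
    base = f(u)
    grad = []
    for k in range(len(u)):
        eps = eps_c * abs(u[k])
        if eps == 0.0:              # fall back to eps_c when the product is 0
            eps = eps_c
        shifted = list(u)
        shifted[k] += eps
        grad.append((f(shifted) - base) / eps)
    return grad

u = [1.0, 2.0, 0.0]
d = [0, 1, 0]
analytic = [yk - dk for yk, dk in zip(softmax(u), d)]   # delta^(L) = Y - D
numeric = numerical_grad(lambda v: cross_entropy(v, d), u)
```

The two gradients agree closely, and the sign matches Y − D rather than D − Y.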
\(f(u) = \frac{1}{1+e^{-u}} \\ f'(u) = f(u)(1-f(u)) \)
The derivative (diff logistic) attains its maximum at u=0, where f'(0)=0.25.
\(f(u) = tanh(u) \\ f'(u) = 1-tanh^2(u) \)
The derivative (diff hyperbolic) attains its maximum at u=0, where f'(0)=1.
\(f(u) = max(u,0) \\ f'(u) = \begin{cases}1 & u \geq 0\\0 & u < 0\end{cases} \)
\(w_{ji} \sim N(0,\sigma ^2) \)
\(u_{j} = \sum_i^M w_{ji} x_{i} \)
What value of σ gives \(u_{j}\) a suitable variance?
\(\sigma_{u_{j}}^2 = \sum_i^M w_{ji}^2 \sigma_{x_{i}}^2 \)
Assuming the inputs have been properly normalized so that the variance of each x is 1,
\(\sigma_{u_{j}}^2 = \sum_i^M w_{ji}^2 \)
Taking the mean of \(w_{ji}\) to be 0,
\(\sigma_{u_{j}}^2 = \sum_i^M ( w_{ji} - 0 )^2 = \sum_i^M ( w_{ji} - \overline{w_{ji}} )^2 \)
Dividing both sides by M,
\(\frac{\sigma_{u_{j}}^2}{M} = \frac{ \sum_i^M ( w_{ji} - \overline{w_{ji}} )^2 }{M} = \sigma^2 \\ \therefore \sigma = \frac{\sigma_{u_{j}}}{\sqrt{M}} \)
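The result σ = σ_u/√M can be tried empirically: draw the weights of many units with that σ and look at the per-unit sum of squared weights, which the derivation identifies with σ_u². A sketch using only the standard library; the function name, layer sizes and seed are arbitrary choices.

```python
import math
import random

def init_weights(n_units, n_inputs, sigma_u=1.0, seed=0):
    """Draw w_ji ~ N(0, sigma^2) with sigma = sigma_u / sqrt(M), M = n_inputs."""
    rng = random.Random(seed)
    sigma = sigma_u / math.sqrt(n_inputs)
    return [[rng.gauss(0.0, sigma) for _ in range(n_inputs)]
            for _ in range(n_units)]

W = init_weights(n_units=200, n_inputs=100)
# Per-unit sum of squared weights: an estimate of sigma_u^2 for unit-variance inputs.
var_u = [sum(w * w for w in row) for row in W]
mean_var_u = sum(var_u) / len(var_u)
```

Averaged over the units, `mean_var_u` comes out close to the requested σ_u² = 1, independent of the number of inputs M.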
\(\Phi_X = \frac{1}{N} X X^T \)
\(\Phi_U = \frac{1}{N} (PX) (PX)^T \)
\(I = \Phi_U = \frac{1}{N} (PX) (PX)^T = P \Phi_X P^{T} \\ \therefore \Phi_X = P^{-1} (P^{T})^{-1} \\ \therefore \Phi_X^{-1} = P^{T} P \)
\(\Phi_{X} \overrightarrow{e_{i}} = \lambda_{i} \overrightarrow{e_{i}} \\ \Phi_{X} [\overrightarrow{e_{1}} \overrightarrow{e_{2}} \dots \overrightarrow{e_{n}}] = [\overrightarrow{e_{1}} \overrightarrow{e_{2}} \dots \overrightarrow{e_{n}}] \begin{bmatrix} \lambda_{1} & 0 & \dots & 0 \\ 0 & \lambda_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_{n} \end{bmatrix} \\ \Phi_{X} E = E \Lambda \\ \\ \therefore \Phi_{X} = E \Lambda E^{-1} \\ \therefore \Phi_{X}^{-1} = E (E \Lambda)^{-1} = E \Lambda^{-1} E^{-1} = E \Lambda^{-1} E^{T} \\ \)
\(\Phi_{X}^{-1} = E \Lambda^{-1} E^{T} = E \Lambda^{-1/2} Q^{T} Q \Lambda^{-1/2} E^{T} = (Q \Lambda^{-1/2} E^{T})^{T} (Q \Lambda^{-1/2} E^{T}) = P^{T} P \\ \therefore P = Q \Lambda^{-1/2} E^{T} \)
Note: \(\Lambda^{-1/2}\) is the diagonal matrix whose diagonal entries are \(1/\sqrt{\lambda_{i}}\).
\(P_{PCA} = \Lambda^{-1/2} E^{T} \)
The transformation matrix \(P_{PCA}\) consists essentially of the eigenvectors. What the transformation means is the same as in principal component analysis.
\(P_{ZCA} = E \Lambda^{-1/2} E^{T} \)
Viewing E as a rotation matrix, the mapping by \(P_{ZCA}\) leaves the phase unchanged → zero-phase component analysis (ZCA).
\(P_{PCA} = (\Lambda + \epsilon I)^{-1/2} E^{T} \\ P_{ZCA} = E (\Lambda + \epsilon I)^{-1/2} E^{T} \\ \epsilon = 10^{-6} \)
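For two-dimensional data the eigendecomposition has a closed form, so ZCA whitening can be sketched without any library. The code assumes zero-mean samples and regularizes each eigenvalue as (λ_i + ε)^(-1/2), which is my reading of the intended regularization. The function names and sample points are hypothetical.

```python
import math

def covariance_2d(X):
    """Entries (a, b, c) of Phi = [[a, b], [b, c]] for zero-mean 2-D samples."""
    n = len(X)
    a = sum(p[0] * p[0] for p in X) / n
    b = sum(p[0] * p[1] for p in X) / n
    c = sum(p[1] * p[1] for p in X) / n
    return a, b, c

def zca_matrix_2d(a, b, c, eps=1e-6):
    """P_ZCA = E (Lambda + eps I)^(-1/2) E^T for Phi = [[a, b], [b, c]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # rotation angle that diagonalizes Phi
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs
    s1 = 1.0 / math.sqrt(lam1 + eps)
    s2 = 1.0 / math.sqrt(lam2 + eps)
    # E = [[cs, -sn], [sn, cs]], so P = s1 e1 e1^T + s2 e2 e2^T.
    return [[s1 * cs * cs + s2 * sn * sn, (s1 - s2) * cs * sn],
            [(s1 - s2) * cs * sn, s1 * sn * sn + s2 * cs * cs]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

X = [(2.0, 1.0), (-2.0, -1.0), (1.0, 2.0), (-1.0, -2.0)]
a, b, c = covariance_2d(X)
P = zca_matrix_2d(a, b, c)
# Covariance after whitening is P Phi P^T; P is symmetric, so P^T = P.
whitened = matmul2(matmul2(P, [[a, b], [b, c]]), P)
```

`whitened` comes out close to the identity matrix, and P itself is symmetric, which is the property that distinguishes ZCA from PCA whitening.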
\(y = w_1 x_1 + w_2 x_2 + \dots + w_{100} x_{100} + \epsilon \)
y : response variable (output)
To determine the weights, the system
\(\begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_{100} \end{bmatrix} = \begin{bmatrix} x_{1~1} & x_{1~2} & \dots & x_{1~100} \\ x_{2~1} & x_{2~2} & \dots & x_{2~100} \\ \vdots & \vdots & \ddots & \vdots \\ x_{100~1} & x_{100~2} & \dots & x_{100~100} \\ \end{bmatrix} \begin{bmatrix}w_1 \\ w_2 \\ \vdots \\ w_{100} \end{bmatrix} + \epsilon \)
has to be solved. ⇒ At least 100 observations (X,Y) are required, and the more observations there are, the smaller the effect of the noise ε can be made.
When all weights other than \(w_1, w_2, w_3\) are zero,
\(y = w_1 x_1 + w_2 x_2 + \dots + w_{100} x_{100} + \epsilon \\ = w_1 x_1 + w_2 x_2 + w_3 x_3 + \epsilon \)
That is, for the 100 observations prepared above,
\(\begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_{100} \end{bmatrix} = \begin{bmatrix} x_{1~1} & x_{1~2} & x_{1~3} \\ x_{2~1} & x_{2~2} & x_{2~3} \\ \vdots & \vdots & \vdots \\ x_{100~1} & x_{100~2} & x_{100~3} \\ \end{bmatrix} \begin{bmatrix}w_1 \\ w_2 \\ w_3 \end{bmatrix} + \epsilon \)
holds.
\(KL( \rho || \widehat{\rho}_j ) = \rho~log\frac{\rho}{\widehat{\rho}_j} + (1-\rho)~log\frac{(1-\rho)}{(1-\widehat{\rho}_j)} \)
\(KL(P||Q) = \int_{- \infty }^{ \infty } p(x)~log \frac{p(x)}{q(x)} ~dx \)
\(P(X=1) = p \\ P(X=0) = q = 1- p \\ f(k;p) = p^k (1-p)^{1-k}~~for~k \in \{0,1\} \)
\(p(x) = f(x;\rho) = \rho^x (1-\rho)^{1-x}~~for~x \in \{0,1\} \\ q(x) = f(x;\widehat{\rho}_j) = \widehat{\rho}_j^x (1-\widehat{\rho}_j)^{1-x}~~for~x \in \{0,1\} \\ \\ KL(P||Q) = \int_{- \infty }^{ \infty } p(x)~log \frac{p(x)}{q(x)} ~dx \\ = p(1)~log \frac{p(1)}{q(1)} + p(0)~log \frac{p(0)}{q(0)}~~~\because for~x \in \{0,1\}\\ = \rho~log\frac{\rho}{\widehat{\rho}_j} + (1-\rho)~log\frac{(1-\rho)}{(1-\widehat{\rho}_j)} // \)
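The closed form just derived for two Bernoulli distributions is short enough to evaluate directly. A minimal sketch; the function name is hypothetical, and both arguments are assumed to lie strictly between 0 and 1.

```python
import math

def kl_bernoulli(rho, rho_hat):
    """KL(rho || rho_hat) between Bernoulli distributions with means rho, rho_hat."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

# The divergence is 0 when the average activation matches the target
# and grows as the two move apart.
same = kl_bernoulli(0.05, 0.05)
far = kl_bernoulli(0.05, 0.5)
```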
\(\widetilde{E}(w) = E(w) + \beta \sum_{j=1}^{D_l} KL(\rho || \widehat{\rho_j}) \\ ~~\widehat{\rho_j} = \frac{1}{N} \sum_{n=1}^{N} z_j(x_n) \\ ~~KL(\rho || \widehat{\rho_j}) = \rho~log (\frac{\rho}{\widehat{\rho_j}}) + (1 - \rho)~log \frac{(1 - \rho)}{(1 - \widehat{\rho_j})} \)
\(\delta_{j}^{(l)} = \frac{\partial \widetilde{E}(w)}{\partial u_{j}^{(l)}} = \frac{\partial E(w)}{\partial u_{j}^{(l)}} + \frac{\partial}{\partial u_{j}^{(l)}} (\beta \sum_{j=1}^{D_l} KL(\rho || \widehat{\rho_j})) \\ = \frac{\partial E(w)}{\partial u_{j}^{(l)}} + \beta \frac{\partial (KL(\rho || \widehat{\rho_j}))}{\partial \widehat{\rho_j}} \frac{\partial \widehat{\rho_j}}{\partial u_{j}^{(l)}} \)
\(\frac{\partial E(w)}{\partial u_{j}^{(l)}} = \sum_k \delta_k^{(l+1)} w_{kj}^{(l+1)} f'(u_j^{(l)}) \)
\(\frac{\partial (KL(\rho || \widehat{\rho_j}))}{\partial \widehat{\rho_j}} \\ = \frac{\partial}{\partial \widehat{\rho_j}} (\rho~log (\frac{\rho}{\widehat{\rho_j}}) + (1 - \rho)~log \frac{(1 - \rho)}{(1 - \widehat{\rho_j})}) \\ = \frac{\partial}{\partial \widehat{\rho_j}} (\rho~log (\rho~\widehat{\rho_j}^{-1}) + (1 - \rho)~log ((1 - \rho)(1 - \widehat{\rho_j})^{-1})) \\ = \rho \frac{- \rho~\widehat{\rho_j}^{-2} }{ \rho~\widehat{\rho_j}^{-1} } + (1-\rho) \frac{- (1-\rho)~(1-\widehat{\rho_j})^{-2} }{ (1-\rho)~(1-\widehat{\rho_j})^{-1} } \bullet -1 \\ = \frac{- \rho}{ \widehat{\rho_j} } + \frac{ (1-\rho) }{ (1-\widehat{\rho_j}) } \)
cf. \((log(f(x)))' = \frac {f'(x)}{f(x)}\)
\(\widehat{\rho_j} = \frac{1}{N} \sum_{n=1}^{N} z_j(x_n) = \frac{1}{N} \sum_{n=1}^{N} f(u_j^{(l)}) \\ \therefore \frac{\partial \widehat{\rho_j}}{\partial u_{j}^{(l)}} = f'(u_{j}^{(l)}) \)
\(\therefore \delta_{j}^{(l)} = \left\{ \sum_k \delta_k^{(l+1)} w_{kj}^{(l+1)} + \beta ( \frac{- \rho}{ \widehat{\rho_j} } + \frac{1-\rho}{1-\widehat{\rho_j}} ) \right\} f'(u_{j}^{(l)}) \)
\(\delta^{(l)} = f'(u^{(l)}) \odot (W^{(l+1)T} \delta^{(l+1)} + \beta \Theta) \)
\(\begin{bmatrix}\delta^{(l)}_0 \\ \delta^{(l)}_1 \\ \vdots \\ \delta^{(l)}_I \end{bmatrix} = \begin{bmatrix}+1 \\ f'(u^{(l)}_1) \\ \vdots \\ f'(u^{(l)}_I) \end{bmatrix} \odot \big( \begin{bmatrix}b^{(l+1)}_1 & \cdots & b^{(l+1)}_J \\ w^{(l+1)}_{11} & \cdots & w^{(l+1)}_{J1} \\ \vdots & \ddots & \vdots \\ w^{(l+1)}_{1I} & \cdots & w^{(l+1)}_{JI} \end{bmatrix} \begin{bmatrix}\delta^{(l+1)}_1 \\ \vdots \\ \delta^{(l+1)}_J \end{bmatrix} + \beta \begin{bmatrix}0 \\ \theta^{(l)}_1 \\ \vdots \\ \theta^{(l)}_I \end{bmatrix} \big) \\ \theta^{(l)}_i = \frac{- \rho}{ \widehat{\rho_i} } + \frac{1-\rho}{1-\widehat{\rho_i}} \)
Note: \(C \equiv A \odot B \Leftrightarrow c_{ij} = a_{ij} b_{ij} \) (element-wise product)
\(\widehat{\rho_j}^{(t)} = \lambda~\widehat{\rho_j}^{(t-1)} + (1-\lambda)\widehat{\rho_j} \)
\(u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,j+q} h_{pq} \)
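The double sum above translates directly into nested loops. The sketch below evaluates it only where the H × H filter fits entirely inside the image (no padding, stride 1). That border handling, the function name, and the sample image and filter are assumptions for illustration.

```python
def conv2d_valid(x, h):
    """u[i][j] = sum over p, q of x[i+p][j+q] * h[p][q], for every position
    where the H x H filter lies completely inside the image."""
    H = len(h)
    rows = len(x) - H + 1
    cols = len(x[0]) - H + 1
    return [[sum(x[i + p][j + q] * h[p][q]
                 for p in range(H) for q in range(H))
             for j in range(cols)]
            for i in range(rows)]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
h = [[1, 0],
     [0, -1]]
u = conv2d_valid(x, h)
```

This particular filter subtracts the lower-right neighbor from each pixel, so on an image whose values increase by 4 along the diagonal every output is −4.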
\(u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{si+p,sj+q} h_{pq} \)
Note: except when the original image is extremely large, no stride is used in convolution layers (pooling layers, by contrast, normally use a stride).
\(u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} z_{i+p,j+q,k}^{(l-1)} h_{pqkm} + b_{ijm} \\ z_{ijm} = f( u_{ijm} ) \)
\(u_{ijk} = \max_{(p,q) \in P_{ij}} z_{pqk} \)
\(u_{ijk} = \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} z_{pqk} \)
\(u_{ijk} = (\frac{1}{H^2} \sum_{(p,q) \in P_{ij}} z_{pqk}^P)^\frac{1}{P} \)
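The three pooling rules above (max, average, Lp) differ only in how one window is reduced to a single number. A sketch for one channel, with several assumptions of mine: windows are anchored at their top-left corner, windows that would cross the border are skipped, and the inputs to Lp pooling are non-negative. All names are hypothetical.

```python
def pool(z, H, stride, mode="max", P=2.0):
    """Pool a single-channel map z over H x H windows.

    mode is "max", "average" or "lp"; P is the exponent for Lp pooling.
    """
    out = []
    for i in range(0, len(z) - H + 1, stride):
        row = []
        for j in range(0, len(z[0]) - H + 1, stride):
            window = [z[i + p][j + q] for p in range(H) for q in range(H)]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / (H * H))
            elif mode == "lp":
                row.append((sum(v ** P for v in window) / (H * H)) ** (1.0 / P))
            else:
                raise ValueError("unknown mode: " + mode)
        out.append(row)
    return out

z = [[1.0, 2.0, 0.0, 1.0],
     [3.0, 4.0, 1.0, 0.0],
     [0.0, 0.0, 2.0, 2.0],
     [0.0, 0.0, 2.0, 2.0]]
max_pooled = pool(z, H=2, stride=2, mode="max")
avg_pooled = pool(z, H=2, stride=2, mode="average")
lp_pooled = pool(z, H=2, stride=2, mode="lp", P=2.0)
```

With P = 1 the Lp rule reduces to average pooling, and as P grows it approaches max pooling.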
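Contrast normalization | Normalizes the contrast of the input images (1,2,...,N) to make learning easier | |
Local contrast normalization | Built into the image-recognition network. Prevents the brightness from saturating through convolution and pooling | |
Subtractive normalization | Subtracts the weighted average of the neighborhood of (i,j) from the pixel value at (i,j) | |
Divisive normalization | Divides the pixel value at (i,j) by the weighted standard deviation of the neighborhood of (i,j). Emphasizes the shading in regions with little variation |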
\(\widetilde{x}_{ijk} = \frac{1}{N} \sum_{n=1}^{N} x_{ijk}^{(n)} \\ x_{ijk} \leftarrow x_{ijk} - \widetilde{x}_{ijk} \)
The mean over all the data is subtracted from the pixel value at (i,j) in channel k.
\(\widetilde{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q} \\ x_{ij} \leftarrow x_{ij} - \widetilde{x}_{ij} \)
A weighted average of the surrounding pixels is taken with suitable weights \(w_{pq}\) and subtracted from the pixel value \(x_{ij}\).
\(\sum_{(p,q) \in P_{ij}} w_{pq} = 1 \\ w_{ij} \geq w_{pq}\ (for\ each\ (p,q) \in P_{ij}) \)
The weights are chosen to sum to 1*1, with the weight at the center (i,j) being the largest*2.
\(\overline{x}_{ij} = \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} x_{i+p,j+q} \\ \sigma_{ij}^2 = \sum_{(p,q) \in P_{ij}} w_{pq} (x_{i+p,j+q} - \overline{x}_{ij})^2 \\ x_{ij} \leftarrow \frac{x_{ij} - \overline{x}_{ij}}{\sigma_{ij}} \)
(H is the side length of \(P_{ij}\))
\(x_{ij} \leftarrow \frac{x_{ij} - \overline{x}_{ij}}{\max(c,\sigma_{ij})} \)
\(x_{ij} \leftarrow \frac{x_{ij} - \overline{x}_{ij}}{\sqrt{c + \sigma_{ij}^2}} \\ \)
(for example, c = 1.0)
\(\sum_{(p,q) \in P_{ij}} w_{pq} = 1 \\ w_{ij} \geq w_{pq}\ (for\ each\ (p,q) \in P_{ij}) \)
The weights are chosen to sum to 1*3, with the weight at the center (i,j) being the largest*4.
\(\widetilde{x}_{ij} = \frac{1}{K} \sum_{k=0}^{K-1} \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q,k} \\ x_{ijk} \leftarrow x_{ijk} - \widetilde{x}_{ij} \)
\(\overline{x}_{ij} = \frac{1}{KH^2} \sum_{k=0}^{K-1} \sum_{(p,q) \in P_{ij}} x_{i+p,j+q,k} \\ \sigma_{ij}^2 = \frac{1}{K} \sum_{k=0}^{K-1} \sum_{(p,q) \in P_{ij}} w_{pq} (x_{i+p,j+q,k} - \overline{x}_{ij})^2 \\ x_{ijk} \leftarrow \frac{x_{ijk} - \overline{x}_{ij}}{\sigma_{ij}} \)
The denominator may be modified so that noise is not amplified.
\(x_{ijk} \leftarrow \frac{x_{ijk} - \overline{x}_{ij}}{\max(c,\sigma_{ij})} \)
\(x_{ijk} \leftarrow \frac{x_{ijk} - \overline{x}_{ij}}{\sqrt{c + \sigma_{ij}^2}} \\ \)
(for example, c = 1.0)
\(u^{(l)} = W^{(l)} z^{(l-1)} + b^{(l)} \\ z^{(l)} = f^{(l)}(u^{(l)}) \\ \partial W = \delta^{(l)} z^{(l-1)T} \)
\(W = Th \\ \partial W = \partial (Th) \\ \partial h = T^{T} \partial W \)
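With shared weights written as w = Th, where w is the weight matrix flattened into a vector, the chain rule gives the gradient with respect to the free parameters h as the transpose of T applied to the gradient with respect to w. A small sketch; the sharing pattern T and the gradient values are made up for illustration.

```python
def matvec(T, h):
    return [sum(t * hj for t, hj in zip(row, h)) for row in T]

def transpose_matvec(T, v):
    """Compute T^T v without forming the transpose explicitly."""
    return [sum(T[r][c] * v[r] for r in range(len(T)))
            for c in range(len(T[0]))]

# Hypothetical sharing pattern: 4 weights tied to 2 free parameters.
T = [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1]]
h = [0.5, -1.0]
w = matvec(T, h)                       # w = T h -> [0.5, -1.0, 0.5, -1.0]
grad_w = [0.1, 0.2, 0.3, 0.4]          # some gradient with respect to w
grad_h = transpose_matvec(T, grad_w)   # gradients of tied weights are summed
```

Each free parameter collects the gradients of all the weights tied to it, which is how the shared filter coefficients of a convolution layer are updated.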