


Mathematics to understand Deep Learning

Category: ICT
Published: 2017
#1711b

Yoshiyuki & Sadami Wakui (涌井良幸・貞美)

up 17810
Title

Mathematics to understand Deep Learning

ディープラーニングがわかる数学

Index
  0. Introduction:
  1. How to express activity of neuron:
  2. How the neural network learns:
  3. Basic mathematics for neural network:
  4. Cost function of neural network:
  5. Back propagation method:
  6. Translation to neural network language:
  0. 序文:
  1. ニューロンの働きの表現:
  2. ニューラルネットワークはどう学ぶのか:
  3. ニューラルネットワークのための基本数学:
  4. ニューラルネットワークのコスト関数:
  5. 誤差逆伝搬法 (BP法):
  6. ニューラルネットワークの言葉に翻訳:
Tag
Axon; Back propagation method; Chain rule; Convolution layer; Cost function; Data dependent; Dendrite; Demon-Subordinate Network; Displacement vector; Gradient descent; Lagrange multiplier method; Learning data; Minimization problem; Neural network; Neurotransmission; Recurrence formula; Regression analysis; Sigmoid function; Similarity of pattern; Square error; Stress tensor; Synapse;
Résumé
Remarks

0. Introduction:

  • Activity of Neuron:

0. 序文:

  • ニューロンの働き

1. How to express activity of neuron:

  • A neuron is composed of:
    • Cyton: the cell body (= the neuron proper)
    • Dendrites: input of information
    • Axon: output of information
    • [Figure: input vs. threshold]
    • Unit step function vs. Sigmoid function (differentiable):
      • Unit step function: output is 0 or 1.
      • Sigmoid function $\sigma(z)=\dfrac{1}{1+e^{-z}}$:
        output is an arbitrary number between 0 and 1.
    • [Figure: step function vs. sigmoid function]
    • Activation function (a minimal code sketch follows the table below):
      $y=a(w_1x_1+w_2x_2+w_3x_3-\theta)$  ($\theta$: threshold)
  • Neural Network: (>Fig.)
    • Input layer - Middle layer - Output layer:
      • Input layer: no input arrows; output only.
      • Middle (hidden) layer: actually processes the information;
        • this layer reflects the intention of the network designer.
        • the 'hidden demons' in the middle layer have particular characters: each is sensitive to a different specific pattern.
      • Output layer: produces the output of the neural network as a whole.
    • Deep Learning: a neural network having many layers.
    • Fully connected layer: every unit of the previous layer is connected to every unit of the next layer.
    • Compatibility of the demons and their subordinates: (>Fig)
      1. Pixels 5 & 8 are ON.
      2. Subordinates 5 & 8 get excited.
      3. Hidden Demon-B gets excited.
      4. Output Demon-1 gets excited.
      5. The picture is judged to be "1".
    • Thus the compatibility (bias) of each demon leads to the answer; the network decides as a whole.
    • AI Development phases:

      Gen. | Period    | Key                 | Applications
      1G   | 1950s-60s | Logic dependent     | Puzzles
      2G   | 1980s     | Knowledge dependent | Robots; machine translation
      3G   | 2010-     | Data dependent      | Pattern recognition; speech recognition
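A minimal sketch of the single-neuron model above (assuming NumPy; the weights, inputs, and threshold below are illustrative values, not taken from the book): the weighted input $z=w\cdot x-\theta$ is passed through the sigmoid activation, which is the same as $z=w\cdot x+b$ with bias $b=-\theta$.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, theta):
    """Single neuron: weighted sum of inputs minus threshold, then sigmoid."""
    z = np.dot(w, x) - theta            # z = w1*x1 + w2*x2 + w3*x3 - theta
    return sigmoid(z)

# Illustrative values (not from the book)
x = np.array([1.0, 0.0, 1.0])           # inputs x1, x2, x3
w = np.array([0.4, 0.7, 0.2])           # weights w1, w2, w3
theta = 0.5                              # threshold
print(neuron_output(x, w, theta))        # ~0.525: the neuron fires weakly
```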


1. ニューロンの働きの表現:

  • Neurotransmission (神経伝達):

  • The input z is the inner product of the following two vectors:
    $z=(w_1,w_2,w_3,b)\cdot(x_1,x_2,x_3,1)$
  • Neural Network:

[Figure: neural network]

  • Demons and their subordinates:
  • Subordinates respond strongly to pixels 4 & 7, and to pixels 6 & 9.
  • 1 2 3
    4 5 6
    7 8 9
    10 11 12
  • Demon-Subordinate Network:

[Figure: Demon-Subordinate network]

2. How the neural network learns:

  • Regression Analysis:
    Learning with a teacher or without a teacher (supervised vs. unsupervised): (>Fig.)
    • With a teacher: learning data (or supervised data).
    • Learning means minimizing the errors between the estimates and the correct answers:
      'Least-squares method'; 'Regression Analysis' (a small sketch follows this list).
      • Total of the errors: the cost function $C_T$.
      • A weight parameter can take a negative value, unlike in a biological neuron.
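A minimal least-squares sketch of the 'learning with a teacher' idea (assuming NumPy; the data points are invented): the parameters of a line are chosen so as to minimize the total squared error $C_T$ between predictions and correct answers.

```python
import numpy as np

# Invented learning data: inputs x and correct answers t
x = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([2.1, 3.9, 6.2, 8.1])

def cost(a, b):
    """Cost function C_T: half the sum of squared errors of the line y = a*x + b."""
    return 0.5 * np.sum((t - (a * x + b)) ** 2)

# Solve the least-squares problem for the line y = a*x + b
A = np.vstack([x, np.ones_like(x)]).T
(a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
print(a, b, cost(a, b))   # fitted slope, intercept, and the minimized cost
```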

2. ニューラルネットワークはどう学ぶのか:

  • Regression Analysis (回帰分析):

[Figure: regression analysis]

3. Basic mathematics for neural network:

  • Inner product: $\vec{a}\cdot\vec{b}=|\vec{a}||\vec{b}|\cos\theta$
  • Cauchy-Schwarz inequality:
    • $-|\vec{a}||\vec{b}|\le\vec{a}\cdot\vec{b}\le|\vec{a}||\vec{b}|$
    • $-|\vec{a}||\vec{b}|\le|\vec{a}||\vec{b}|\cos\theta\le|\vec{a}||\vec{b}|$
  • Similarity of pattern (a numeric sketch follows at the end of this list):
    • $A=\begin{pmatrix}x_{11}&x_{12}&x_{13}\\x_{21}&x_{22}&x_{23}\\x_{31}&x_{32}&x_{33}\end{pmatrix}$
    • $F=\begin{pmatrix}w_{11}&w_{12}&w_{13}\\w_{21}&w_{22}&w_{23}\\w_{31}&w_{32}&w_{33}\end{pmatrix}$
    • Similarity $=A\cdot F=w_{11}x_{11}+w_{12}x_{12}+\cdots+w_{33}x_{33}$
    • The similarity is proportional to the inner product of A and F (read as 9-dimensional vectors).
  • Stress tensor: (>Fig.)
    • Stress tensor $T=\begin{pmatrix}\tau_{11}&\tau_{12}&\tau_{13}\\\tau_{21}&\tau_{22}&\tau_{23}\\\tau_{31}&\tau_{32}&\tau_{33}\end{pmatrix}$
    • Google: 'TensorFlow'
  • Matrix product:
    • $AB=\begin{pmatrix}c_{11}&c_{12}&\cdots&c_{1p}\\c_{21}&c_{22}&\cdots&c_{2p}\\\vdots&\vdots&&\vdots\\c_{n1}&c_{n2}&\cdots&c_{np}\end{pmatrix}$, where $c_{ij}=\sum_{k=1}^{m}a_{ik}b_{kj}$
  • Hadamard product:
    • $A\odot B=(a_{ij}b_{ij})\quad(1\le i\le m,\ 1\le j\le n)$
  • Transposed matrix:
    • ${}^t(AB)={}^tB\,{}^tA$ (>¶)
  • Differentiation (composite function: the chain rule):
    • $\dfrac{dy}{dx}=\dfrac{dy}{du}\dfrac{du}{dx}$
    • $\dfrac{\partial z}{\partial x}=\dfrac{\partial z}{\partial u}\dfrac{\partial u}{\partial x}+\dfrac{\partial z}{\partial v}\dfrac{\partial v}{\partial x}$
    • $\dfrac{\partial z}{\partial y}=\dfrac{\partial z}{\partial u}\dfrac{\partial u}{\partial y}+\dfrac{\partial z}{\partial v}\dfrac{\partial v}{\partial y}$
    • $(e^{-x})'=-e^{-x}$
      • $y=e^u,\ u=-x$: then $y'=\dfrac{dy}{du}\dfrac{du}{dx}=e^u\cdot(-1)=-e^{-x}$
      • $\left(\dfrac{1}{f(x)}\right)'=-\dfrac{f'(x)}{\{f(x)\}^2}$
      • $\sigma(x)=\dfrac{1}{1+e^{-x}}$ (Sigmoid function)
        • $\sigma'(x)=\sigma(x)(1-\sigma(x))$ (>¶)
  • Multivariable function: partial differentiation (partial derivatives):
    • $\dfrac{\partial z}{\partial x}=\dfrac{\partial f(x,y)}{\partial x}=\lim_{\Delta x\to0}\dfrac{f(x+\Delta x,y)-f(x,y)}{\Delta x}$
    • $\dfrac{\partial z}{\partial y}=\dfrac{\partial f(x,y)}{\partial y}=\lim_{\Delta y\to0}\dfrac{f(x,y+\Delta y)-f(x,y)}{\Delta y}$
  • Lagrange multiplier method: finding local maxima and minima of a function subject to equality constraints.
    • Maximize $f(x,y)$ subject to $g(x,y)=c$
    • $F(x,y,\lambda)=f(x,y)-\lambda(g(x,y)-c)$
      • $\dfrac{\partial F}{\partial x}=\dfrac{\partial F}{\partial y}=\dfrac{\partial F}{\partial\lambda}=0$
  • Approximation formulas:
    • $f(x+\Delta x)\approx f(x)+f'(x)\Delta x$
    • $f(x+\Delta x,y+\Delta y)\approx f(x,y)+\dfrac{\partial f(x,y)}{\partial x}\Delta x+\dfrac{\partial f(x,y)}{\partial y}\Delta y$
    • $\Delta z\approx\dfrac{\partial z}{\partial x}\Delta x+\dfrac{\partial z}{\partial y}\Delta y$
    • $\Delta z\approx\dfrac{\partial z}{\partial w}\Delta w+\dfrac{\partial z}{\partial x}\Delta x+\dfrac{\partial z}{\partial y}\Delta y$
    • $\nabla z=\left(\dfrac{\partial z}{\partial w},\dfrac{\partial z}{\partial x},\dfrac{\partial z}{\partial y}\right)$,
      $\Delta\mathbf{x}=(\Delta w,\Delta x,\Delta y)$, so that $\Delta z\approx\nabla z\cdot\Delta\mathbf{x}$
  • Gradient descent: (>Fig.)
    • $\Delta z=\dfrac{\partial f(x,y)}{\partial x}\Delta x+\dfrac{\partial f(x,y)}{\partial y}\Delta y$
    • $\Delta z$ becomes most negative when the two vectors point in opposite directions:
      • $(\Delta x,\Delta y)=-\eta\left(\dfrac{\partial f(x,y)}{\partial x},\dfrac{\partial f(x,y)}{\partial y}\right)$
    • $\Delta\mathbf{x}=(\Delta x_1,\Delta x_2,\ldots,\Delta x_n)=-\eta\nabla f$,
      where $\Delta\mathbf{x}$ is a displacement vector and $\eta$ is a small positive number;
      • $\nabla f=\left(\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\ldots,\dfrac{\partial f}{\partial x_n}\right)$
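A small numeric sketch of the pattern-similarity idea above (assuming NumPy; the 3×3 pattern A and the filters F, G are invented): the similarity is simply the inner product of the arrays read as 9-dimensional vectors.

```python
import numpy as np

# Invented 3x3 pattern A and filters F, G (illustrative values)
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)
F = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)
G = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Similarity = w11*x11 + w12*x12 + ... + w33*x33
print(np.sum(A * F))   # 5.0: an identical pattern gives the largest similarity
print(np.sum(A * G))   # 0.0: a dissimilar pattern gives a small similarity
```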

3. ニューラルネットワークのための基本数学:

  • Stress tensor:
  • [Figure: stress tensor]
  • Transposed matrix:
  • $A=\begin{pmatrix}a_{11}&\cdots&a_{1n}\\\vdots&&\vdots\\a_{m1}&\cdots&a_{mn}\end{pmatrix}$,
    $B=\begin{pmatrix}b_{11}&\cdots&b_{1p}\\\vdots&&\vdots\\b_{n1}&\cdots&b_{np}\end{pmatrix}$
  • $(i,j)$ element of $AB$ (= $(j,i)$ element of ${}^t(AB)$):
    • $\begin{pmatrix}a_{i1}&a_{i2}&\cdots&a_{in}\end{pmatrix}\begin{pmatrix}b_{1j}\\b_{2j}\\\vdots\\b_{nj}\end{pmatrix}=\sum_{k=1}^{n}a_{ik}b_{kj}$
  • $(j,i)$ element of ${}^tB\,{}^tA$:
    • $\begin{pmatrix}b_{1j}&b_{2j}&\cdots&b_{nj}\end{pmatrix}\begin{pmatrix}a_{i1}\\a_{i2}\\\vdots\\a_{in}\end{pmatrix}=\sum_{k=1}^{n}b_{kj}a_{ik}=\sum_{k=1}^{n}a_{ik}b_{kj}$
    • Therefore ${}^t(AB)={}^tB\,{}^tA$.
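A quick numeric check of this identity with random matrices (assuming NumPy):

```python
import numpy as np

A = np.random.rand(3, 4)                    # m x n matrix
B = np.random.rand(4, 2)                    # n x p matrix

# t(AB) should equal tB tA
print(np.allclose((A @ B).T, B.T @ A.T))    # True
```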

  • $\sigma'(x)=-\dfrac{(1+e^{-x})'}{(1+e^{-x})^2}=\dfrac{e^{-x}}{(1+e^{-x})^2}=\dfrac{(1+e^{-x})-1}{(1+e^{-x})^2}$
    $=\dfrac{1}{1+e^{-x}}-\dfrac{1}{(1+e^{-x})^2}=\sigma(x)-\sigma(x)^2=\sigma(x)(1-\sigma(x))$
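A quick numeric check of $\sigma'(x)=\sigma(x)(1-\sigma(x))$ against a central-difference derivative (assuming NumPy):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))
print(np.allclose(numeric, analytic, atol=1e-7))        # True
```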

  • Lagrange Multiplier (ラグランジュ未定乗数法):
    Blue lines are contours of $f(x,y)$;
    the red line shows the constraint $g(x,y)=c$ (a short worked example follows the figure).

[Figure: Lagrange multiplier]
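An illustrative worked example (not from the book): maximize $f(x,y)=xy$ subject to $g(x,y)=x+y=2$.

    • $F(x,y,\lambda)=xy-\lambda(x+y-2)$
    • $\dfrac{\partial F}{\partial x}=y-\lambda=0,\quad\dfrac{\partial F}{\partial y}=x-\lambda=0,\quad\dfrac{\partial F}{\partial\lambda}=-(x+y-2)=0$
    • Hence $x=y=\lambda=1$, and the maximum value is $f(1,1)=1$.

At this point a contour of $f$ is tangent to the constraint line, which is exactly the situation shown in the figure above.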

  • Gradient descent method (勾配降下法): (an iteration sketch follows the figure)
  • Displacement vector (変位ベクトル)
  • $z=x^2+y^2;\quad\Delta\mathbf{x}=-\eta\nabla z=-\eta(2x,2y)$

[Figure: gradient descent]
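A minimal iteration sketch of gradient descent on $z=x^2+y^2$ (assuming NumPy; the learning rate $\eta$ and the starting point are illustrative):

```python
import numpy as np

eta = 0.1                        # small positive learning rate
point = np.array([3.0, 2.0])     # illustrative starting point (x, y)

for i in range(50):
    grad = np.array([2 * point[0], 2 * point[1]])   # gradient of z = x^2 + y^2
    point = point - eta * grad                       # displacement = -eta * gradient

print(point, point[0]**2 + point[1]**2)              # converges toward the minimum (0, 0)
```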


4. Cost function of neural network:

  • Gradient descent (example); the superscript on w, b, z, a denotes the layer number: (>Fig.)
    • <Middle layer>
    • $\begin{pmatrix}z_1^2\\z_2^2\\z_3^2\end{pmatrix}=\begin{pmatrix}w_{11}^2&w_{12}^2&\cdots&w_{1\,12}^2\\w_{21}^2&w_{22}^2&\cdots&w_{2\,12}^2\\w_{31}^2&w_{32}^2&\cdots&w_{3\,12}^2\end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\\x_{12}\end{pmatrix}+\begin{pmatrix}b_1^2\\b_2^2\\b_3^2\end{pmatrix}$
    • $a_i^2=a(z_i^2)\quad(i=1,2,3)$
    • <Output layer>
    • $\begin{pmatrix}z_1^3\\z_2^3\end{pmatrix}=\begin{pmatrix}w_{11}^3&w_{12}^3&w_{13}^3\\w_{21}^3&w_{22}^3&w_{23}^3\end{pmatrix}\begin{pmatrix}a_1^2\\a_2^2\\a_3^2\end{pmatrix}+\begin{pmatrix}b_1^3\\b_2^3\end{pmatrix}$
    • $a_i^3=a(z_i^3)\quad(i=1,2)$
    • <C = Square error>
    • $C=\dfrac12\{(t_1-a_1^3)^2+(t_2-a_2^3)^2\}$
    • <$C_T$ = Cost function (total over the 64 learning images)>
    • $C_T=\sum_{k=1}^{64}C_k$
    • $C_k=\dfrac12\{(t_1[k]-a_1^3[k])^2+(t_2[k]-a_2^3[k])^2\}$
  • Applying gradient descent (a forward-pass sketch of this network follows this list):
    • $\Delta\mathbf{x}=(\Delta x_1,\Delta x_2,\ldots,\Delta x_n)=-\eta\nabla f$ (where $\nabla f$ is the gradient)
    • $(\Delta w_{11}^2,\ldots,\Delta w_{11}^3,\ldots,\Delta b_1^2,\ldots,\Delta b_1^3,\ldots)$
      $=-\eta\left(\dfrac{\partial C_T}{\partial w_{11}^2},\ldots,\dfrac{\partial C_T}{\partial w_{11}^3},\ldots,\dfrac{\partial C_T}{\partial b_1^2},\ldots,\dfrac{\partial C_T}{\partial b_1^3},\ldots\right)$
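A minimal sketch of the forward pass and squared-error cost of this 12-3-2 network (assuming NumPy; the weights, biases, and the learning image below are random placeholders, not the book's worked example):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder parameters: 12 inputs -> 3 hidden units -> 2 outputs
W2, b2 = rng.normal(size=(3, 12)), rng.normal(size=3)   # middle-layer weights and biases
W3, b3 = rng.normal(size=(2, 3)),  rng.normal(size=2)   # output-layer weights and biases

def forward(x):
    a2 = sigmoid(W2 @ x + b2)        # middle-layer activations a^2
    a3 = sigmoid(W3 @ a2 + b3)       # output-layer activations a^3
    return a2, a3

def cost(x, t):
    """Square error C = 1/2 * sum((t - a^3)^2) for one learning image."""
    _, a3 = forward(x)
    return 0.5 * np.sum((t - a3) ** 2)

x = rng.integers(0, 2, size=12).astype(float)   # one 4x3 binary image, flattened
t = np.array([1.0, 0.0])                        # correct answer: the digit "1"
print(cost(x, t))
```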

4. ニューラルネットワークのコスト関数:

  • Gradient descent (example):

[Figure: neural network example]

5. Back propagation method:

  • Square error; minimization problem of the cost function:
  • $C=\dfrac12\{(t_1-a_1^3)^2+(t_2-a_2^3)^2\}$
  • Definition of the unit error: $\delta_j^l=\dfrac{\partial C}{\partial z_j^l}\quad(l=2,3,\ldots)$
  • $\dfrac{\partial C}{\partial w_{11}^2}=\dfrac{\partial C}{\partial z_1^2}\dfrac{\partial z_1^2}{\partial w_{11}^2}$
    • $z_1^2=w_{11}^2x_1+w_{12}^2x_2+\cdots+w_{1\,12}^2x_{12}+b_1^2$
    • $\dfrac{\partial C}{\partial w_{11}^2}=\delta_1^2x_1=\delta_1^2a_1^1$
  • <General formula>: from partial differentials to a recurrence formula.
  • $\dfrac{\partial C}{\partial w_{ji}^l}=\delta_j^la_i^{l-1}$
  • $\dfrac{\partial C}{\partial b_j^l}=\delta_j^l$
  • Forward & Back Propagation:
  • <Forward Propagation>
    1. Read the learning data.
    2. Set up the initial values of the weights and biases.
    3. Calculate the output of each unit (forward propagation).
    4. Calculate the square error C.
  • <Back Propagation>
    1. Calculate the unit errors δ by the back propagation method.
    2. Calculate the cost function $C_T$ and its gradient $\nabla C_T$.
    3. Update the weights w and biases b by the gradient descent method.
    4. Return to step 1 and repeat until $C_T$ becomes small enough.
  • <Matrix representation>
    • $\begin{pmatrix}\delta_1^3\\\delta_2^3\end{pmatrix}=\begin{pmatrix}\dfrac{\partial C}{\partial a_1^3}\\\dfrac{\partial C}{\partial a_2^3}\end{pmatrix}\odot\begin{pmatrix}a'(z_1^3)\\a'(z_2^3)\end{pmatrix}$
    • $\begin{pmatrix}\delta_1^2\\\delta_2^2\\\delta_3^2\end{pmatrix}=\left[\begin{pmatrix}w_{11}^3&w_{21}^3\\w_{12}^3&w_{22}^3\\w_{13}^3&w_{23}^3\end{pmatrix}\begin{pmatrix}\delta_1^3\\\delta_2^3\end{pmatrix}\right]\odot\begin{pmatrix}a'(z_1^2)\\a'(z_2^2)\\a'(z_3^2)\end{pmatrix}$
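A minimal, self-contained sketch of these recurrence formulas for the same 12-3-2 network (assuming NumPy; all parameter and data values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
dsigmoid = lambda z: sigmoid(z) * (1 - sigmoid(z))    # a'(z) = sigma(z)(1 - sigma(z))

# Placeholder 12-3-2 network and one learning example (illustrative values)
W2, b2 = rng.normal(size=(3, 12)), rng.normal(size=3)
W3, b3 = rng.normal(size=(2, 3)),  rng.normal(size=2)
x = rng.integers(0, 2, size=12).astype(float)
t = np.array([1.0, 0.0])

# Forward propagation
z2 = W2 @ x + b2;  a2 = sigmoid(z2)
z3 = W3 @ a2 + b3; a3 = sigmoid(z3)

# Back propagation of the unit errors delta^l = dC/dz^l
delta3 = (a3 - t) * dsigmoid(z3)           # output layer: dC/da^3 = -(t - a^3)
delta2 = (W3.T @ delta3) * dsigmoid(z2)    # middle layer: transpose of W3 times delta^3

# Gradients: dC/dw^l_ji = delta^l_j * a^(l-1)_i, dC/db^l_j = delta^l_j
grad_W3 = np.outer(delta3, a2); grad_b3 = delta3
grad_W2 = np.outer(delta2, x);  grad_b2 = delta2
print(grad_W2.shape, grad_W3.shape)        # (3, 12) (2, 3)
```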

5. 誤差逆伝搬法 (BP法):

  • C: square error
  • $\delta_j^l$: definition of the error of a unit
  • <Minimization problem of the cost function $C_T$>
    1. Equations of the minimum condition:
      $\dfrac{\partial C_T}{\partial x}=0,\ \dfrac{\partial C_T}{\partial y}=0,\ \dfrac{\partial C_T}{\partial z}=0$
    2. Gradient descent method:
      gradient $\left(\dfrac{\partial C_T}{\partial x},\dfrac{\partial C_T}{\partial y},\dfrac{\partial C_T}{\partial z}\right)$
    3. Back propagation method:
      the partial-differential values are obtained by a recurrence formula.
  • Forward propagation & Back propagation:

[Figure: forward & back propagation]

6. Translation to neural network language:

  • Favorite pattern of a demon:
  • [Figure: degree of similarity]
  • Convolution layers:
    • Gradient of the cost function $C_T$:
      $\nabla C_T=\left(\dfrac{\partial C_T}{\partial w_{11}^{F1}},\ldots,\dfrac{\partial C_T}{\partial w_{1\text{-}11}^{O1}},\ldots,\dfrac{\partial C_T}{\partial b^{F1}},\ldots,\dfrac{\partial C_T}{\partial b_1^{O}},\ldots\right)$
    • 1st term: partial differentials with respect to the filter weights.
    • 2nd term: partial differentials with respect to the unit weights of the output layer.
    • 3rd term: partial differentials with respect to the biases of the convolution (filter) layers.
    • 4th term: partial differentials with respect to the biases of the output layer.

6. ニューラルネットワークの言葉に翻訳:

  • Feature Map by convolution of Filter-S:
  • 2 1 0 1
    0 0 1 2
    0 0 3 0
    0 3 1 1

 

  • Convolution layers (畳み込み層):
  • Picture → Similarity → Convolution (weight) → Convolution (output) → Pooling:
  • [Figure: similarity]
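A minimal sketch of the convolution-layer idea (picture → similarity → feature map → pooling), assuming NumPy; the picture and the filter are invented placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented 6x6 binary picture and 3x3 filter (placeholder values)
picture = rng.integers(0, 2, size=(6, 6)).astype(float)
filt = np.array([[1., 0., 0.],
                 [0., 1., 0.],
                 [0., 0., 1.]])

# Convolution: slide the filter over the picture; each entry of the feature map
# is the similarity (inner product) of the filter with the 3x3 patch it covers.
fh = picture.shape[0] - 3 + 1
fw = picture.shape[1] - 3 + 1
feature_map = np.array([[np.sum(picture[i:i+3, j:j+3] * filt)
                         for j in range(fw)] for i in range(fh)])   # 4x4

# Max pooling: compress each 2x2 block of the feature map to its maximum
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(feature_map)
print(pooled)                   # 2x2 pooled output fed to the output layer
```
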
Comment
  • The mathematics of deep learning mostly concerns partial differentiation. Recurrence formulas are easier for a computer to calculate than partial differentials, so they are used instead.
  • It is interesting to see how AI turns an analog picture into a digitized recognition by doing complicated mathematical calculations. As expected, sheer speed of calculation is a decisive factor.
  • The computer itself is not smart; it is only fast at calculation.
  • ディープラーニングの数学は、特に偏微分に関連する。コンピュータにとっては、偏微分より漸化式の方が得意なのでよく代用される。
  • AIが複雑な数学計算によって、どのようにアナログな図をデジタル認識するのかは興味深い。それにしても予想通り、コンピュータの計算力の早さが決め手である。
  • コンピュータ自体がスマートという訳ではなく、単に計算が速いだけなのである。
