SGD = Stochastic Gradient Descent

August 23, 2017

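In neural network training, SGD is gradient descent performed on data that has been picked out at random (stochastically).
Because neural network training selects its data at random in mini-batches, the method is called SGD.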
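
To make the "random mini-batch" idea concrete, here is a minimal sketch on a toy linear regression problem. The function name `sgd_minibatch`, the toy data, and the hyperparameters are all hypothetical and chosen only for illustration; they do not come from the original post.

```python
import numpy as np

def sgd_minibatch(X, y, lr=0.1, batch_size=16, step_num=500, seed=0):
    """Fit y ~ X @ w with mini-batch SGD on mean squared error (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(step_num):
        # Randomly pick a mini-batch of row indices (the "stochastic" part)
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        X_batch, y_batch = X[idx], y[idx]
        # Gradient of the mean squared error computed on this mini-batch only
        grad = 2.0 * X_batch.T @ (X_batch @ w - y_batch) / batch_size
        w -= lr * grad
    return w

# Hypothetical noise-free toy data: y = 2*x0 - 3*x1
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

w = sgd_minibatch(X, y)
print(w)
```

Each step sees only 16 of the 200 rows, yet the weights still move toward `true_w`, because the mini-batch gradient is a noisy estimate of the full-data gradient.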

Gradient method (Korean: 경사법)

Gradient descent method (Korean: 경사 하강법)
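
For reference, the standard gradient descent update can be written as follows. The symbols are an assumption made here for illustration: $x_t$ is the current point, $\eta$ is the learning rate (the `lr` argument in the Python code), and $f$ is the function being minimized.

```latex
x_{t+1} = x_t - \eta \, \nabla f(x_t)
```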

Python implementation

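import numpy as np
import matplotlib.pyplot as plt

def numerical_gradient(f, x, h=1e-4):
    # Not defined in the original post; assumed here to be a central-difference gradient
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp = x[idx]
        x[idx] = tmp + h
        fxh1 = f(x)  # f(x + h)
        x[idx] = tmp - h
        fxh2 = f(x)  # f(x - h)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp  # restore the original value
    return grad

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x.copy()  # copy so the in-place update below does not overwrite init_x
    x_history = []  # store each x to see how it changes as training proceeds

    for i in range(step_num):
        x_history.append(x.copy())

        grad = numerical_gradient(f, x)
        x -= lr * grad

    return x, np.array(x_history)

def function_2(x):
    return x[0]**2 + x[1]**2

init_x = np.array([-3.0, 4.0])

lr = 0.1
step_num = 100
x, x_history = gradient_descent(function_2, init_x, lr=lr, step_num=step_num)

# Dashed blue lines mark the two axes; circles mark each visited point
plt.plot([-5, 5], [0, 0], '--b')
plt.plot([0, 0], [-5, 5], '--b')
plt.plot(x_history[:, 0], x_history[:, 1], 'o')

plt.xlim(-3.5, 3.5)
plt.ylim(-4.5, 4.5)
plt.xlabel("X0")
plt.ylabel("X1")
plt.show()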

# lr = 0.9 (rerun with lr changed from 0.1): x flips sign each step while shrinking toward 0
x_history
array([[ -3.00000000e+00,   4.00000000e+00],
       [  2.40000000e+00,  -3.20000000e+00],
       [ -1.92000000e+00,   2.56000000e+00],
       ..., 
       [  1.19357577e-09,  -1.59143436e-09],
       [ -9.54860614e-10,   1.27314749e-09],
       [  7.63888491e-10,  -1.01851799e-09]])

# lr = 1: x flips sign each step without shrinking, so it never converges
x_history
array([[-3.,  4.],
       [ 3., -4.],
       [-3.,  4.],
       ..., 
       [ 3., -4.],
       [-3.,  4.],
       [ 3., -4.]])
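
The two histories above can be checked by hand. For this particular function the gradient is 2x, so each update multiplies x by (1 - 2 * lr): that factor is -0.8 for lr = 0.9 and -1 for lr = 1. The sketch below uses that closed form; the helper name `closed_form_iterate` is hypothetical, and it assumes the exact gradient rather than the numerical one.

```python
import numpy as np

def closed_form_iterate(x0, lr, k):
    """k-th gradient descent iterate for f(x) = x[0]**2 + x[1]**2 (gradient is 2x)."""
    return (1.0 - 2.0 * lr) ** k * np.asarray(x0, dtype=float)

x0 = [-3.0, 4.0]

# lr = 0.9: factor -0.8, so the sign flips and the magnitude shrinks
step1_lr09 = closed_form_iterate(x0, lr=0.9, k=1)
step2_lr09 = closed_form_iterate(x0, lr=0.9, k=2)

# lr = 1.0: factor -1, so the sign flips forever and nothing shrinks
step1_lr1 = closed_form_iterate(x0, lr=1.0, k=1)
step2_lr1 = closed_form_iterate(x0, lr=1.0, k=2)

print(step1_lr09, step2_lr09)
print(step1_lr1, step2_lr1)
```

By the same reasoning, any lr between 0 and 1 keeps the factor's magnitude below 1 for this function, which is why lr = 0.1 (factor 0.8) approaches the minimum without oscillating.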

Drawbacks of SGD

Momentum

AdaGrad

Adam
