Logistic Gradient Descent


Recall that the gradient descent algorithm utilizes the gradient calculation:

$$\begin{align*}
&\text{repeat until convergence:} \; \lbrace \\
&\quad w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j := 0..n-1 \tag{1} \\
&\quad b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}$$

where each iteration performs simultaneous updates on $w_j$ for all $j$, and where

$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_{j}^{(i)} \tag{2} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) \tag{3}
\end{align*}$$
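For reference, equations (2) and (3) follow from applying the chain rule to the logistic cost (this brief derivation sketch is not on the original page). With the per-example loss $\mathcal{L}^{(i)} = -\left[y^{(i)}\log f^{(i)} + (1-y^{(i)})\log(1-f^{(i)})\right]$, where $f^{(i)} = g(z^{(i)})$, $z^{(i)} = \mathbf{w}\cdot\mathbf{x}^{(i)}+b$, and $g'(z) = g(z)(1-g(z))$:

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j}
= \frac{1}{m}\sum\limits_{i=0}^{m-1} \frac{\partial \mathcal{L}^{(i)}}{\partial f^{(i)}}\,\frac{\partial f^{(i)}}{\partial z^{(i)}}\,\frac{\partial z^{(i)}}{\partial w_j}
= \frac{1}{m}\sum\limits_{i=0}^{m-1} \frac{f^{(i)} - y^{(i)}}{f^{(i)}\left(1-f^{(i)}\right)} \cdot f^{(i)}\left(1-f^{(i)}\right) \cdot x_j^{(i)}
= \frac{1}{m}\sum\limits_{i=0}^{m-1} \left(f^{(i)} - y^{(i)}\right) x_j^{(i)}$$

The gradient with respect to $b$ is identical except that $\partial z^{(i)}/\partial b = 1$.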

The logistic regression model:

$$z = \mathbf{w} \cdot \mathbf{x} + b$$
$$f_{\mathbf{w},b}(\mathbf{x}) = g(z)$$

where $g(z)$ is the sigmoid function:

$$g(z) = \frac{1}{1+e^{-z}}$$
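The routines below call a sigmoid helper that is not defined in this excerpt (the original lab presumably imports it from a utilities module). A minimal NumPy sketch matching the formula above, so the code on this page runs standalone:

import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z, elementwise.

    Args:
      z (ndarray or scalar): input value(s)
    Returns:
      g (ndarray or scalar): sigmoid of z, same shape as z
    """
    return 1.0 / (1.0 + np.exp(-z))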

Gradient Descent Implementation

The gradient descent algorithm implementation has two components:

  • The loop implementing the update equation (1) above. This is gradient_descent below.

  • The calculation of the current gradient, equations (2) and (3) above. This is compute_gradient_logistic below.

Equations (2) and (3) above are implemented for all $w_j$ and $b$. There are many ways to implement this; one approach is outlined below:

  • initialize variables to accumulate dj_dw and dj_db

  • for each example:

    • calculate the error for that example, $g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b) - y^{(i)}$

    • for each input value $x_{j}^{(i)}$ in this example:

      • multiply the error by the input $x_{j}^{(i)}$ and add it to the corresponding element of dj_dw (equation 2 above)

    • add the error to dj_db (equation 3 above)

  • divide dj_dw and dj_db by the total number of examples (m)

  • note that in NumPy, $\mathbf{x}^{(i)}$ is X[i,:] (or X[i]), while the individual element $x_{j}^{(i)}$ is X[i,j]

import copy, math
import numpy as np
import matplotlib.pyplot as plt

X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def compute_gradient_logistic(X, y, w, b): 
    """
    Computes the gradient for logistic regression 
 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
    Returns
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape
    dj_dw = np.zeros((n,))                           #(n,)
    dj_db = 0.

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar
        err_i  = f_wb_i  - y[i]                       #scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m                                   #(n,)
    dj_db = dj_db/m                                   #scalar
        
    return dj_db, dj_dw  
    
X_tmp = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_tmp = np.array([0, 0, 0, 1, 1, 1])
w_tmp = np.array([2.,3.])
b_tmp = 1.
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_tmp, y_tmp, w_tmp, b_tmp)
print(f"dj_db: {dj_db_tmp}" )
print(f"dj_dw: {dj_dw_tmp.tolist()}" )
output:
dj_db: 0.49861806546328574
dj_dw: [0.498333393278696, 0.49883942983996693]
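The loops above follow the outline one element at a time. Equations (2) and (3) can also be computed in vectorized form; the sketch below (not part of the original page, reusing sigmoid from above) produces the same gradients with matrix operations:

def compute_gradient_logistic_vec(X, y, w, b):
    """
    Vectorized gradient of the logistic cost.

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_db (scalar), dj_dw (ndarray (n,))
    """
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y          # (m,)   f_wb - y for every example
    dj_dw = (X.T @ err) / m               # (n,)   equation (2)
    dj_db = np.sum(err) / m               # scalar, equation (3)
    return dj_db, dj_dw

Calling it on X_tmp, y_tmp, w_tmp, b_tmp above should reproduce the printed gradients up to floating-point rounding.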

The code implementing equation (1) above is shown below. Take a moment to locate and compare the function in the routine to the equations above.

def gradient_descent(X, y, w_in, b_in, alpha, num_iters): 
    """
    Performs batch gradient descent
    
    Args:
      X (ndarray (m,n))  : Data, m examples with n features
      y (ndarray (m,))   : target values
      w_in (ndarray (n,)): Initial values of model parameters  
      b_in (scalar)      : Initial value of model parameter
      alpha (float)      : Learning rate
      num_iters (scalar) : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,))   : Updated values of parameters
      b (scalar)         : Updated value of parameter 
    """
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):
        # Calculate the gradient and update the parameters
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)   

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw               
        b = b - alpha * dj_db               
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( compute_cost_logistic(X, y, w, b) )

        # Print cost at intervals 10 times or as many iterations if < 10
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}   ")
        
    return w, b, J_history         #return final w,b and J history for graphing
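gradient_descent above also logs the cost each iteration via compute_cost_logistic, which is not defined in this excerpt. A minimal sketch of the standard logistic (binary cross-entropy) cost, consistent with the model above, is given here so the cell below runs; the original lab presumably supplies its own version:

def compute_cost_logistic(X, y, w, b):
    """
    Average binary cross-entropy cost over m examples.

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar)    : average cost over the m examples
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)                           # model prediction
        cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)
    return cost / m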

w_tmp  = np.zeros_like(X_train[0])
b_tmp  = 0.
alph = 0.1
iters = 10000

w_out, b_out, _ = gradient_descent(X_train, y_train, w_tmp, b_tmp, alph, iters) 
print(f"\nupdated parameters: w:{w_out}, b:{b_out}")
Iteration    0: Cost 0.684610468560574   
Iteration 1000: Cost 0.1590977666870456   
Iteration 2000: Cost 0.08460064176930081   
Iteration 3000: Cost 0.05705327279402531   
Iteration 4000: Cost 0.042907594216820076   
Iteration 5000: Cost 0.034338477298845684   
Iteration 6000: Cost 0.028603798022120097   
Iteration 7000: Cost 0.024501569608793   
Iteration 8000: Cost 0.02142370332569295   
Iteration 9000: Cost 0.019030137124109114   

updated parameters: w:[5.28 5.08], b:-14.222409982019837
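With the fitted parameters, a new point can be classified by thresholding the predicted probability at 0.5. A sketch, not part of the original page; x_new is a hypothetical query point:

x_new = np.array([2.5, 2.0])                      # hypothetical query point
p = sigmoid(np.dot(w_out, x_new) + b_out)         # predicted probability that y = 1
print(f"P(y=1) = {p:.3f}, predicted class = {int(p >= 0.5)}")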

Logistic Regression - Scikit-Learn

import numpy as np

X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

Fit the model

from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression()
lr_model.fit(X, y)

y_pred = lr_model.predict(X)

print("Prediction on training set:", y_pred)
Prediction on training set: [0 0 0 1 1 1]

Calculate accuracy

print("Accuracy on training set:", lr_model.score(X, y))
Accuracy on training set: 1.0
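For comparison with the hand-coded gradient descent above, the fitted parameters and per-example probabilities can be inspected as sketched below. Note that LogisticRegression applies L2 regularization by default, so its parameters will not exactly match the unregularized result above:

print("w:", lr_model.coef_, "b:", lr_model.intercept_)          # fitted parameters
print("P(y=1) per example:", lr_model.predict_proba(X)[:, 1])   # class-1 probabilities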
