Logistic Regression

Sigmoid Function

import numpy as np

input_array = np.array([1, 2, 3])
exp_array = np.exp(input_array)   # element-wise exponential, e**z

print(input_array)
print(exp_array)

# output
[1 2 3]
[ 2.71828183  7.3890561  20.08553692]
def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (ndarray): A scalar, numpy array of any size.

    Returns:
        g (ndarray): sigmoid(z), with the same shape as z
         
    """

    g = 1/(1+np.exp(-z))
   
    return g
    
# Generate an array of evenly spaced values between -10 and 10
z_tmp = np.arange(-10,11)

# Use the function implemented above to get the sigmoid values
y = sigmoid(z_tmp)

# Code for pretty printing the two arrays next to each other
np.set_printoptions(precision=3) 
print("Input (z), Output (sigmoid(z))")
print(np.c_[z_tmp, y])

Decision Boundary
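Since the sigmoid crosses 0.5 exactly at $z = 0$, the model predicts 1 whenever $\mathbf{w} \cdot \mathbf{x} + b \ge 0$; the decision boundary is therefore the set of points where $\mathbf{w} \cdot \mathbf{x} + b = 0$. A minimal sketch, using the illustrative parameters $w = [1, 1]$, $b = -3$ (the same values as the cost example at the end of this page):

import numpy as np

# The model predicts 1 when w . x + b >= 0, so the boundary is w . x + b = 0
w = np.array([1, 1])   # illustrative parameters, not fitted values
b = -3

# With two features: w0*x0 + w1*x1 + b = 0  =>  x1 = -(w0*x0 + b) / w1
x0 = np.arange(0, 4)
x1_boundary = -(w[0] * x0 + b) / w[1]
print(np.c_[x0, x1_boundary])   # points on the boundary line x0 + x1 = 3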

Logistic Loss

Squared error loss is not suitable for logistic regression

The squared error cost function works well for linear regression, so it is natural to consider it for logistic regression as well. However, as noted above, $f_{w,b}(x)$ now has a non-linear component, the sigmoid function: $f_{w,b}(x^{(i)}) = \mathrm{sigmoid}(wx^{(i)} + b)$. Let's try the squared error cost on the example from the earlier lab, now with the sigmoid included.
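A minimal sketch of that experiment, assuming a small illustrative one-feature dataset (not the original lab's data): sweeping $w$ shows that the squared error cost, with the sigmoid inside, is no longer a smooth convex bowl.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative 1-D classification data, not the original lab's dataset
x_train = np.array([0., 1., 2., 3., 4., 5.])
y_train = np.array([0., 0., 0., 1., 1., 1.])

def squared_error_cost(x, y, w, b):
    """Squared error cost with the sigmoid inside -- non-convex for classification."""
    f_wb = sigmoid(w * x + b)
    return np.mean((f_wb - y) ** 2) / 2

# Sweep w with b fixed; plotting cost vs. w reveals flat regions and local minima
for w in np.linspace(-10, 10, 5):
    print(f"w = {w:6.2f}, cost = {squared_error_cost(x_train, y_train, w, -4):.4f}")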

Logistic Loss Function (Log Loss / Cross-Entropy Loss)

Logistic regression uses a loss function better suited to classification tasks, where the target is 0 or 1 rather than an arbitrary number.

$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point:

$$
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) =
\begin{cases}
  -\log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) & \text{if } y^{(i)}=1 \\
  -\log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) & \text{if } y^{(i)}=0
\end{cases}
$$
  • $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value.

  • $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b)$, where $g$ is the sigmoid function.

The defining feature of this loss function is that it uses two separate curves: one for the case where the target is zero ($y=0$) and one for the case where the target is one ($y=1$). Together, the two curves give a loss with exactly the behavior we want: it is zero when the prediction matches the target and grows rapidly as the prediction moves away from the target.
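A quick numeric check of that behavior (the predicted probabilities below are illustrative):

import numpy as np

np.set_printoptions(precision=3)
f = np.array([0.01, 0.5, 0.99])   # example predicted probabilities

# Curve for y = 1: near zero as f -> 1, grows rapidly as f -> 0
print(-np.log(f))                 # approx [4.605 0.693 0.010]

# Curve for y = 0: near zero as f -> 0, grows rapidly as f -> 1
print(-np.log(1 - f))             # approx [0.010 0.693 4.605]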

The loss function above can be rewritten into a form that is easier to implement:

$$
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right)
$$

When $y^{(i)} = 0$, the left-hand term is eliminated:

$$
\begin{aligned}
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), 0) &= -(0) \log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) - \left(1 - 0\right) \log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) \\
&= -\log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right)
\end{aligned}
$$

When $y^{(i)} = 1$, the right-hand term is eliminated:

$$
\begin{aligned}
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), 1) &= -(1) \log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) - \left(1 - 1\right) \log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) \\
&= -\log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right)
\end{aligned}
$$
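A small sketch confirming that the single-line form reproduces both cases (the helper name here is illustrative):

import numpy as np

def loss_single(f_wb_i, y_i):
    """Combined log-loss form for one example."""
    return -y_i * np.log(f_wb_i) - (1 - y_i) * np.log(1 - f_wb_i)

f = 0.8   # an example prediction
print(loss_single(f, 1), -np.log(f))       # both ~0.2231, the y=1 curve
print(loss_single(f, 0), -np.log(1 - f))   # both ~1.6094, the y=0 curve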

Cost Function for Logistic Regression

import numpy as np

X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])  #(m,n)
y_train = np.array([0, 0, 0, 1, 1, 1]) 

For logistic regression, the cost function is of the form

$$
J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) \right]
$$

where $loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point:

$$
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right)
$$

where $m$ is the number of training examples in the dataset, and:

$$
\begin{aligned}
f_{\mathbf{w},b}(\mathbf{x}^{(i)}) &= g(z^{(i)}) \\
z^{(i)} &= \mathbf{w} \cdot \mathbf{x}^{(i)} + b \\
g(z^{(i)}) &= \frac{1}{1+e^{-z^{(i)}}}
\end{aligned}
$$

The compute_cost_logistic function below loops over all examples, computing the loss for each example and accumulating the total.

Note that the variables X and y are not scalar values but arrays of shape ($m$, $n$) and ($m$,) respectively, where $n$ is the number of features and $m$ is the number of training examples.

def compute_cost_logistic(X, y, w, b):
    """
    Computes cost

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """

    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i], w) + b                                   # linear part: w . x^(i) + b
        f_wb_i = sigmoid(z_i)                                       # prediction g(z^(i))
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)    # log loss for example i

    cost = cost / m                                                 # average over all m examples
    return cost
    
w_tmp = np.array([1,1])
b_tmp = -3
print(compute_cost_logistic(X_train, y_train, w_tmp, b_tmp))
# output
0.36686678640551745
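For reference, the same cost can also be computed without the explicit Python loop; a sketch of a vectorized equivalent, assuming the same sigmoid helper defined above:

def compute_cost_logistic_vectorized(X, y, w, b):
    """Vectorized equivalent of compute_cost_logistic."""
    f_wb = sigmoid(X @ w + b)   # (m,) predictions for all examples at once
    return np.mean(-y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb))

print(compute_cost_logistic_vectorized(X_train, y_train, w_tmp, b_tmp))  # same: 0.36686678640551745

The vectorized form computes the identical average loss; it simply replaces the per-example loop with array operations.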
