Regularized Logistic Regression

In this exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.

import numpy as np
import matplotlib.pyplot as plt
from utils import *
import copy
import math

Problem Statement

Suppose you are the product manager of a factory and you have the test results for some microchips from two different tests.

  • From these two tests, you would like to determine whether the microchips should be accepted or rejected

  • To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model

Loading and visualizing the data

# load dataset
X_train, y_train = load_data("data/ex2data2.txt")

# print X_train
print("X_train:", X_train[:5])
print("Type of X_train:",type(X_train))

# print y_train
print("y_train:", y_train[:5])
print("Type of y_train:",type(y_train))
X_train: [[ 0.051267  0.69956 ]
 [-0.092742  0.68494 ]
 [-0.21371   0.69225 ]
 [-0.375     0.50219 ]
 [-0.51325   0.46564 ]]
Type of X_train: <class 'numpy.ndarray'>
y_train: [1. 1. 1. 1. 1.]
Type of y_train: <class 'numpy.ndarray'>

Check the dimensions of your variables

Another useful way to get familiar with your data is to view its dimensions. Print the shapes of X_train and y_train to see how many training examples you have in your dataset.

print ('The shape of X_train is: ' + str(X_train.shape))
print ('The shape of y_train is: ' + str(y_train.shape))
print ('We have m = %d training examples' % (len(y_train)))
The shape of X_train is: (118, 2)
The shape of y_train is: (118,)
We have m = 118 training examples

Visualize your data

Use the helper function plot_data (from utils.py) to generate a plot similar to Figure 3, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.
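plot_data ships with utils.py; a minimal sketch of what such a helper could look like, assuming it simply draws the two classes with different markers (the marker styles here are illustrative), is:

def plot_data(X, y, pos_label="y=1", neg_label="y=0"):
    # Boolean masks selecting the positive (y = 1) and negative (y = 0) examples
    positive = y == 1
    negative = y == 0

    # Plot the two classes with different markers
    plt.plot(X[positive, 0], X[positive, 1], 'k+', label=pos_label)
    plt.plot(X[negative, 0], X[negative, 1], 'yo', label=neg_label)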

# Plot examples
plot_data(X_train, y_train[:], pos_label="Accepted", neg_label="Rejected")

# Set the y-axis label
plt.ylabel('Microchip Test 2') 
# Set the x-axis label
plt.xlabel('Microchip Test 1') 
plt.legend(loc="upper right")
plt.show()

Figure 3 shows that our dataset cannot be separated into positive and negative examples by a straight line. Therefore, a straightforward application of logistic regression will not perform well on this dataset, since logistic regression can only find a linear decision boundary.

Feature mapping

One way to fit the data better is to create more features from each data point. In the provided function map_feature, we will map the features into all polynomial terms of $x_1$ and $x_2$ up to the sixth power.

$$\mathrm{map\_feature}(x) = \left[\begin{array}{c} x_1 \\ x_2 \\ x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1^3 \\ \vdots \\ x_1 x_2^5 \\ x_2^6 \end{array}\right]$$

With this mapping, our vector of two features (the scores on the two QA tests) is transformed into a 27-dimensional vector.

  • A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot; a sketch of how such a mapping might be implemented follows this list
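map_feature is also provided in utils.py; a minimal sketch of one possible implementation (not necessarily the provided code) that enumerates all monomials $x_1^{i-j} x_2^{j}$ of total degree 1 through 6 is:

def map_feature(X1, X2, degree=6):
    # Map two input features to all polynomial terms x1^(i-j) * x2^j
    # for i = 1..degree and j = 0..i (27 features when degree = 6)
    X1 = np.atleast_1d(X1)
    X2 = np.atleast_1d(X2)
    out = []
    for i in range(1, degree + 1):
        for j in range(i + 1):
            out.append((X1 ** (i - j)) * (X2 ** j))
    return np.stack(out, axis=1)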

print("Original shape of data:", X_train.shape)

mapped_X =  map_feature(X_train[:, 0], X_train[:, 1])
print("Shape after feature mapping:", mapped_X.shape)
Original shape of data: (118, 2)
Shape after feature mapping: (118, 27)

We will also print the first elements of X_train and mapped_X to see the transformation.

print("X_train[0]:", X_train[0])
print("mapped X_train[0]:", mapped_X[0])
X_train[0]: [0.051267 0.69956 ]
mapped X_train[0]: [5.12670000e-02 6.99560000e-01 2.62830529e-03 3.58643425e-02
 4.89384194e-01 1.34745327e-04 1.83865725e-03 2.50892595e-02
 3.42353606e-01 6.90798869e-06 9.42624411e-05 1.28625106e-03
 1.75514423e-02 2.39496889e-01 3.54151856e-07 4.83255257e-06
 6.59422333e-05 8.99809795e-04 1.22782870e-02 1.67542444e-01
 1.81563032e-08 2.47750473e-07 3.38066048e-06 4.61305487e-05
 6.29470940e-04 8.58939846e-03 1.17205992e-01]

While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting.

Cost function for regularized logistic regression

For regularized logistic regression, the cost function is of the form

$$J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$

Compare this to the cost function without regularization (which you implemented above), which is of the form

$$J(\mathbf{w},b) = \frac{1}{m}\sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)\right]$$

The difference is the regularization term, which is

$$\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$

Note that the $b$ parameter is not regularized.
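compute_cost (together with a sigmoid helper) was implemented in an earlier part of the assignment and is reused by compute_cost_reg below; a minimal sketch consistent with that usage (the exact earlier implementation may differ) is:

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, w, b):
    # Unregularized logistic cost, averaged over the m examples
    m = X.shape[0]
    f_wb = sigmoid(X @ w + b)
    loss = -y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb)
    return np.sum(loss) / m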

Exercise 1

The compute_cost_reg function below computes the regularization term by summing $w_j^2$ over each element of $\mathbf{w}$ and scaling the sum by $\frac{\lambda}{2m}$.

This value is then added to the cost without regularization (computed in compute_cost) to obtain the regularized cost.

def compute_cost_reg(X, y, w, b, lambda_ = 1):

    m, n = X.shape
    
    # Calls the compute_cost function that you implemented above
    cost_without_reg = compute_cost(X, y, w, b) 
    
    # You need to calculate this value
    reg_cost = 0.
  
    for j in range(n):
        reg_cost_j = w[j]**2 
        reg_cost = reg_cost + reg_cost_j
    reg_cost = (lambda_/(2 * m)) * reg_cost
            
    # Add the regularization cost to get the total cost
    total_cost = cost_without_reg + reg_cost

    return total_cost
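The loop over j above is equivalent to a single vectorized NumPy expression; for example, the regularization term could also be computed as:

reg_cost = (lambda_ / (2 * m)) * np.sum(np.square(w))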

X_mapped = map_feature(X_train[:, 0], X_train[:, 1])
np.random.seed(1)
initial_w = np.random.rand(X_mapped.shape[1]) - 0.5
initial_b = 0.5
lambda_ = 0.5
cost = compute_cost_reg(X_mapped, y_train, initial_w, initial_b, lambda_)

print("Regularized cost :", cost)
Regularized cost : 0.6618252552483948

Gradient for regularized logistic regression

The gradient of the regularized cost function has two components. The first, $\frac{\partial J(\mathbf{w},b)}{\partial b}$, is a scalar; the other is a vector with the same shape as the parameters $\mathbf{w}$, where the $j^\mathrm{th}$ element is defined as follows:

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$$
$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \left( \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x_j^{(i)} \right) + \frac{\lambda}{m} w_j$$

Compare this to the gradient of the cost function without regularization, which is of the form

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$$
$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_{j}^{(i)}$$

As you can see, $\frac{\partial J(\mathbf{w},b)}{\partial b}$ is the same; the difference is the following term in $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$, which is

$$\frac{\lambda}{m} w_j$$

Exercise 2

The compute_gradient_reg function below calls the compute_gradient function you implemented above and then adds the term $\frac{\lambda}{m} w_j$ to each element of dj_dw.

def compute_gradient_reg(X, y, w, b, lambda_ = 1): 
    m, n = X.shape
    
    # Calls the compute_gradient function that you implemented above
    dj_db, dj_dw = compute_gradient(X, y, w, b)

    # Add the regularization term (lambda_/m) * w_j to each element of dj_dw
    for j in range(n): 
        
        dj_dw_j_reg = (lambda_ / m) * w[j] 
        
        dj_dw[j] = dj_dw[j] + dj_dw_j_reg
             
    return dj_db, dj_dw
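As with the cost, the element-wise loop can be replaced by a vectorized update, for example:

dj_dw = dj_dw + (lambda_ / m) * w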

X_mapped = map_feature(X_train[:, 0], X_train[:, 1])
np.random.seed(1) 
initial_w  = np.random.rand(X_mapped.shape[1]) - 0.5 
initial_b = 0.5
 
lambda_ = 0.5
dj_db, dj_dw = compute_gradient_reg(X_mapped, y_train, initial_w, initial_b, lambda_)

print(f"dj_db: {dj_db}", )
print(f"First few elements of regularized dj_dw:\n {dj_dw[:4].tolist()}", )
dj_db: 0.07138288792343662
First few elements of regularized dj_dw:
 [-0.010386028450548701, 0.011409852883280124, 0.0536273463274574, 0.003140278267313462]

Learning parameters using gradient descent

Similar to the previous parts, you will use the gradient descent function implemented above to learn the optimal parameters $\mathbf{w}$, $b$.

  • If you have completed the cost and gradient for regularized logistic regression correctly, you should be able to step through the next cell to learn the parameters $\mathbf{w}$

  • After training the parameters, we will use them to plot the decision boundary

The code block below takes quite a while to run, especially with a non-vectorized implementation. You can reduce the number of iterations to test your code and iterate faster. If you have time later, try running 100,000 iterations to see better results.
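gradient_descent was implemented in an earlier part of the assignment; a minimal sketch consistent with the call below (the cost-printing interval and the returned w_history are illustrative details) is:

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function,
                     alpha, num_iters, lambda_):
    # Run num_iters steps of batch gradient descent, recording the cost history
    w = copy.deepcopy(w_in)
    b = b_in
    J_history = []
    w_history = []

    for i in range(num_iters):
        # Compute the (regularized) gradient and take a step
        dj_db, dj_dw = gradient_function(X, y, w, b, lambda_)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Track the cost and occasionally print progress
        J_history.append(cost_function(X, y, w, b, lambda_))
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}")

    return w, b, J_history, w_history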

# Initialize fitting parameters
np.random.seed(1)
initial_w = np.random.rand(X_mapped.shape[1])-0.5
initial_b = 1.

# Set regularization parameter lambda_ (you can try varying this)
lambda_ = 0.01    

# Some gradient descent settings
iterations = 10000
alpha = 0.01

w,b, J_history,_ = gradient_descent(X_mapped, y_train, initial_w, initial_b, 
                                    compute_cost_reg, compute_gradient_reg, 
                                    alpha, iterations, lambda_)

Expected Output: Cost < 0.5

Plotting the decision boundary

To help you visualize the model learned by this classifier, we will use the plot_decision_boundary function, which plots the (non-linear) decision boundary that separates the positive and negative examples.

  • In this function, we plot the non-linear decision boundary by computing the classifier's predictions on an evenly spaced grid and then drawing a contour plot of where the predictions change from y = 0 to y = 1; a sketch of this approach is shown after this list

  • After learning the parameters $\mathbf{w}$, $b$, the next step is to plot a decision boundary similar to Figure 4
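plot_decision_boundary is provided in utils.py; a minimal sketch of the grid-and-contour approach described above (the grid range and contour styling here are illustrative, and it reuses the plot_data and map_feature sketches from earlier) is:

def plot_decision_boundary(w, b, X, y):
    # Plot the training data using the first two (unmapped) feature columns
    plot_data(X[:, 0:2], y)

    # Evenly spaced grid covering the range of the two test scores
    u = np.linspace(-1.0, 1.2, 100)
    v = np.linspace(-1.0, 1.2, 100)
    z = np.zeros((len(u), len(v)))

    # Evaluate the model z = w . map_feature(u, v) + b at every grid point
    for i in range(len(u)):
        for j in range(len(v)):
            z[i, j] = (map_feature(u[i], v[j]) @ w + b).item()

    # contour expects z indexed as z[v, u], so transpose before plotting
    plt.contour(u, v, z.T, levels=[0.0], colors="g")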

plot_decision_boundary(w, b, X_mapped, y_train)
# Set the y-axis label
plt.ylabel('Microchip Test 2') 
# Set the x-axis label
plt.xlabel('Microchip Test 1') 
plt.legend(loc="upper right")
plt.show()

Evaluating regularized logistic regression model

# Compute accuracy on the training set
p = predict(X_mapped, w, b)

print('Train Accuracy: %f'%(np.mean(p == y_train) * 100))
Train Accuracy: 82.203390
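predict was also implemented earlier in the assignment; a minimal sketch, assuming it simply thresholds the model's sigmoid output at 0.5 (reusing the sigmoid sketch from above), is:

def predict(X, w, b):
    # Predict class 1 when the estimated probability is at least 0.5
    f_wb = sigmoid(X @ w + b)
    return (f_wb >= 0.5).astype(float)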
