Logistic Regression

Build a logistic regression model to predict whether a student is admitted to a university.

import numpy as np
import matplotlib.pyplot as plt
from utils import *
import copy
import math

Problem Statement

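Suppose you are the administrator of a university department and want to determine each applicant's chance of admission based on their scores on two exams.

You have historical data from previous applicants that you can use as a training set for logistic regression.
For each training example, you have the applicant's scores on two exams and the admission decision.
Build a classification model that estimates an applicant's probability of admission from those two exam scores.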

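Loading and visualizing the data

Load the dataset

The load_dataset() function loads the data into the variables X_train and y_train.

X_train contains the scores on two exams for each student
y_train is the admission decision
- y_train = 1 if the student was admitted
- y_train = 0 if the student was not admitted
Both X_train and y_train are numpy arrays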
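As a rough illustration of what such a loader might do, the sketch below parses comma-separated rows of the form exam1,exam2,label. The name load_dataset_sketch, the CSV layout, and the in-memory sample are assumptions made for illustration; the actual load_dataset() helper is not shown on this page and may work differently. The three sample rows reuse values printed further down.

```python
import io
import numpy as np

def load_dataset_sketch(source):
    """Hypothetical loader: expects rows of exam1,exam2,label."""
    data = np.loadtxt(source, delimiter=",")
    X = data[:, :2]   # first two columns: the two exam scores
    y = data[:, 2]    # last column: admission decision (0 or 1)
    return X, y

# In-memory stand-in for a data file (assumed layout)
sample = io.StringIO(
    "34.62365962,78.02469282,0\n"
    "30.28671077,43.89499752,0\n"
    "60.18259939,86.3085521,1\n"
)
X_demo, y_demo = load_dataset_sketch(sample)
print(X_demo.shape, y_demo.shape)  # (3, 2) (3,)
```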

View the variables

Get more familiar with the dataset

A good place to start is to print each variable and see what it contains

print("First five elements in X_train are:\n", X_train[:5])
print("Type of X_train:",type(X_train))

First five elements in X_train are:
 [[34.62365962 78.02469282]
 [30.28671077 43.89499752]
 [35.84740877 72.90219803]
 [60.18259939 86.3085521 ]
 [79.03273605 75.34437644]]
Type of X_train: <class 'numpy.ndarray'>

Now print the first five values of y_train

print("First five elements in y_train are:\n", y_train[:5])
print("Type of y_train:",type(y_train))

First five elements in y_train are:
 [0. 0. 0. 1. 1.]
Type of y_train: <class 'numpy.ndarray'>

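Check the dimensions of your variables

Another useful way to get familiar with the data is to look at its dimensions. Print the shapes of X_train and y_train to see how many training examples the dataset contains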

print ('The shape of X_train is: ' + str(X_train.shape))
print ('The shape of y_train is: ' + str(y_train.shape))
print ('We have m = %d training examples' % (len(y_train)))

The shape of X_train is: (100, 2)
The shape of y_train is: (100,)
We have m = 100 training examples

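Visualize your data

Before starting to implement any learning algorithm, it is good practice to visualize the data whenever possible

The code below displays the data on a 2D plot whose axes are the two exam scores, with positive and negative examples shown using different markers
We use a helper function from utils.py to generate this plot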

# Plot examples
plot_data(X_train, y_train[:], pos_label="Admitted", neg_label="Not admitted")

# Set the y-axis label
plt.ylabel('Exam 2 score') 
# Set the x-axis label
plt.xlabel('Exam 1 score') 
plt.legend(loc="upper right")
plt.show()
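The plot_data helper lives in utils.py and is not shown on this page. A minimal sketch of what such a helper might look like follows, assuming it separates rows by label with boolean masks and draws each group with its own marker. The function names and marker styles are illustrative choices, not the course's implementation.

```python
import numpy as np

def split_by_label(X, y):
    """Return the rows of X with label 1 and the rows with label 0."""
    pos = y == 1
    neg = y == 0
    return X[pos], X[neg]

def plot_data_sketch(X, y, pos_label="y=1", neg_label="y=0"):
    """Hypothetical stand-in for utils.plot_data."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing
    X_pos, X_neg = split_by_label(X, y)
    plt.plot(X_pos[:, 0], X_pos[:, 1], "k+", label=pos_label)  # positives: black plus signs
    plt.plot(X_neg[:, 0], X_neg[:, 1], "yo", label=neg_label)  # negatives: yellow circles

X_demo = np.array([[34.6, 78.0], [30.3, 43.9], [60.2, 86.3]])
y_demo = np.array([0.0, 0.0, 1.0])
X_pos_demo, X_neg_demo = split_by_label(X_demo, y_demo)
print(X_pos_demo.shape, X_neg_demo.shape)  # (1, 2) (2, 2)
```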

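The goal is to build a logistic regression model that fits this data

With this model, you can predict whether a new student will be admitted based on their scores on the two exams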

Sigmoid function

The logistic regression model is represented as:

f_{\mathbf{w},b}(\mathbf{x}) = g(\mathbf{w}\cdot \mathbf{x} + b)

where the function $g$ is the sigmoid function, defined as:

g(z) = \frac{1}{1+e^{-z}}

Exercise 1

Compute the sigmoid function

def sigmoid(z):
    g = 1/(1+np.exp(-z))  
    return g
 
value = 0

print (f"sigmoid({value}) = {sigmoid(value)}")

sigmoid(0) = 0.5

The code should also work with vectors and matrices. For a matrix, your function should apply the sigmoid function to every element

print ("sigmoid([ -1, 0, 1, 2]) = " + str(sigmoid(np.array([-1, 0, 1, 2]))))

# UNIT TESTS
from public_tests import *
sigmoid_test(sigmoid)

sigmoid([ -1, 0, 1, 2]) = [0.26894142 0.5 0.73105858 0.88079708]
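One practical caveat: for a very negative z (say -1000), np.exp(-z) overflows to inf, so the direct formula still returns 0 but emits a RuntimeWarning. The sketch below shows one common way to avoid that by choosing an algebraically equivalent form depending on the sign of z. The name sigmoid_stable is an illustrative assumption and is not part of the exercise; note that it always returns an array, even for scalar input.

```python
import numpy as np

def sigmoid_stable(z):
    """Sigmoid that never calls np.exp on a large positive argument."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use g(z) = exp(z) / (1 + exp(z)); here exp(z) < 1
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

extreme = sigmoid_stable(np.array([-1000.0, 0.0, 1000.0]))
print(extreme)  # values 0, 0.5 and 1, with no overflow warning
```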

Cost function for logistic regression

Exercise 2

Compute the cost function

For logistic regression, the cost function has the form

J(\mathbf{w},b) = \frac{1}{m}\sum_{i=0}^{m-1} \left[ loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) \right] \tag{1}

where

m is the number of training examples in the dataset
$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is

loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \tag{2}

$f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label
$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x^{(i)}} + b)$, where the function $g$ is the sigmoid function.
- It helps to first compute an intermediate variable $z_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x^{(i)}} + b = w_0x^{(i)}_0 + ... + w_{n-1}x^{(i)}_{n-1} + b$ (where $n$ is the number of features), and then compute $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(z_{\mathbf{w},b}(\mathbf{x}^{(i)}))$

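As you do this, remember that the variables X_train and y_train are not scalar values but arrays of shape ($m, n$) and ($m$,) respectively (matching the shapes (100, 2) and (100,) printed earlier), where $n$ is the number of features and $m$ is the number of training examples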
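Shapes matter here because NumPy broadcasting can silently produce the wrong result instead of raising an error. The small sketch below uses made-up values to show what happens when a flat vector of predictions is combined with labels stored as a column; the variable names are illustrative only.

```python
import numpy as np

m = 4
f_demo = np.full(m, 0.5)                  # predictions, shape (m,)
y_flat = np.array([0.0, 1.0, 1.0, 0.0])   # labels, shape (m,)
y_col = y_flat.reshape(m, 1)              # same labels, shape (m, 1)

print((f_demo - y_flat).shape)  # (4,)   element-wise, as intended
print((f_demo - y_col).shape)   # (4, 4) broadcast into a full matrix
```

If labels arrive as a column, flattening them first (for example with y_col.ravel()) keeps the per-example arithmetic element-wise.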

You can use the sigmoid function implemented above for this part

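def compute_cost(X, y, w, b, *argv):
    # *argv absorbs extra positional arguments (such as the lambda_ value
    # that gradient_descent passes), so they are accepted and ignored here
    m, n = X.shape

    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i], w) + b          # linear part for example i
        f_wb_i = sigmoid(z_i)              # predicted probability for example i
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)

    total_cost = cost / m                  # average loss over all examples

    return total_cost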
    
m, n = X_train.shape

# Compute and display cost with w and b initialized to zeros
initial_w = np.zeros(n)
initial_b = 0.
cost = compute_cost(X_train, y_train, initial_w, initial_b)
print('Cost at initial w and b (zeros): {:.3f}'.format(cost))

# Compute and display cost with non-zero w and b
test_w = np.array([0.2, 0.2])
test_b = -24.
cost = compute_cost(X_train, y_train, test_w, test_b)

print('Cost at test w and b (non-zeros): {:.3f}'.format(cost))

Cost at initial w and b (zeros): 0.693

Cost at test w and b (non-zeros): 0.218
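The value 0.693 at zero parameters is no coincidence: with w = 0 and b = 0 every prediction is g(0) = 0.5, so each example contributes -log(0.5) = ln 2 ≈ 0.693 regardless of its label. The sketch below checks this with a vectorized version of the cost. The name compute_cost_vectorized and the synthetic data are assumptions for illustration, not part of the exercise.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """Same cost as the loop version, computed for all examples at once."""
    z = X @ w + b                         # shape (m,)
    f = 1.0 / (1.0 + np.exp(-z))          # sigmoid applied element-wise
    return np.mean(-y * np.log(f) - (1.0 - y) * np.log(1.0 - f))

rng = np.random.default_rng(0)
X_demo = rng.uniform(30, 100, size=(6, 2))       # synthetic exam scores
y_demo = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])
cost_zero = compute_cost_vectorized(X_demo, y_demo, np.zeros(2), 0.0)
print(f"{cost_zero:.3f}")  # 0.693, i.e. ln 2
```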

Gradient for logistic regression

Implement the gradient for logistic regression

The gradient descent algorithm is:

\begin{align*}& \text{repeat until convergence:} \; \lbrace \newline \; & b := b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \newline \; & w_j := w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1} \; & \text{for j := 0..n-1}\newline & \rbrace\end{align*}

where the parameters $b$ and $w_j$ are all updated simultaneously
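"Simultaneously" means that every partial derivative is evaluated at the old parameter values before any parameter is changed. A minimal sketch of one such step follows; gradient_step and the toy objective are hypothetical names introduced only for this illustration.

```python
import numpy as np

def gradient_step(w, b, grad_fn, alpha):
    """One simultaneous update: both gradients use the old (w, b)."""
    dj_db, dj_dw = grad_fn(w, b)     # evaluate at the current parameters first
    w_new = w - alpha * dj_dw
    b_new = b - alpha * dj_db
    return w_new, b_new

# Toy objective J(w, b) = (w0 + b)^2, chosen because both partial
# derivatives depend on both parameters
def toy_grad(w, b):
    g = 2.0 * (w[0] + b)
    return g, np.array([g])

w1, b1 = gradient_step(np.array([1.0]), 1.0, toy_grad, alpha=0.1)
print(w1, b1)
```

Here both parameters move from 1.0 to 0.6. Updating b first and then recomputing the gradient for w with the new b would instead give w = 0.68, which is a different algorithm.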

Exercise 3

Implement the compute_gradient function, which computes the gradient from equations (2) and (3) below

\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - \mathbf{y}^{(i)}) \tag{2}

\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - \mathbf{y}^{(i)})x_{j}^{(i)} \tag{3}

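m is the number of training examples in the dataset
$f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label
Note: although this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression define $f_{\mathbf{w},b}(\mathbf{x})$ differently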

def compute_gradient(X, y, w, b, *argv): 
    m, n = X.shape
    dj_dw = np.zeros(w.shape)
    dj_db = 0.

    for i in range(m):
        # Compute the linear part z_wb
        z_wb = 0
        for j in range(n): 
            z_wb += X[i, j] * w[j]
        z_wb += b
        
        # Compute the logistic regression prediction f_wb
        f_wb = 1 / (1 + np.exp(-z_wb))
        
        # Accumulate this example's contribution to the partial derivatives
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i
        
        for j in range(n):
            dj_dw[j] += (f_wb - y[i]) * X[i, j]
    
    # Average the gradient over all examples
    dj_dw = dj_dw / m
    dj_db = dj_db / m
            
    return dj_db, dj_dw
    
# Compute and display the gradient with w and b initialized to zeros
initial_w = np.zeros(n)
initial_b = 0.

dj_db, dj_dw = compute_gradient(X_train, y_train, initial_w, initial_b)
print(f'dj_db at initial w and b (zeros):{dj_db}' )
print(f'dj_dw at initial w and b (zeros):{dj_dw.tolist()}' )

# Compute and display the gradient with non-zero w and b
test_w = np.array([ 0.2, -0.5])
test_b = -24
dj_db, dj_dw  = compute_gradient(X_train, y_train, test_w, test_b)

print('dj_db at test w and b:', dj_db)
print('dj_dw at test w and b:', dj_dw.tolist())

dj_db at initial w and b (zeros):-0.1
dj_dw at initial w and b (zeros):[-12.00921658929115, -11.262842205513591]

dj_db at test w and b: -0.5999999999991071
dj_dw at test w and b: [-44.831353617873795, -44.37384124953978]
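A common way to sanity-check an analytic gradient such as equations (2) and (3) is to compare it against central finite differences of the cost. The sketch below does this on small synthetic data with a vectorized gradient; all names and data here are illustrative assumptions rather than part of the exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def gradient(X, y, w, b):
    err = sigmoid(X @ w + b) - y          # f_wb(x_i) - y_i for every i
    return err.mean(), X.T @ err / len(y)

def numerical_gradient(X, y, w, b, eps=1e-6):
    """Central differences on the cost, one parameter at a time."""
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    dj_dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    return dj_db, dj_dw

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(8, 2))          # synthetic features
y_demo = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
w_demo, b_demo = np.array([0.3, -0.2]), 0.1

db_a, dw_a = gradient(X_demo, y_demo, w_demo, b_demo)
db_n, dw_n = numerical_gradient(X_demo, y_demo, w_demo, b_demo)
print(np.isclose(db_a, db_n, atol=1e-6), np.allclose(dw_a, dw_n, atol=1e-6))
```

If the two disagree by much more than the tolerance, the analytic gradient (or the cost) most likely contains a bug.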

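Learning parameters using gradient descent

Use gradient descent to find the optimal parameters of the logistic regression model
A good way to verify that gradient descent is working correctly is to look at the value of $J(\mathbf{w},b)$ and check that it decreases with every step
Assuming you have implemented the gradient and the cost correctly, and that the learning rate is small enough, the value of $J(\mathbf{w},b)$ should never increase and should converge to a steady value by the end of the algorithm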

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters, lambda_): 

    # Number of training examples
    m = len(X)
    
    # Lists that store the cost J and the values of w, mainly for plotting later
    J_history = []
    w_history = []
    
    for i in range(num_iters):

        # Compute the gradient at the current parameters
        dj_db, dj_dw = gradient_function(X, y, w_in, b_in, lambda_)   

        # Update the parameters using w, b, alpha and the gradient
        w_in = w_in - alpha * dj_dw               
        b_in = b_in - alpha * dj_db              
       
        # Save the cost J at each iteration
        if i<100000:      # prevent resource exhaustion
            cost =  cost_function(X, y, w_in, b_in, lambda_)
            J_history.append(cost)

        # Print the cost at 10 evenly spaced intervals, or at every iteration if there are fewer than 10
        if i% math.ceil(num_iters/10) == 0 or i == (num_iters-1):
            w_history.append(w_in)
            print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}")
        
    return w_in, b_in, J_history, w_history 

np.random.seed(1)
initial_w = 0.01 * (np.random.rand(2) - 0.5)
initial_b = -8

# Some gradient descent settings
iterations = 10000
alpha = 0.001

w,b, J_history,_ = gradient_descent(X_train ,y_train, initial_w, initial_b, 
                                   compute_cost, compute_gradient, alpha, iterations, 0)

Iteration    0: Cost     0.96   
Iteration 1000: Cost     0.31   
Iteration 2000: Cost     0.30   
Iteration 3000: Cost     0.30   
Iteration 4000: Cost     0.30   
Iteration 5000: Cost     0.30   
Iteration 6000: Cost     0.30   
Iteration 7000: Cost     0.30   
Iteration 8000: Cost     0.30   
Iteration 9000: Cost     0.30   
Iteration 9999: Cost     0.30
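The printed costs are rounded to two decimals, so they cannot show whether J really went down at every step. One way to check is to look at the differences between consecutive entries of the cost history. The sketch below does this on synthetic data with a compact vectorized training loop; the data, names, and learning rate are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    f = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 2))     # synthetic, roughly unit-scale features
y_demo = (X_demo.sum(axis=1) + rng.normal(scale=0.5, size=50) > 0).astype(float)

w_demo, b_demo, alpha_demo = np.zeros(2), 0.0, 0.1
J_demo = []
for _ in range(200):
    err = sigmoid(X_demo @ w_demo + b_demo) - y_demo
    w_demo = w_demo - alpha_demo * (X_demo.T @ err) / len(y_demo)
    b_demo = b_demo - alpha_demo * err.mean()
    J_demo.append(cost(X_demo, y_demo, w_demo, b_demo))

# Largest single-step change in J; a positive value would mean the cost went up
largest_step = np.max(np.diff(J_demo))
print("cost never increased:", largest_step <= 1e-12)
```

The same np.diff check can be applied to the J_history list returned by gradient_descent. If it reports an increase, a learning rate that is too large is a frequent cause.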

Evaluating logistic regression

Evaluate the quality of the learned parameters by seeing how well the model predicts on the training set

To do so, implement the predict function below

Exercise 4

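The predict function produces predictions of 1 or 0 given a dataset and the learned parameter vector $w$ and $b$

First, compute the model's prediction for each example as $f(x^{(i)}) = g(w \cdot x^{(i)} + b)$
Interpret the model's output $f(x^{(i)})$ as the probability that $y^{(i)}=1$ given $x^{(i)}$ and parameterized by $w$ and $b$
Therefore, to get a final prediction ($y^{(i)}=0$ or $y^{(i)}=1$) from the logistic regression model, you can use the following heuristic:

if $f(x^{(i)}) \geq 0.5$, predict $y^{(i)}=1$

if $f(x^{(i)}) < 0.5$, predict $y^{(i)}=0$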

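def predict(X, w, b): 

    # Number of training examples (m) and features (n)
    m, n = X.shape   
    p = np.zeros(m)
   
    # Loop over each example
    for i in range(m):   
        z_wb = 0
        # Loop over each feature
        for j in range(n): 
            # Add the corresponding term to z_wb
            z_wb += X[i, j] * w[j]
        
        # Add the bias term
        z_wb += b
        
        # Compute the prediction for this example
        f_wb = 1 / (1+np.exp(-z_wb))

        # Apply the threshold
        p[i] = 1 if f_wb >= 0.5 else 0
         
    return p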
    
# Test the prediction code
np.random.seed(1)
tmp_w = np.random.randn(2)
tmp_b = 0.3    
tmp_X = np.random.randn(4, 2) - 0.5

tmp_p = predict(tmp_X, tmp_w, tmp_b)
print(f'Output of predict: shape {tmp_p.shape}, value {tmp_p}')

Output of predict: shape (4,), value [0. 1. 1. 1.]
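Because the sigmoid is increasing and g(0) = 0.5, the rule f >= 0.5 is mathematically the same as z >= 0, so the decision can be made from the linear part alone. A vectorized sketch under that observation follows; predict_vectorized and the toy inputs are illustrative assumptions, not the exercise's required implementation.

```python
import numpy as np

def predict_vectorized(X, w, b):
    """Threshold the linear score; equivalent to thresholding g(z) at 0.5."""
    z = X @ w + b
    return (z >= 0).astype(float)

X_demo = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, 0.0], [3.0, -4.0]])
w_demo = np.array([1.0, -1.0])
b_demo = 0.0
p_demo = predict_vectorized(X_demo, w_demo, b_demo)
print(p_demo)  # scores are -1, -1.5, 0, 7, so predictions are 0, 0, 1, 1
```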

Now use it to compute the accuracy on the training set

# Compute accuracy on the training set
p = predict(X_train, w,b)
print('Train Accuracy: %f'%(np.mean(p == y_train) * 100))

Train Accuracy: 92.000000
