Multiple Variable Linear Regression

1. Goals

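Extend our regression model routines to support multiple features
Extend data structures to support multiple features
Rewrite the prediction, cost and gradient routines to support multiple features
Use NumPy np.dot to vectorize their implementations for speed and simplicity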

import copy, math
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')  # style file shipped with the course lab; remove this line if it is not present
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

2. Problem Statement

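You will use the motivating example of housing price prediction.
The training dataset contains three examples with four features (size, bedrooms, floors and age), shown in the table below.
Note that, unlike the earlier labs, size is in sq ft rather than 1000s of sq ft. This causes an issue, which you will solve in the next lab!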

Size (sqft)	Number of Bedrooms	Number of Floors	Age of Home	Price (1000s of dollars)
2104	5	1	45	460
1416	3	2	40	232
852	2	1	35	178

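You will build a linear regression model using these values so that you can then predict the price of other houses.
For example, a house with 1200 sq ft, 3 bedrooms, 1 floor, and 40 years old.
Run the following code cell to create the X_train and y_train variables.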

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
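The problem statement notes that measuring size in square feet, rather than 1000s of square feet, causes an issue. One quick way to see the imbalance, using only the X_train values from this lab, is to compare how widely each feature column varies. This is an illustrative sketch; the fix itself is left to the next lab.

```python
import numpy as np

# Same training data as X_train: size (sqft), bedrooms, floors, age
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])

# Range (max - min) of each feature column
feature_range = X_train.max(axis=0) - X_train.min(axis=0)
print(feature_range)  # ranges: 1252, 3, 1, 10
```

The size column spans a range hundreds of times wider than the bedroom and floor columns, which is why its gradient component later in the lab is much larger than the others.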

2.1 Matrix X Containing Our Examples

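Similar to the table above, examples are stored in a NumPy matrix X_train.
Each row of the matrix represents one example.
When you have $m$ training examples ($m$ is three in our example) and there are $n$ features (four in our example), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).
$\mathbf{x}^{(i)}$ is the vector containing example i: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
$x^{(i)}_j$ is element j in example i. The superscript in parentheses indicates the example number, while the subscript represents an element.
Display the input data.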

# data is stored in a numpy array/matrix
print(f"X Shape: {X_train.shape}, X Type:{type(X_train)}")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)}")
print(y_train)

X Shape: (3, 4), X Type:<class 'numpy.ndarray'>
[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]
y Shape: (3,), y Type:<class 'numpy.ndarray'>
[460 232 178]
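The notation above maps directly onto NumPy indexing: row i of X_train is the example vector x^(i), and X_train[i, j] is the element x_j^(i). A small sketch with the same data:

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
m, n = X_train.shape          # m examples (rows), n features (columns)

x_1 = X_train[1]              # example i=1 as a 1-D vector of n features
x_1_2 = X_train[1, 2]         # element j=2 (floors) of example i=1

print(m, n)      # 3 4
print(x_1)       # [1416    3    2   40]
print(x_1_2)     # 2
```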

2.2 Parameter Vector w, b

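$\mathbf{w}$ is a vector with $n$ elements.
Each element contains the parameter associated with one feature.
In our dataset, n is 4.
Notionally, we draw this as a column vector:
$$\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \cdots \\ w_{n-1} \end{pmatrix}$$

$b$ is a scalar parameter.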

# For demonstration, w and b will be loaded with some initial selected values that are near the optimal. w is a 1-D NumPy vector.
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

w_init shape: (4,), b_init type: <class 'float'>
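Although w is pictured as a column vector, the lab stores it as a 1-D NumPy array of shape (n,). The sketch below, using the w_init and b_init values above, shows that an explicit (n, 1) column gives the same prediction with np.dot, only wrapped in a length-1 array; that extra dimension is one reason the 1-D form is simpler to work with.

```python
import numpy as np

w = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])  # shape (4,)
b = 785.1811367994083
x = np.array([2104, 5, 1, 45])

w_col = w.reshape(-1, 1)          # explicit column vector, shape (4, 1)

p_1d = np.dot(x, w) + b           # (4,) . (4,)   -> scalar
p_col = np.dot(x, w_col) + b      # (4,) . (4, 1) -> array of shape (1,)

print(w.shape, w_col.shape)       # (4,) (4, 1)
print(p_1d, p_col)
```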

3. Model Prediction With Multiple Variables

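The model's prediction with multiple variables is given by the linear model:
$$f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \cdots + w_{n-1}x_{n-1} + b$$
or, in vector notation:
$$f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$$
where $\cdot$ is the vector dot product.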

3.1 Single Prediction, Element by Element

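Our previous prediction multiplied one feature value by one parameter and added a bias parameter.
A direct extension of that implementation to multiple features is to compute $w_0x_0 + w_1x_1 + \cdots + w_{n-1}x_{n-1} + b$ with a loop over each element, multiplying each feature by its parameter and then adding the bias parameter at the end.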

def predict_single_loop(x, w, b):
    """
    single predict using linear regression

    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter

    Returns:
      p (scalar):  prediction
    """
    n = x.shape[0]
    p = 0
    for i in range(n):
        p_i = x[i] * w[i]
        p = p + p_i
    p = p + b
    return p

# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104    5    1   45]
f_wb shape (), prediction: 459.9999976194083

Note the shape of x_vec. It is a 1-D NumPy vector with 4 elements, (4,).
The result, f_wb, is a scalar.
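The same loop-based routine can answer the question posed in the problem statement: the price of a 1200 sq ft, 3-bedroom, 1-floor, 40-year-old house. This sketch reuses the near-optimal w_init and b_init values from above; because they were fit to only three unscaled examples, treat the result as illustrative rather than a sound estimate. The price is in the same units as y_train.

```python
import numpy as np

def predict_single_loop(x, w, b):
    """Linear-regression prediction for one example using an explicit loop."""
    p = 0.0
    for i in range(x.shape[0]):
        p = p + x[i] * w[i]       # accumulate w_i * x_i
    return p + b                  # add the bias once at the end

w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

x_house = np.array([1200, 3, 1, 40])   # size, bedrooms, floors, age
price = predict_single_loop(x_house, w_init, b_init)
print(f"predicted price: {price:0.2f}")  # about 200.83
```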

3.2 Single Prediction, Vector

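Note that the element-by-element sum above is exactly the dot product $\mathbf{w} \cdot \mathbf{x}$ plus $b$, so it can be implemented with np.dot.
We can make use of vector operations to speed up predictions.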
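To get a rough sense of the speed-up, the sketch below times an element-by-element loop against np.dot on a larger random vector. The vector length (100,000) and the use of time.perf_counter are arbitrary choices for illustration; actual timings depend on the machine, so only the agreement of the two results is guaranteed.

```python
import time
import numpy as np

def dot_loop(a, b):
    """Dot product computed one element at a time."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

rng = np.random.default_rng(1)
a = rng.random(100_000)
b = rng.random(100_000)

t0 = time.perf_counter()
loop_result = dot_loop(a, b)
t1 = time.perf_counter()
vec_result = np.dot(a, b)
t2 = time.perf_counter()

print(f"loop:   {loop_result:.4f} in {1000 * (t1 - t0):.2f} ms")
print(f"np.dot: {vec_result:.4f} in {1000 * (t2 - t1):.2f} ms")
```

The loop runs in the Python interpreter one element at a time, while np.dot hands the whole computation to compiled code, which is where the difference comes from.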

def predict(x, w, b):
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar):             model parameter
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b
    return p

# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")
# make a prediction
f_wb = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104    5    1   45]
f_wb shape (), prediction: 459.9999976194083

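The results and shapes are the same as the previous version, which used a loop.
Going forward, np.dot will be used for these operations.
The prediction is now a single statement, so most routines will implement it directly rather than calling a separate predict routine.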
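When predictions are needed for every example at once, the dot product extends to a matrix-vector product: X @ w has shape (m,), one entry per row. A sketch with the lab's data and the w_init, b_init values, checked against a per-row np.dot loop:

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

# (m, n) @ (n,) -> (m,): one prediction per training example
f_wb_all = X_train @ w_init + b_init

# The same values computed one row at a time
f_wb_rows = np.array([np.dot(X_train[i], w_init) + b_init for i in range(X_train.shape[0])])

print(f_wb_all)   # close to y_train because w_init, b_init are near optimal
```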

4. Compute Cost With Multiple Variables

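The equation for the cost function with multiple variables, $J(\mathbf{w},b)$, is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2$$
where:
$$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$$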

def compute_cost(X, y, w, b):
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar
    return cost

# Compute and display cost using our pre-chosen optimal parameters.
cost = compute_cost(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904428966628e-12

Expected Result: Cost at optimal w : 1.5578904045996674e-12
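The loop in compute_cost can also be removed. A minimal sketch, assuming the same argument conventions as compute_cost (the name compute_cost_vectorized is hypothetical), computes all errors with one matrix-vector product and sums their squares with a dot product; it should agree with the loop version up to floating-point rounding.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """Squared-error cost J(w,b) = (1/2m) * sum((X @ w + b - y)**2), without a Python loop."""
    m = X.shape[0]
    err = X @ w + b - y                 # (m,) prediction errors
    return np.dot(err, err) / (2 * m)   # sum of squared errors via a dot product

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

cost_opt = compute_cost_vectorized(X_train, y_train, w_init, b_init)
cost_zero = compute_cost_vectorized(X_train, y_train, np.zeros(4), 0.0)
print(cost_opt)    # very close to zero
print(cost_zero)   # 49518.0 = (460**2 + 232**2 + 178**2) / 6
```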

5. Gradient Descent With Multiple Variables
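For reference, this is the standard batch gradient descent rule for the cost $J(\mathbf{w},b)$ defined in section 4, which the compute_gradient and gradient_descent routines in this section implement. Here n is the number of features, and all $w_j$ and $b$ are updated simultaneously.

$$
\begin{aligned}
&\text{repeat until convergence: } \lbrace \\
&\qquad w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \quad \text{for } j = 0, \dots, n-1 \\
&\qquad b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{aligned}
$$

where

$$
\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x_j^{(i)}
$$

$$
\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)
$$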

5.1 Compute Gradient With Multiple Variables

def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw

#Compute and display gradient
tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

dj_db at initial w,b: -1.6739251501955248e-06
dj_dw at initial w,b: 
 [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]

Expected Result:
dj_db at initial w,b: -1.6739251122999121e-06
dj_dw at initial w,b:
[-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
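Both loops in compute_gradient can be collapsed into matrix operations: the error vector has shape (m,), and X.T @ err sums err_i * X[i, j] over the examples for every feature j at once. A minimal sketch that follows the same (dj_db, dj_dw) return order as compute_gradient, checked against an explicit double loop. The function names and the parameter values w, b are hypothetical, chosen so the gradient is not near zero.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Gradient of the squared-error cost, returned as (dj_db, dj_dw) like compute_gradient."""
    m = X.shape[0]
    err = X @ w + b - y          # (m,) prediction errors
    dj_dw = X.T @ err / m        # (n,): sum over i of err_i * X[i, j], divided by m
    dj_db = np.sum(err) / m      # scalar: mean error
    return dj_db, dj_dw

def compute_gradient_loop(X, y, w, b):
    """Reference double-loop version, same formula as compute_gradient."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        err = np.dot(X[i], w) + b - y[i]
        for j in range(n):
            dj_dw[j] += err * X[i, j]
        dj_db += err
    return dj_db / m, dj_dw / m

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w = np.array([0.5, 10.0, -20.0, -5.0])   # arbitrary illustrative parameters
b = 100.0

dj_db_v, dj_dw_v = compute_gradient_vectorized(X_train, y_train, w, b)
dj_db_l, dj_dw_l = compute_gradient_loop(X_train, y_train, w, b)
print(np.allclose(dj_dw_v, dj_dw_l), np.isclose(dj_db_v, dj_db_l))  # True True
```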

5.2 Gradient Descent With Multiple Variables

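The routine below implements the gradient descent update for multiple variables: at each step, every w_j and b is moved against its gradient, scaled by the learning rate alpha, with all parameters updated simultaneously.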

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha

    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : learning rate
      num_iters (int)     : number of iterations to run gradient descent

    Returns:
      w (ndarray (n,)) : updated values of parameters
      b (scalar)       : updated value of parameter
      J_history (list) : cost after each iteration (first 100000 only), for graphing
    """

    # A list to store cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying global w within function
    b = b_in

    for i in range(num_iters):

        # Calculate the gradient at the current parameters
        dj_db, dj_dw = gradient_function(X, y, w, b)

        # Update parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(X, y, w, b))

        # Print cost at 10 evenly spaced intervals (every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")

    return w, b, J_history  # return final w, b and J history for graphing

# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

Iteration    0: Cost  2529.46
Iteration  100: Cost   695.99
Iteration  200: Cost   694.92
Iteration  300: Cost   693.86
Iteration  400: Cost   692.81
Iteration  500: Cost   691.77
Iteration  600: Cost   690.73
Iteration  700: Cost   689.71
Iteration  800: Cost   688.70
Iteration  900: Cost   687.69
b,w found by gradient descent: -0.00,[ 0.2   0.   -0.01 -0.07] 
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178

Expected Result:
b,w found by gradient descent: -0.00,[ 0.2   0.   -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178

# plot cost versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step')
plt.show()
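The cost is still falling slowly after 1000 iterations, and the predictions are only loosely close to the targets. As a sketch, the same batch gradient descent can be written without the inner Python loops by combining the vectorized gradient with the update rule. The function name gradient_descent_vectorized is hypothetical; with the same zero starting point, alpha = 5.0e-7 and 1000 iterations, it should trace the same cost curve as gradient_descent up to floating-point rounding.

```python
import numpy as np

def gradient_descent_vectorized(X, y, w_in, b_in, alpha, num_iters):
    """Batch gradient descent for linear regression; returns w, b and the cost history."""
    m = X.shape[0]
    w = np.array(w_in, dtype=float)   # copy so the caller's array is not modified
    b = float(b_in)
    J_history = []
    for _ in range(num_iters):
        err = X @ w + b - y           # (m,) errors at the current w, b
        dj_dw = X.T @ err / m         # gradient with respect to w
        dj_db = err.sum() / m         # gradient with respect to b
        w = w - alpha * dj_dw         # simultaneous update of w and b
        b = b - alpha * dj_db
        new_err = X @ w + b - y       # cost after the update, as gradient_descent records it
        J_history.append(np.dot(new_err, new_err) / (2 * m))
    return w, b, J_history

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

w_final, b_final, J_hist = gradient_descent_vectorized(
    X_train, y_train, np.zeros(4), 0.0, alpha=5.0e-7, num_iters=1000)
print(f"b,w: {b_final:0.2f}, {np.round(w_final, 2)}")
print(f"cost after {len(J_hist)} iterations: {J_hist[-1]:0.2f}")
print(np.round(X_train @ w_final + b_final, 2))   # predictions for all examples
```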