[Machine Learning] K-Means

Published: 2023-07-31 15:51:21 | Author: 码农要战斗

K-Means

Finding the closest centroid

Formula

\[c^{(i)} := j \quad \mathrm{that \; minimizes} \quad ||x^{(i)} - \mu_j||^2 \]

where the norm \(||X||\) of a vector \(X = (x_1, x_2, \ldots, x_n)\) is computed as

\[||X|| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} \]
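
As a quick check of the norm formula, the sketch below computes it by hand and compares the result with NumPy's `np.linalg.norm`. The vector `x` is a made-up example, not data from the original exercise.

```python
import numpy as np

# Hypothetical example vector (values chosen for illustration only)
x = np.array([3.0, 4.0])

# Norm computed directly from the formula: sqrt(x_1^2 + ... + x_n^2)
manual_norm = np.sqrt(np.sum(x ** 2))

# The same quantity via NumPy's built-in routine
numpy_norm = np.linalg.norm(x)

print(manual_norm, numpy_norm)  # both are 5.0
```

Because the square root is monotonic, minimizing \(||x^{(i)} - \mu_j||\) picks the same centroid as minimizing \(||x^{(i)} - \mu_j||^2\), so either quantity can be passed to `np.argmin`.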

Code

import numpy as np

# UNQ_C1
# GRADED FUNCTION: find_closest_centroids

def find_closest_centroids(X, centroids):
    """
    Computes the centroid memberships for every example
    
    Args:
        X (ndarray): (m, n) Input values      
        centroids (ndarray): (K, n) centroids
    
    Returns:
        idx (array_like): (m,) closest centroids
    
    """

    # Set K
    K = centroids.shape[0]

    # You need to return the following variables correctly
    idx = np.zeros(X.shape[0], dtype=int)

    ### START CODE HERE ###
    for i in range(len(idx)):
        distance = []
        for j in range(K):
            norm_ij = np.linalg.norm(X[i] - centroids[j])  # Euclidean norm, i.e. the distance from example i to centroid j
            distance.append(norm_ij)

        idx[i] = np.argmin(distance)
    ### END CODE HERE ###
    
    return idx
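
The double loop above can also be written with NumPy broadcasting. The following is a minimal sketch of that alternative, not part of the original exercise; the function name `find_closest_centroids_vectorized` and the toy data are assumptions made for illustration.

```python
import numpy as np

def find_closest_centroids_vectorized(X, centroids):
    """Hypothetical vectorized variant of find_closest_centroids."""
    # diffs has shape (m, K, n): every example minus every centroid
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared Euclidean distances, shape (m, K)
    sq_dist = np.sum(diffs ** 2, axis=2)
    # Index of the nearest centroid for each example, shape (m,)
    return np.argmin(sq_dist, axis=1)

# Toy data, made up for illustration
X_demo = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids_demo = np.array([[1.0, 1.5], [8.5, 9.0]])
idx_demo = find_closest_centroids_vectorized(X_demo, centroids_demo)
print(idx_demo)  # [0 0 1 1]
```

This trades memory for speed: the intermediate array holds m × K × n values, so for very large inputs the explicit loop (or processing in chunks) may be the safer choice.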

Computing the centroid means

Formula

\[\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)} \]
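
As a worked example, suppose (hypothetically) that cluster \(k\) contains exactly two points, \(x^{(3)} = (1, 2)\) and \(x^{(5)} = (3, 6)\), so that \(C_k = \{3, 5\}\) and \(|C_k| = 2\). The new centroid is then the coordinate-wise average:

\[\mu_k = \frac{1}{2} \left( x^{(3)} + x^{(5)} \right) = \frac{1}{2} \left( (1, 2) + (3, 6) \right) = (2, 4) \]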

Code

# UNQ_C2
# GRADED FUNCTION: compute_centroids

def compute_centroids(X, idx, K):
    """
    Returns the new centroids by computing the means of the 
    data points assigned to each centroid.
    
    Args:
        X (ndarray):   (m, n) Data points
        idx (ndarray): (m,) Array containing index of closest centroid for each 
                       example in X. Concretely, idx[i] contains the index of 
                       the centroid closest to example i
        K (int):       number of centroids
    
    Returns:
        centroids (ndarray): (K, n) New centroids computed
    """
    
    # Useful variables
    m, n = X.shape
    
    # You need to return the following variables correctly
    centroids = np.zeros((K, n))
    ### START CODE HERE ###
    for k in range(K):
        points = X[idx == k]
        centroids[k] = np.mean(points, axis=0)
    ### END CODE HERE ###
    
    return centroids
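
The two steps above are typically alternated until the assignments stop changing. The sketch below shows one way to do that under stated assumptions: the helper names `assign`, `update`, and `run_kmeans_sketch`, the fixed iteration count, and the toy data are all hypothetical. It also guards against an empty cluster, since `np.mean` over an empty selection returns NaN; keeping the old centroid in that case is just one possible policy.

```python
import numpy as np

def assign(X, centroids):
    # Assignment step: index of the nearest centroid for each example
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def update(X, idx, centroids):
    # Update step: mean of the points assigned to each centroid.
    # Assumption: a centroid with no assigned points keeps its old position.
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        points = X[idx == k]
        if len(points) > 0:
            new_centroids[k] = points.mean(axis=0)
    return new_centroids

def run_kmeans_sketch(X, initial_centroids, max_iters=10):
    # Alternate the two steps for a fixed number of iterations
    centroids = initial_centroids.astype(float)
    for _ in range(max_iters):
        idx = assign(X, centroids)
        centroids = update(X, idx, centroids)
    # Recompute assignments so they match the final centroids
    idx = assign(X, centroids)
    return centroids, idx

# Toy data and starting centroids, made up for illustration
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [9.0, 9.0], [10.0, 10.0]])
init_toy = np.array([[0.0, 0.0], [5.0, 5.0]])
final_centroids, final_idx = run_kmeans_sketch(X_toy, init_toy)
print(final_centroids)  # rows: [1.0, 1.5] and [9.5, 9.5]
print(final_idx)        # [0 0 1 1]
```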