
09 Machine Learning - K-means Clustering Case Study

杨林伟, published 2019-09-27 15:26:02

1. Requirements

Cluster the given dataset.
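K-means, the algorithm used in this case, turns this requirement into an optimization problem: choose k centroids and assign every sample to one of them so that the within-cluster sum of squared errors (SSE) J is as small as possible. The notation below is introduced here for illustration: x_i is the i-th of m samples, c_i its cluster index, mu_j the j-th centroid, and C_j = {i : c_i = j} the set of samples in cluster j.

```latex
J = \sum_{i=1}^{m} \left\lVert \mathbf{x}_i - \boldsymbol{\mu}_{c_i} \right\rVert_2^2,
\qquad
c_i = \arg\min_{j \in \{1,\dots,k\}} \left\lVert \mathbf{x}_i - \boldsymbol{\mu}_j \right\rVert_2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i
```

The algorithm alternates the assignment rule (middle) and the mean update (right) until no c_i changes. Neither step can increase J, so the loop terminates, although possibly at a local rather than a global minimum, which is why the result depends on the random initial centroids.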

This example uses a two-dimensional dataset with 80 samples in 4 clusters. A sample of the data (testSet.txt) follows:

1.658985	4.285136
-3.453687	3.424321
4.838138	-1.151539
-5.379713	-3.362104
0.972564	2.924086
-3.567919	1.531611
0.450614	-3.302219
-3.487105	-1.724432
2.668759	1.594842
-3.156485	3.191137
3.165506	-3.999838
-2.786837	-3.099354
4.208187	2.984927
-2.123337	2.943366
0.704199	-0.479481
-0.392370	-3.963704
2.831667	1.574018
-0.790153	3.343144
2.943496	-3.357075
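Each row holds the x and y coordinates of one sample. The sketch below parses a few of the rows shown above into a NumPy array and prints the per-column minimum and maximum, which is the range that random initial centroids are drawn from. It is a minimal sketch under two assumptions: the rows are held in an in-memory string (SAMPLE, standing in for testSet.txt) so the example runs without the file, and parse_points is a hypothetical helper. Calling str.split() with no argument splits on any run of whitespace, so a row separated by spaces instead of a tab still parses.

```python
import io

import numpy as np

# First five sample rows from above; SAMPLE is an in-memory stand-in for testSet.txt.
# The third row deliberately uses spaces instead of a tab.
SAMPLE = """1.658985\t4.285136
-3.453687\t3.424321
4.838138    -1.151539
-5.379713\t-3.362104
0.972564\t2.924086
"""

def parse_points(stream):
    """Hypothetical helper: read one 2-D point per non-blank line."""
    rows = [list(map(float, line.split())) for line in stream if line.strip()]
    return np.array(rows)

points = parse_points(io.StringIO(SAMPLE))
print(points.shape)          # (5, 2)
print(points.min(axis=0))    # smallest x and smallest y
print(points.max(axis=0))    # largest x and largest y
```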
2. Python Implementation

2.1 Manual implementation with NumPy
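import numpy as np

# Load the data: one sample per line, coordinates separated by whitespace
def loadDataSet(fileName):
    dataMat = []
    with open(fileName) as fr:
        for line in fr:
            curLine = line.split()               # split on tabs or spaces
            if not curLine:                      # skip blank lines
                continue
            fltLine = list(map(float, curLine))  # convert to float; list() is needed in Python 3
            dataMat.append(fltLine)
    return np.array(dataMat)

# Euclidean distance between two vectors
def distEclud(vecA, vecB):
    return np.sqrt(np.sum(np.power(vecA - vecB, 2)))

# Build k random centroids (k = 4 in this example), each coordinate drawn
# uniformly between the minimum and maximum of that feature
def randCent(dataSet, k):
    n = dataSet.shape[1]
    centroids = np.zeros((k, n))    # k centroids, each with n coordinates
    for j in range(n):
        minJ = dataSet[:, j].min()
        maxJ = dataSet[:, j].max()
        rangeJ = float(maxJ - minJ)
        centroids[:, j] = minJ + rangeJ * np.random.rand(k)
    return centroids

# k-means clustering
def kMeans(dataSet, k, distMeans=distEclud, createCent=randCent):
    dataSet = np.asarray(dataSet, dtype=float)
    m = dataSet.shape[0]
    # Column 0: cluster index of each sample; column 1: squared distance to its centroid
    clusterAssment = np.zeros((m, 2))
    clusterAssment[:, 0] = -1           # no sample is assigned yet
    centroids = createCent(dataSet, k)
    clusterChanged = True
    while clusterChanged:
        clusterChanged = False
        # Assignment step: give every sample to its nearest centroid
        for i in range(m):
            minDist = np.inf
            minIndex = -1
            for j in range(k):
                distJI = distMeans(centroids[j, :], dataSet[i, :])
                if distJI < minDist:
                    minDist = distJI
                    minIndex = j
            if clusterAssment[i, 0] != minIndex:
                clusterChanged = True
            clusterAssment[i, :] = minIndex, minDist ** 2
        # Update step: move each centroid to the mean of the samples assigned to it
        for cent in range(k):
            ptsInClust = dataSet[clusterAssment[:, 0] == cent]
            if len(ptsInClust) > 0:         # an empty cluster keeps its old centroid
                centroids[cent, :] = ptsInClust.mean(axis=0)
    return centroids, clusterAssment

if __name__ == '__main__':
    datMat = loadDataSet('testSet.txt')
    myCentroids, clustAssing = kMeans(datMat, 4)
    print(myCentroids)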
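kMeans above measures one sample-to-centroid distance at a time inside two nested Python loops. The same assign-then-update iteration (Lloyd's algorithm) can be written with NumPy broadcasting, which computes the whole m-by-k distance matrix in one expression. This is a minimal sketch, not the author's code: the function name kmeans_vectorized, the seeded numpy.random.default_rng generator, and the max_iter cap are illustrative assumptions. It also returns the SSE so that different random starts can be compared.

```python
import numpy as np

def kmeans_vectorized(X, k, max_iter=100, seed=0):
    """Hypothetical helper: Lloyd's k-means using NumPy broadcasting."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial centroids inside the bounding box of the data, like randCent
    lo, hi = X.min(axis=0), X.max(axis=0)
    centroids = lo + (hi - lo) * rng.random((k, X.shape[1]))
    labels = np.full(X.shape[0], -1)
    for _ in range(max_iter):
        # (m, 1, n) - (1, k, n) -> (m, k, n), summed over n -> (m, k) squared distances
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                        # no sample changed cluster: converged
        labels = new_labels
        # Update step: each non-empty cluster's centroid becomes the mean of its points
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    # SSE: sum of squared distances from each sample to its assigned centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    sse = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, sse

# Usage on the first eight sample rows listed above
pts = np.array([[1.658985, 4.285136], [-3.453687, 3.424321], [4.838138, -1.151539],
                [-5.379713, -3.362104], [0.972564, 2.924086], [-3.567919, 1.531611],
                [0.450614, -3.302219], [-3.487105, -1.724432]])
centroids, labels, sse = kmeans_vectorized(pts, k=4)
print(centroids.shape, labels.shape)   # (4, 2) (8,)
print(labels, sse)
```

Because k-means only reaches a local minimum, a common practice is to run it from several seeds and keep the result with the smallest SSE.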