Matlab: clustering kmeans gives unexpected clusters
Example:
load kmeansdata %provides X variable
Y=bsxfun(@minus,X,mean(X,2))'/sqrt(size(X,2)-1); %normalized and means adjusted
[~,~,PC] = svd(Y); % columns of PC are the right singular vectors of Y
plot(PC(:,1),PC(:,2),'m.','markersize',15)
Plot the first two columns and you get what looks like 3 clusters. I want to identify these clusters using kmeans and plot them in different colors as proof. I tried:
[idx,cntrd] = kmeans(PC(:,1:2),3,'Distance','sqEuclidean');%,'Distance','correlation');
cluster=3;
Col = {'.b','.r','.g','.y','.m','.c','.k'}; % Cell array of colours.
figure;
hold on
for clus=1:cluster
plot(PC(idx==clus,1),PC(idx==clus,2),Col{clus},'MarkerSize',12)
end
plot(cntrd(:,1),cntrd(:,2),'kx','MarkerSize',15,'LineWidth',3) %plotting the centroids of the clusters
The cluster centroids are off and the colors are not what I expected. Can anyone please help?
EDIT: Multiple answers:
I copied this code from the MathWorks site and replaced my kmeans line with it:
opts = statset('Display','final');
[idx,C] = kmeans(PC(:,1:2),3,'Distance','cityblock',...
'Replicates',5,'Options',opts);
It works, but I don't quite understand what opts does. Replicates, I suppose, just repeats kmeans 5 times and picks some average for the centroids. I also restarted MATLAB in case anything had crashed.
EDIT: ignore the above:
I thought the problem was solved, so I tried to find a suitable value of k. I set k = 1 and ran everything, then k = 2, then k = 3, and I noticed that I got the same wrong clusters again.
kmeans can be sensitive to the initial centroid locations. The problem here is the algorithm used to select the starting points. For example, you can get the expected result by supplying the starting centroids yourself:
[idx,cntrd] = kmeans(PC(:,1:2),3, 'start', [-0.05 0; 0 0; 0.05 0]);
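On the 'Replicates' option tried in the question: it does not average anything. kmeans reruns the clustering from different starting centroids and returns the single solution with the lowest total of within-cluster point-to-centroid distances, and statset('Display','final') only makes it print a summary for each run. The lines below are a minimal sketch of that comparison, assuming the Statistics and Machine Learning Toolbox and the same PC construction as in the question; the seed and the replicate count of 10 are arbitrary choices.

```matlab
% Rebuild PC exactly as in the question so the snippet runs on its own.
load kmeansdata                                        % provides X
Y = bsxfun(@minus,X,mean(X,2))'/sqrt(size(X,2)-1);
[~,~,PC] = svd(Y);
P = PC(:,1:2);

rng(1);                                                % arbitrary seed, only for repeatability
[idx1,C1,sumd1]    = kmeans(P,3);                      % one random start
[idx10,C10,sumd10] = kmeans(P,3,'Replicates',10);      % best of 10 random starts

% sumd holds the within-cluster sum of distances for each cluster;
% the replicate kmeans keeps is the one with the smallest total.
fprintf('total within-cluster distance: %g (1 start), %g (10 starts)\n', ...
    sum(sumd1), sum(sumd10));
```

If the two totals differ noticeably, the single-start run most likely stopped in a poor local minimum, which matches the symptom described in the question.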
Looks can also be deceiving. In this case, the variance of the data is not equal in the x and y dimensions. Thus, for some pairs of points, the Euclidean distance between points in different visual clusters is smaller than the distance between points in the same cluster.
For this data, you could use a Gaussian mixture model instead.
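A rough sketch of the mixture-model route, assuming the Statistics and Machine Learning Toolbox (fitgmdist, cluster, gscatter) and the same PC construction as in the question. The seed and the number of replicates are arbitrary, and whether the fitted components match the visual clusters still has to be checked on the plot. Note that the question defines a variable named cluster, which would shadow the cluster function, so it is cleared first.

```matlab
% Rebuild PC exactly as in the question so the snippet runs on its own.
load kmeansdata                                        % provides X
Y = bsxfun(@minus,X,mean(X,2))'/sqrt(size(X,2)-1);
[~,~,PC] = svd(Y);
P = PC(:,1:2);

clear cluster                          % remove the question's variable so cluster() is the function again
rng(1);                                % arbitrary seed, only for repeatability
gm  = fitgmdist(P,3,'Replicates',5);   % fit 3 Gaussian components, keep the best of 5 fits
idx = cluster(gm,P);                   % assign each point to its most probable component

figure;
gscatter(P(:,1),P(:,2),idx);           % one color per component
hold on
plot(gm.mu(:,1),gm.mu(:,2),'kx','MarkerSize',15,'LineWidth',3)  % component means
hold off
```

Unlike kmeans, each component gets its own covariance matrix here, so the model can account for clusters that are stretched along one axis.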