Matlab: Color Based Segmentation
I previously worked on this script based on http://www.mathworks.com/products/demos/image/color_seg_k/ipexhistology.html and Matlab answers:
clc; clear; close all;
input_im=imread('C:\Users\Udell\Desktop\T2.jpg');
sz_im=size(input_im);
cform = makecform('srgb2lab');
lab_he = applycform(input_im,cform);
ab = double(lab_he(:,:,2:3));
nrows = size(ab,1);
ncols = size(ab,2);
ab = reshape(ab,nrows*ncols,2);
nColors = 3;
% repeat the clustering 3 times to avoid local minima
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
    'Replicates',3);
pixel_labels = reshape(cluster_idx,nrows,ncols);
%imshow(pixel_labels,[]), title('image labeled by cluster index');
segmented_images = cell(1,3);
rgb_label = repmat(pixel_labels,[1 1 3]);
for k = 1:nColors
    color = input_im;
    color(rgb_label ~= k) = 0;
    segmented_images{k} = color;
end
for k=1:nColors
    %figure
    title_string=sprintf('objects in cluster %d',k);
    %imshow(segmented_images{k}), title(title_string);
end
finalSegmentedImage=segmented_images{1};
%imshow(finalSegmentedImage);
close all;
Icombine = [input_im finalSegmentedImage];
imshow(Icombine);
When running the script multiple times, I noticed that I get different images when finalSegmentedImage = segmented_images{1} is used for the combined image (Icombine). Why? And how can I fix this so that the results are reproducible (e.g. segmented_images{1} will always be the same)?
Many thanks.
The reason you get different results is that your color segmentation algorithm uses k-means clustering. I'm going to assume you are not familiar with it, since anyone who knows how it works could tell you right away why the results change on every run. In fact, the different results you get each time you run this code are a natural consequence of k-means clustering, and I'll explain why.
Here is how it works: given some data, you want to group it into k groups. First, you pick k random points from your data and give them the labels 1,2,...,k. These are what we call the centroids. Then you determine how close each remaining data point is to each centroid, and you assign every point to the group (1,2,...,k) of the centroid it is closest to. After that, you update the centroids, which are defined as the representative point of each group: for each of the k groups, you compute the average of all the points assigned to it. These averages become the new centroids for the next iteration. In the next iteration, you again determine how close each point in your data is to each centroid and reassign it. You keep repeating this until the centroids no longer move, or move very little.
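The loop described above can be sketched in a few lines. This is only an illustration on toy data, not how MATLAB's kmeans is implemented; the names X, C, labels and the hand-picked starting centroids are assumptions made for the example.

```matlab
% Toy data: six 2-D points forming two obvious groups (illustrative values)
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
C = X([1 4], :);            % starting centroids, picked by hand here
k = size(C, 1);

for iter = 1:100
    % Squared Euclidean distance from every point to every centroid (n-by-k)
    D = zeros(size(X, 1), k);
    for j = 1:k
        diffs   = bsxfun(@minus, X, C(j, :));
        D(:, j) = sum(diffs.^2, 2);
    end

    % Assign each point to its nearest centroid
    [~, labels] = min(D, [], 2);

    % Recompute each centroid as the mean of the points assigned to it
    Cnew = C;
    for j = 1:k
        if any(labels == j)
            Cnew(j, :) = mean(X(labels == j, :), 1);
        end
    end

    % Stop once the centroids no longer move
    if max(abs(Cnew(:) - C(:))) < 1e-10
        break;
    end
    C = Cnew;
end
```

Changing the rows picked for C changes where the loop starts, and on less clean data that can change where it ends up, which is the behavior discussed in the rest of this answer.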
As it relates to the above code, you are taking an image and you want to represent it using only k possible colors. Each of these possible colors is a centroid. Once you find out which cluster each pixel belongs to, you replace the color of the pixel with the centroid of that cluster. In other words, for each pixel in your image, you decide which of the k possible colors represents it best. This counts as color segmentation because you are partitioning the image so that it contains only k possible colors. More generally, this is what is called unsupervised segmentation.
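A minimal sketch of that replacement step on toy values. Here labels and centers are hypothetical stand-ins: labels plays the role of pixel_labels, and centers holds one RGB color per cluster. The question's code clusters in the a*b* plane, so its cluster_center is not directly an RGB color and would need to be converted first.

```matlab
% Hypothetical 2-by-2 label image and one RGB color per cluster
labels  = [1 2; 2 1];
centers = [255 0 0; 0 0 255];     % cluster 1 -> red, cluster 2 -> blue

% Look up the centroid color of every pixel, then restore the image shape
quantized = reshape(centers(labels(:), :), [size(labels) 3]);
quantized = uint8(quantized);
```

Indexing centers with labels(:) produces one row of color per pixel in column-major order, which is why a plain reshape to rows x columns x 3 puts every channel back in the right place.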
Now back to k-means. How the starting centroids are chosen is the reason why you get different results. You are calling kmeans with its defaults, which lets the algorithm pick the starting points automatically and at random. Because of this, you are not guaranteed to get the same starting points every time you invoke the algorithm. If you want to reproduce the same segmentation no matter how many times you call kmeans, you need to specify the starting points yourself. Thus, you will need to change the kmeans call to look like this:
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates', 3, 'start', seeds);
Note that the call is the same, but we added two additional arguments to the kmeans call. The start flag means that you are specifying the starting points yourself, and seeds is a k x p array, where k is how many groups you want. In this case it is the same as nColors, which is 3. p is the dimensionality of your data. Because of how you transform and reshape your data, it is 2, so you would ultimately define a 3 x 2 matrix. However, you also have the Replicates flag. This means that the k-means algorithm will run the number of times you specify and return the clustering with the lowest error. Thus, kmeans is repeated as many times as this flag indicates, and the seeds array above will no longer be k x p but k x p x n, where n is the number of times you want to run the segmentation. It is now a 3D matrix where each 2D slice defines the starting points for one run of the algorithm. Keep this in mind for later.
How you choose these points is up to you. However, if you want to select them randomly rather than pick them by hand, and still reproduce the same results every time you call this function, you must seed the random number generator with a known number, for example 123. This way, when you generate random points, you always get the same sequence of points, and hence reproducible results. So I would add this to your code before the call to kmeans:
rng(123); %// Set seed for reproducibility
numReplicates = 3;
ind = randperm(size(ab,1), numReplicates*nColors); %// Randomly choose nColors colours from data
%// We are also repeating the experiment numReplicates times
%// Make a 3D matrix where each slice denotes the initial centres for each iteration
seeds = permute(reshape(ab(ind,:).', [2 nColors numReplicates]), [2 1 3]);
%// Now call kmeans
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates', numReplicates, 'start', seeds);
Remember that you specified the Replicates flag because we want to repeat the algorithm a certain number of times, here 3. Therefore, we need to specify starting points for each run of the algorithm. Since we have 3 clusters and run the algorithm 3 times, we need 9 starting points in total (nColors * numReplicates). Each set of starting points must be a slice of a 3D array, which is why you see that complex statement before the call to kmeans.
I made the replicate count a variable so that you can change it to your heart's content and the code will still work. The tricky statement with permute and reshape lets us build this 3D matrix of points very easily.
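To see what the permute/reshape statement does, here is a small sketch on made-up data. The matrix pts is a hypothetical stand-in for ab(ind,:), with values chosen so each row is easy to recognize.

```matlab
nColors       = 3;
numReplicates = 3;

% Stand-in for ab(ind,:): 9 points in 2-D, row i is [2i-1, 2i]
pts = reshape(1:18, 2, 9).';

% Same statement as in the answer, applied to pts
seeds = permute(reshape(pts.', [2 nColors numReplicates]), [2 1 3]);

% Each slice holds nColors consecutive rows of pts, one point per row
for r = 1:numReplicates
    rows = (r - 1) * nColors + (1:nColors);
    assert(isequal(seeds(:, :, r), pts(rows, :)));
end
```

The transpose puts each point in a column so that reshape, which fills column by column, sends points 1 to 3 to the first slice, points 4 to 6 to the second, and so on. The final permute swaps the first two dimensions back so that every slice is nColors x 2, the layout kmeans expects for its starting points.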
Be aware that randperm only accepts a second argument in relatively recent versions of MATLAB. If the above call to randperm doesn't work, do this instead:
rng(123); %// Set seed for reproducibility
numReplicates = 3;
ind = randperm(size(ab,1)); %// Randomly choose nColors colours from data
ind = ind(1:numReplicates*nColors); %// We are also repeating the experiment numReplicates times
%// Make a 3D matrix where each slice denotes the initial centres for each iteration
seeds = permute(reshape(ab(ind,:).', [2 nColors numReplicates]), [2 1 3]);
%// Now call kmeans
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates', numReplicates, 'start', seeds);
Now with the above code, you should generate the same color segmentation results every time.
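If you want to convince yourself, a quick check along these lines should do. It is only a sketch: numPoints is a hypothetical stand-in for size(ab,1), and it tests the index selection alone, so it does not need the image or kmeans.

```matlab
numPoints = 1000;   % hypothetical stand-in for size(ab,1)

rng(123);
ind1 = randperm(numPoints, 9);

rng(123);           % re-seeding restarts the same random sequence
ind2 = randperm(numPoints, 9);

sameIndices = isequal(ind1, ind2);   % true when both draws match
```

The same idea applies to the full script: run it twice and compare the two cluster_idx outputs with isequal.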
Good luck!