Clustering in Python-Image Clustering

Question


I want to cluster images using k-means or another algorithm (suggestions welcome).

The problem is this: I want to sort images into 3 clusters (nature, sunset, water). I loaded all the images using os.listdir(), converted each image to an RGB array, and then created a DataFrame with three columns: ID, Image_array and Label.

Now, when I run KMeans clustering with n_clusters=3, it raises this error:

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3).fit(img_array)

ValueError: Found array with dim 4. Estimator expected <= 2.

I need help with this clustering issue. This is how I created the DataFrame:

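import os

import numpy as np
import pandas as pd
from PIL import Image

img_array = []

path = "C://Users/shivam/Desktop/freelancer/p22/data/green_nature/"
for f in os.listdir(path):  # list the folder in `path`, not the current directory
    if f.endswith('.jpg'):
        img = Image.open(os.path.join(path, f))  # listdir returns bare file names
        data = np.asarray(img, dtype='uint8')  # one (height, width, 3) array per image
        img_array.append(data)

df = pd.DataFrame({'image_arrays': img_array})
df['id'] = range(1, len(df) + 1)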
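The loop above produces a list with one (height, width, 3) array per image. scikit-learn needs a single numeric array, and np.stack can only build one when every image has the same shape. A minimal sketch, assuming a hypothetical common size of 64x64 and using synthetic PIL images of different sizes instead of the files on disk: each image is resized before it is stored, and the DataFrame column is then stacked back into one 4D array. The helper load_resized and the constant TARGET_SIZE are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd
from PIL import Image

TARGET_SIZE = (64, 64)  # hypothetical common size, given as (width, height)


def load_resized(img):
    """Hypothetical helper: force RGB, resize, return a uint8 array (H, W, 3)."""
    return np.asarray(img.convert("RGB").resize(TARGET_SIZE), dtype="uint8")


# Synthetic images of different sizes stand in for the .jpg files on disk
images = [Image.new("RGB", (120, 80), (0, 128, 0)),
          Image.new("RGB", (200, 150), (0, 0, 255)),
          Image.new("RGB", (90, 90), (255, 120, 0))]

df = pd.DataFrame({"image_arrays": [load_resized(im) for im in images]})
df["id"] = range(1, len(df) + 1)

# np.stack turns the column of (64, 64, 3) arrays into one 4D array
stacked = np.stack(df["image_arrays"].to_list())
print(stacked.shape)  # (3, 64, 64, 3)
```

This 4D array is exactly the kind of input KMeans rejects; it still has to be reduced to one row of features per image.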


python machine-learning cluster-analysis

Shivam panchal 03 Apr 17 at 13:32


2 answers

TheLaurens · Answer 1 · 2017-04-03T13:47:50+0000

Well, as the error says, k-means expects one feature vector per input, whereas you give it a 3D array per image. The easiest way to solve such a problem (which requires some creativity) is to design a set of features that discriminate between the classes you have.
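A minimal sketch of the shape problem, using random data as a hypothetical stand-in for 10 RGB images of 32x32 pixels: scikit-learn rejects the 4D array with a ValueError (the exact message differs between versions), while an array with one row per image, shape (n_samples, n_features), is accepted. Flattening raw pixels like this only fixes the shape: every pixel position becomes its own feature, so it is not the feature design this answer recommends.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for 10 RGB images of 32x32 pixels: shape (10, 32, 32, 3)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)

try:
    KMeans(n_clusters=3, n_init=10).fit(images)  # 4D input is rejected
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)

# Flatten each image into a single row of raw pixel values: (10, 32*32*3)
X = images.reshape(len(images), -1)
kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
print(X.shape, kmeans.labels_.shape)  # (10, 3072) (10,)
```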

Since in this case you want to distinguish between nature (lots of "green"), water (lots of "blue") and sunset (lots of red/yellow/pink, perhaps?), you can use the overall or mean red, green and blue values as features. To check whether the features you have chosen are discriminative, you can plot a histogram of them.
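The histogram check can be sketched without plotting: np.histogram computes the distribution of one mean-color channel separately for each label, using shared bin edges so the classes are comparable (the counts can then be drawn with matplotlib). The helper channel_histograms and the feature values and labels below are hypothetical, chosen only for illustration. If the per-label histograms for a channel barely overlap, that channel separates those classes.

```python
import numpy as np


def channel_histograms(features, labels, channel, bins=8):
    """Hypothetical helper: histogram of one mean-color channel per label.

    features: (n_images, 3) array of mean R, G, B values in [0, 255]
    labels:   sequence of n_images class names (e.g. the Label column)
    Returns {label: counts}, all computed with the same bin edges.
    """
    labels = np.asarray(labels)
    edges = np.linspace(0, 255, bins + 1)  # shared bin edges for every label
    return {lab: np.histogram(features[labels == lab, channel], bins=edges)[0]
            for lab in np.unique(labels)}


# Hypothetical mean-RGB features for six labelled images
features = np.array([[40, 150, 60], [50, 170, 70],     # nature: high green
                     [30, 80, 200], [20, 90, 180],     # water: high blue
                     [230, 140, 60], [220, 110, 90]])  # sunset: high red
labels = ["nature", "nature", "water", "water", "sunset", "sunset"]

hists = channel_histograms(features, labels, channel=2)  # channel 2 = blue
for lab, counts in hists.items():
    print(lab, counts)
```

In this made-up data the blue channel separates water from the other two classes, while nature and sunset overlap, so a second channel (green or red) would be needed to tell those apart.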

Go from your 4D array (images x height x width x colors) to a 2D array (images x mean color). To do this, take np.mean over the height and width axes, keeping the color axis. At the end you should have an array of shape (images x 3 colors).
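A minimal sketch of that reduction followed by k-means, assuming the images are already stacked into one (n_images, height, width, 3) array. The synthetic solid-color images and the helper name mean_color_features are hypothetical stand-ins for illustration. Averaging over axes 1 and 2 leaves one mean R, G, B triple per image, which is a valid 2D input for KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans


def mean_color_features(images):
    """Hypothetical helper: reduce (n_images, height, width, 3) to (n_images, 3).

    Averages over the height and width axes (1 and 2), keeping one mean
    value per color channel for each image.
    """
    return np.asarray(images, dtype=np.float64).mean(axis=(1, 2))


# Synthetic solid-color 16x16 images standing in for the real photos
colors = [(30, 160, 50), (40, 150, 60),     # greenish ("nature")
          (20, 70, 200), (30, 80, 190),     # bluish ("water")
          (240, 120, 40), (230, 110, 60)]   # reddish ("sunset")
images = np.array([np.full((16, 16, 3), c, dtype=np.uint8) for c in colors])

X = mean_color_features(images)
print(X.shape)  # (6, 3)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # each pair of similar colors should share a cluster id
```

K-means cluster ids are arbitrary: cluster 0 is not necessarily "nature", so the clusters still have to be named by inspecting them, for example against the Label column.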

Alianse777 · Answer 2 · 2017-04-03T13:57:03+0000

This is because you are passing a 4D array where a 2D one is expected: img_array.shape should be (n_samples, n_features). You need to use a feature extraction algorithm.

This can be done with the scikit-image module. You need to convert your images to grayscale first. Code:

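import numpy as np
import skimage.color as color
import skimage.feature as feature
from sklearn.cluster import KMeans
img_converted = []
for img in img_array:
    gray = color.rgb2gray(img)  # convert each RGB image to 2D grayscale
    img_converted.append(feature.hog(gray))  # 1D HOG descriptor per image
model = KMeans(n_clusters=3)  # all images must share a size so descriptors line up
model.fit(np.array(img_converted))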

Documentation: http://scikit-image.org/docs/dev/api/skimage.feature.html#hog
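A self-contained sketch of the same HOG + k-means pipeline, run on hypothetical grayscale test images (four with horizontal stripes, four with vertical stripes) instead of real photos; the stripes helper is an illustrative name, not part of any library. Because HOG summarizes gradient orientations, the two stripe directions give clearly different descriptors and k-means with 2 clusters separates them.

```python
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans


def stripes(size, period, vertical, phase):
    """Hypothetical test image: 2D grayscale stripes with values 0.0 and 1.0."""
    coords = np.arange(size)
    row = ((coords + phase) // (period // 2)) % 2  # alternating 0/1 bands
    img = np.tile(row.astype(np.float64), (size, 1))  # values change along columns
    return img if vertical else img.T


# Four images with horizontal stripes, then four with vertical stripes
images = [stripes(64, 8, vertical=v, phase=p)
          for v in (False, True) for p in range(4)]

# HOG on each 2D image gives a 1D descriptor; its length depends only on
# the image size and the HOG parameters, so equal-size images line up
features = np.array([hog(img) for img in images])
print(features.shape)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(model.labels_)  # first four should share one id, last four the other
```

Grayscale HOG describes edge orientation and texture, not color, so on its own it discards the color cue that separates nature, sunset and water in the mean-color approach.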

