Effect of pure white on a color histogram
I am trying to use KNN to classify people by race based on their facial photos. The photos in my dataset have a pure white background [255, 255, 255].
I use the color histogram values as the feature input. I was told that I should remove the background color from the histogram to improve KNN performance.
Problem: When I create a mask from my photo that ignores the background, the histogram doesn't change at all.
Question: I'm not well versed in color theory. Does pure white affect the shape of the color histogram in general? When I use a simple mask that only keeps the center (like in the image below), the histogram does change.
This is the mask I built on the picture, ignoring the background
Simple mask to check the correct application of the mask
Source image for histogram calculation
This is the histogram I get from the unmasked image and from my constructed mask, ignoring the white.
This is the histogram I get from cropping using my simple mask. It differs from the other histograms, so I believe my method of calculating the histogram is correct.
Code for calculating the histogram:
# loop over the image channels
for (chan, color) in zip(channels, colors):
    # create a histogram for the current channel and
    # concatenate the resulting histograms for each channel
    hist_full = opencv.calcHist([chan], [0], mask, [bin_amount], [0, bin_amount])
    # plot the histogram
    plt.plot(hist_full, color=color)
    plt.xlim([0, bin_amount])
plt.show()
The code to create the mask:
mask = np.zeros(image.shape[:2], np.uint8)
# simple mask option
# mask[75:175, 75:175] = 255
# create a mask to ignore white background in the histogram
for row in range(0, len(image)):
    for col in range(0, len(image[0])):
        if (image[row][col] != np.asarray(prop.background)).all():
            try:
                mask[row][col] = 255
            except IndexError:
                print(col)
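As a side note, the per-pixel loop can be written as a single NumPy expression. The sketch below is only an illustration: the tiny `image` array and the `background` variable (standing in for `prop.background`) are made up. It also uses `any` instead of `all`, so a pixel counts as foreground when at least one channel differs from white; the loop above keeps a pixel only when every channel differs, which may or may not be what is intended.

```python
import numpy as np

# Hypothetical 2x2 three-channel image: two pure-white background pixels
# and two non-white pixels (one of them differs in a single channel only).
image = np.array([[[255, 255, 255], [10, 20, 30]],
                  [[255, 200, 255], [255, 255, 255]]], dtype=np.uint8)
background = np.array([255, 255, 255], dtype=np.uint8)  # stands in for prop.background

# A pixel is foreground if ANY of its channels differs from the background color.
mask = np.any(image != background, axis=2).astype(np.uint8) * 255

print(mask)
```

The resulting `mask` has the same shape and dtype as the one built by the loop, so it can be passed to `calcHist` in the same way.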
See: http://docs.opencv.org/2.4/modules/imgproc/doc/histograms.html
An important part:
Python: cv2.calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]]) → hist
Parameters: ... ranges - Array of the dims arrays of the histogram bin boundaries in each dimension. When the histogram is uniform (uniform = true), then for each dimension i it is enough to specify the lower (inclusive) boundary L_0 of the 0-th histogram bin and the upper (exclusive) boundary U_{histSize[i]-1} for the last histogram bin histSize[i]-1.
Try changing this part of your code:
hist_full = opencv.calcHist([chan], [0], mask, [bin_amount], [0, bin_amount])
to the following:
hist_full = opencv.calcHist([chan], [0], mask, [bin_amount], [0, 256])
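You need to specify the range of actual values in the image, with an exclusive upper bound, so [0, 256] for 8-bit channels. With [0, bin_amount], only values below bin_amount are counted: if bin_amount is 64, for example, the histogram covers 0-63 and ignores 64-255. Pure white (255) is then never counted in the first place, which would explain why masking out the background changes nothing.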
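To make the effect concrete, here is a minimal pure-Python sketch that mimics uniform binning with an inclusive lower and exclusive upper bound, as described in the documentation excerpt. It does not call OpenCV: the helper `uniform_hist`, the toy pixel values, and `bin_amount = 64` are all assumptions chosen for illustration.

```python
def uniform_hist(values, bins, lo, hi):
    """Count values into `bins` equal-width bins over [lo, hi).

    Values outside the half-open range are skipped entirely.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts


# Toy single-channel "image": three darker pixels and five pure-white ones.
pixels = [40, 50, 60, 255, 255, 255, 255, 255]
bin_amount = 64  # assumed value, not stated in the question

narrow = uniform_hist(pixels, bin_amount, 0, bin_amount)  # range used in the question
full = uniform_hist(pixels, bin_amount, 0, 256)           # range suggested here

print(sum(narrow))  # number of pixels counted with the narrow range
print(sum(full))    # number of pixels counted with the full range
print(full[-1])     # count in the last bin, where 255 lands
```

With the narrow range the white pixels never enter the histogram, so a mask that removes them makes no difference; with the full range they pile up in the last bin and masking them out becomes visible.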