How to Calculate Average Precision and Average Rank for a CBIR System
A basic CBIR system using RGB histograms was implemented for me. Now I am trying to compute average precision and plot precision-recall curves. Is my formula for average precision correct? And how do I calculate the average rank?
Code:
% Dir: parent directory containing the image folders c1, c2, c3
% inputImage: \c1\1.ppm
% Example: to get the P-R curve, execute: demoCBIR('D:\visionImages','\c2\1.ppm');
function [] = demoCBIR(Dir, inputImage)
% Dir='D:\visionImages';
% inputImage='\c3\1.ppm';
tic;
S = strcat(Dir, inputImage);
Inp1 = imread(S);
num_red_bins = 8;
num_green_bins = 8;
num_blue_bins = 8;
num_bins = num_red_bins*num_green_bins*num_blue_bins;
A = imcolourhist(Inp1, num_red_bins, num_green_bins, num_blue_bins); % query image histogram
srcFiles = dir(strcat(Dir,'\*.jpg')); % NOTE: the query is a .ppm; adjust this pattern if the database images are .ppm as well
B = zeros(num_bins, 100); % histograms of the other 100 images in category 1
ptr = 1;
for i = 1:length(srcFiles)
    filename = strcat(Dir, '\', srcFiles(i).name);
    I = imread(filename); % read database image
    B(:, ptr) = imcolourhist(I, num_red_bins, num_green_bins, num_blue_bins);
    ptr = ptr + 1;
end
% histogram intersection: 0.5*(x + y - |x - y|) = min(x, y)
a = size(A, 2); b = size(B, 2);
K = zeros(a, b);
for i = 1:a
    Va = repmat(A(:, i), 1, b);
    K(i, :) = 0.5*sum(Va + B - abs(Va - B));
end
sims = K;
relevant_IDs = 1:100; % IDs of the relevant images for category 1
num_relevant_images = numel(relevant_IDs);
[sorted_sims, locs] = sort(sims, 'descend');
% rank at which each relevant image appears in the sorted results
locations_final = arrayfun(@(x) find(locs == x, 1), relevant_IDs);
locations_sorted = sort(locations_final);
precision = (1:num_relevant_images) ./ locations_sorted;
recall = (1:num_relevant_images) / num_relevant_images;
% average precision: mean of the precision values at each relevant rank
avgprec = sum(precision)/num_relevant_images;
plot(avgprec, 'b.-');
xlabel('Category ID');
ylabel('Average Precision');
title('Average Precision Plot');
axis([0 10 0 1.05]);
end
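For reference, the precision/recall bookkeeping in the second half of the function can be sketched in Python as well; the similarity scores and relevant IDs below are invented purely for illustration:

```python
import numpy as np

# Hypothetical similarity scores for 5 database images (higher = more similar)
sims = np.array([0.9, 0.2, 0.8, 0.4, 0.7])
relevant_ids = np.array([0, 1, 4])           # indices of the truly relevant images
num_relevant = len(relevant_ids)

# Rank all images by descending similarity
locs = np.argsort(-sims)                     # ranked image indices

# 1-based rank at which each relevant image appears, sorted ascending
positions = np.sort([np.where(locs == r)[0][0] + 1 for r in relevant_ids])

# Precision at each relevant rank, recall after each relevant hit
precision = np.arange(1, num_relevant + 1) / positions
recall = np.arange(1, num_relevant + 1) / num_relevant

# Average precision = mean of the precision values at the relevant ranks
avg_prec = precision.sum() / num_relevant
```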
Yes, that's right. You just add up all of your precision values and average them. That is the very definition of average precision.
Average precision is just one number (usually expressed as a percentage) that summarizes the overall performance of an image search engine. The higher the value, the better the performance. Precision-recall plots give you more detailed insight into how the system behaves, but average precision is useful when you are comparing many image search engines against each other. Instead of building many PR graphs to compare their overall performance, you can simply build a table that compares all the systems with one number summarizing each, namely the average precision.
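As a concrete illustration of that one-number comparison, here is a small Python sketch that computes AP from a ranked list of binary relevance judgements; the two "engines" and their rankings are invented for illustration:

```python
def average_precision(relevance):
    """AP from a ranked list of 0/1 relevance judgements (best rank first)."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)   # precision at this relevant rank
    return sum(precisions) / max(hits, 1)

# Two hypothetical engines answering the same query
engine_a = [1, 1, 0, 1, 0]   # relevant results near the top
engine_b = [0, 1, 0, 1, 1]   # relevant results pushed down
```

One table row per engine with its AP is enough to compare them: engine_a, which places the relevant images earlier, gets the higher score.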
Also, it doesn't make much sense to plot the average precision. When average precision is reported in scientific articles, there is no plot... just one value! The only way I could see you plotting it is as a bar chart, where the y axis indicates the average precision and the x axis indicates which search engine you are comparing. The higher the bar, the better the precision. However, a table listing all the different search engines, each with its own average precision, is more than adequate. This is what is usually done in most CBIR research work.
To answer your other question, you compute the average ranking using the average precision. Calculate the average precision for each of the search engines you are testing, then sort them by that value. Systems with higher average precision rank higher.
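That ranking step is a one-liner once each engine has its AP; the scores below are hypothetical:

```python
# Hypothetical average-precision scores for three engines
aps = {"engine_a": 0.92, "engine_b": 0.53, "engine_c": 0.75}

# Rank engines from best to worst average precision
ranking = sorted(aps, key=aps.get, reverse=True)
```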
This is what we use to calculate average precision. The randomization step is needed because, with discrete scores, ties can bias the result if your ground-truth images happen to sit at the top of a tied group.
function ap = computeAP(label, score, gt)
    % Shuffle first so that tied scores are broken randomly,
    % not by the original ordering of the images
    rand_index = randperm(length(label));
    label2 = label(rand_index);
    score = score(rand_index);
    % Rank all images by descending score
    [~, sids] = sort(score, 'descend');
    label2 = label2(sids);
    % Ranks at which the ground-truth class appears
    ids = find(label2 == gt);
    ap = 0;
    for j = 1:length(ids)
        % precision at the j-th relevant rank, averaged over all relevant items
        ap = ap + j / (ids(j) * length(ids));
    end
    fprintf('%f\n', ap);
end
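A Python transliteration of the routine above, under the same assumptions (`labels` is the class label of each database image, `scores` the similarity scores, `gt` the query's ground-truth class); shuffling before the sort means tied scores are broken randomly rather than by file order:

```python
import random

def compute_ap(labels, scores, gt):
    # Shuffle first so equal scores end up in random order (tie-breaking)
    pairs = list(zip(labels, scores))
    random.shuffle(pairs)
    # Rank by descending score
    pairs.sort(key=lambda p: p[1], reverse=True)
    # 1-based ranks at which the ground-truth class appears
    ids = [rank for rank, (lab, _) in enumerate(pairs, start=1) if lab == gt]
    # AP: precision at each relevant rank, averaged over all relevant items
    ap = 0.0
    for j, rank in enumerate(ids, start=1):
        ap += j / (rank * len(ids))
    return ap
```

With no tied scores the result is deterministic; when ties exist, averaging `compute_ap` over several calls gives a fairer estimate than relying on the database's file order.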