Image Similarity - Deep Learning and Handmade Features
I am doing computer vision research and I am working on a problem related to finding visually similar images with the request image. For example, find T-shirts of a similar color with similar patterns (Striped / Checkered) or shoes of the same color and shape, etc.
I learned the features of the image by hand such as color histograms, texture functions, shape functions (oriented gradient histogram), SIFT, etc. I also read the literature on deep neural networks (convolutional neural networks), which have been trained on massive amounts of data and are currently the most advanced in the field of image classification.
I was wondering if the same features (pulled from CNN) could also be used for my project - finding fine-grained similarities between images. From what I understand, CNN has learned good representational functions that can help categorize images - for example, whether it's a red shirt or a blue shirt or an orange shirt, it can determine that an image is a shirt. However, he does not understand that the orange shirt looks more like a red shirt than a blue shirt, and therefore she is unable to capture these similarities.
Please correct me if I am wrong. I would like to know if there are any deep neural networks that capture these similarities and, as it turns out, are superior to hand-crafted functions. Thanks in advance.
source to share
For your task, CNN is definitely worth a try!
Many researchers have used pre-trained networks for image classification and have obtained the most up-to-date results for fine-grained classification. For example, when trying to classify species of birds or cars.
Now your task is not a classification, but it is related. You can think of similarity as some geometric distance between objects, which are mostly vectors. So you can do some experimentation by calculating the distance between the feature vectors for all your training images (link) and the feature vector extracted from the query image.
The CNN features extracted from the early layers of the web should be more related to color or other graphical cues rather than more "semantic" ones.
Alternatively, there is some work directly exploring the similarity metric via CNN, for example here .
source to share
A bit outdated, but it might be useful to other people. Yes, CNN can be used for image similarity and I've used that before. As Flavio pointed out, for a simple launch, you can use a pre-rendered CNN of your choice like Alexnet, GoogleNet, etc. and then use it as a highlighting tool. You can compare functions based on distance, similar images will have a smaller distance between their feature vectors.
source to share