Rails / Sunspot / Solr: repeated indexing on inherited classes
We are creating a Ruby on Rails application that uses Solr as a search engine. The following version numbers may be relevant to the issue described in the following paragraphs:
- Ruby: 1.9.2
- Rails: 3.2.6
- Sunspot: 1.3.0.rc5
Background
We have a model Feedback that is inherited by various subclasses. The class hierarchy looks like this (single table inheritance):
Feedback
|- Problem
|- Question
|- Suggestion
|- Announcement
In the Feedback model, indexing is enabled with the following code:
    searchable :auto_index => true, :auto_remove => true do
      string :type
      text :title, :boost => 2
      text :content
      integer :user_id
      time :created_at
      ...
    end
Problem
The problem is that when we create, for example, a new Problem with the title "problemtitle", Sunspot triggers automatic indexing both for Problem and for the base class Feedback. When searching for feedback with the title "problemtitle" using
    search = Feedback.solr_search do
      with(:type, type.capitalize)
      fulltext("problemtitle") { minimum_match 1 }
      paginate(page: options[:page], per_page: options[:per_page])
    end
two results are returned. One of the results is a Problem and the other is a Feedback. This indicates that the record is indexed once for its own class and once for its base class in the hierarchy, which should be correct as far as I know.
The strange thing is that after rebuilding the index with the command bundle exec rake sunspot:solr:reindex, the same search for a Feedback with the title "problemtitle" returns only one result, of type Problem.
We solved this by adding :unless => proc {|model| model.class == Feedback} to the searchable definition in the Feedback model. This ensures that only subclasses of Feedback are automatically indexed.
Question
My question is whether this is the intended behavior (is it a feature or a bug?). I don't understand why reindexing treats models differently than automatic indexing at create time. Could this be a problem with how we implemented the class hierarchy?
If more information is required to answer my question, I will try to give it.
Regards,
Sebastian
We solved this problem by extending the searchable block with an :unless option:
    searchable :auto_index => true, :auto_remove => true,
               :unless => proc { |model| model.class == Feedback } do
      string :type
      text :title, :boost => 2
      text :content
      integer :user_id
      time :created_at
      ...
    end
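To make the effect of the condition concrete, here is a minimal plain-Ruby sketch of how such a proc evaluates for the base class and for its subclasses. The classes are bare stand-ins (no ActiveRecord or Sunspot involved), and the constant name SKIP_INDEXING is made up for this illustration.

```ruby
# Bare stand-ins for the STI hierarchy; not real ActiveRecord models.
class Feedback; end
class Problem < Feedback; end
class Question < Feedback; end

# Same condition as the :unless proc: true only for a plain Feedback instance.
# SKIP_INDEXING is a hypothetical name used only in this sketch.
SKIP_INDEXING = proc { |model| model.class == Feedback }

[Feedback.new, Problem.new, Question.new].each do |record|
  puts "#{record.class.name}: skipped=#{SKIP_INDEXING.call(record)}"
end
```

Because the comparison uses model.class rather than is_a?, instances of subclasses are not skipped, only instances whose class is exactly Feedback.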
Sebastian, I believe the problem is that Sunspot creates the main Solr id using the fully qualified class name and id:
    def index_id_for(class_name, id) #:nodoc:
      "#{class_name} #{id}"
    end
So if your record is indexed as Feedback, then again as Feedback::Problem, Solr will have two entries for it and will return both of them when searched. Sunspot will then map each hit back to the database, loading the same record twice. When reindexing, the entire index is discarded and each record is indexed once under its current class, which is why there is only one result after the reindex.
We had a similar problem, and the solution was to create our own InstanceAdapter for the STI classes and register it in an initializer:
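    class StiInstanceAdapter < Sunspot::Adapters::InstanceAdapter
      def id
        @instance.id
      end

      # Build the Solr document ID from the STI base class, so a record
      # gets the same ID whether it is indexed as Feedback or as a subclass.
      def index_id
        Sunspot::Adapters::InstanceAdapter.index_id_for(@instance.class.base_class.name, id)
      end
    end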
Sunspot::Adapters::InstanceAdapter.register(StiInstanceAdapter, Feedback)
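The difference between the two ID schemes can be sketched in plain Ruby, with a Hash standing in for the Solr index. This is only an illustration under assumptions: the classes are bare stand-ins, base_class is imitated by hand, the helpers default_key and sti_key are hypothetical, and it assumes the record is written once under the base class name and once under the subclass name, as described above.

```ruby
# Bare stand-ins for the STI hierarchy; no ActiveRecord, Sunspot, or Solr.
class Feedback
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Hand-written imitation of ActiveRecord's base_class (top of the STI tree).
  def self.base_class
    Feedback
  end
end

class Problem < Feedback; end

# Same string format as the index_id_for method quoted above.
def index_id_for(class_name, id)
  "#{class_name} #{id}"
end

# Hypothetical helper: document ID when the record is indexed as a given class.
def default_key(record, indexed_as)
  index_id_for(indexed_as.name, record.id)
end

# Hypothetical helper: document ID built from the STI base class.
def sti_key(record)
  index_id_for(record.class.base_class.name, record.id)
end

problem = Problem.new(42)

# Class-name IDs: indexing as Feedback and as Problem creates two documents.
default_index = {}
[Feedback, Problem].each { |klass| default_index[default_key(problem, klass)] = problem }

# Base-class IDs: the second write overwrites the first, leaving one document.
sti_index = {}
2.times { sti_index[sti_key(problem)] = problem }

puts default_index.keys.inspect  # ["Feedback 42", "Problem 42"]
puts sti_index.keys.inspect      # ["Feedback 42"]
```

With one ID per record, a second indexing pass updates the existing document instead of adding a duplicate.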
I know this is a little late, but hopefully it helps.