How to set up Solr for a one-to-many relationship
I am developing a search application using Solr that needs to search "books" divided into chapters. A book might look like this:
title: "book title"
author: "mr whoever"
chapters: [
  {
    title: "some chapter title"
    text: "blah blah blah"
  },
  {
    title: "some other title"
    text: "blah blah blah"
  },
  ... etc.
]
Search requirements:
- The user is searching for books, not chapters, so the top results should be the books that are most relevant overall, taking all of their chapter text into account.
- The user should see which chapters of the book matched, information about those chapters, and how many matches each chapter had.
Progress:
Multivalued fields
Solr supports multi-valued fields (i.e. multiple chapters for each book), but a single value of a multi-valued field cannot itself hold two sub-fields (title and text).
Solr "Join"
I don't know if this is necessary. Each chapter belongs to only one book, so it seems we could just fit them all into one document without much repetition.
Dynamic fields
Use fields like "chapter1text_txt", "chapter1title_txt" and "chapter2text_txt", for example, and reassemble the information about each chapter outside of Solr, since Solr would not know that "chapter1text_txt" and "chapter1title_txt" belong to the same chapter ...
What is the correct way to set up schema.xml to support and search this type of document?
Document structure
The best solution so far has been to use multi-valued fields for both chapter_title and chapter_text, and to ensure that these values are sequentially ordered in the loaded documents, so the first chapter_title always matches the first chapter_text, etc.
Here's the schema.xml section:
<field name="report_title" type="text_en" indexed="true" stored="true"/>
<field name="chapter_title" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="chapter_text" type="text_en" indexed="true" stored="true" multiValued="true"/>
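As a rough illustration of keeping the two fields aligned at load time, here is a minimal Python sketch. The function name flatten_book and the nested input shape (taken from the example book in the question) are assumptions for illustration; only the field names come from the schema above, which has no author field, so the author is left out.

```python
from typing import Any


def flatten_book(book: dict[str, Any]) -> dict[str, Any]:
    """Turn a nested book into a flat document with parallel chapter arrays."""
    chapters = book.get("chapters", [])
    return {
        "report_title": book["title"],
        # Both lists are built from the same iteration order, so
        # chapter_title[i] and chapter_text[i] describe the same chapter.
        "chapter_title": [c["title"] for c in chapters],
        "chapter_text": [c["text"] for c in chapters],
    }


# Hypothetical input mirroring the structure in the question.
book = {
    "title": "book title",
    "author": "mr whoever",
    "chapters": [
        {"title": "some chapter title", "text": "blah blah blah"},
        {"title": "some other title", "text": "more blah"},
    ],
}
doc = flatten_book(book)
```

The resulting dictionary could then be sent to Solr with whatever client is already in use; the point is only that the i-th title and the i-th text always come from the same chapter.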
This is a trade-off because the index cannot know about this relationship between chapter_title and chapter_text, so it is not possible to query for "chapters with X in the title and Y in the text".
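To make the limitation concrete, here is a toy Python model of it. This is not how Solr analyzes or scores text (it uses plain substring checks), and both function names are made up for illustration. It only shows that a document-level match can succeed even when no single chapter satisfies both conditions.

```python
def doc_level_match(doc, title_term, text_term):
    """Roughly what a flat document can answer: X in any title AND Y in any text."""
    in_title = any(title_term in t.lower() for t in doc["chapter_title"])
    in_text = any(text_term in t.lower() for t in doc["chapter_text"])
    return in_title and in_text


def same_chapter_match(doc, title_term, text_term):
    """What was actually wanted: X and Y within the same chapter."""
    pairs = zip(doc["chapter_title"], doc["chapter_text"])
    return any(
        title_term in title.lower() and text_term in text.lower()
        for title, text in pairs
    )


# Hypothetical document: "whales" is in chapter 1's title,
# "harpoons" is in chapter 2's text.
doc = {
    "chapter_title": ["Whales", "Ships"],
    "chapter_text": ["about the sea", "about harpoons"],
}
flat_result = doc_level_match(doc, "whales", "harpoons")
paired_result = same_chapter_match(doc, "whales", "harpoons")
```

Here flat_result is True while paired_result is False, which is the false positive the flat layout cannot rule out inside the index. The pairing can still be checked on the client after the results come back, because the arrays are kept in order.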
Number of matches
I still haven't found a way to do this, but I'm considering using highlighting and counting the number of highlighted terms after requesting one large fragment that covers the entire document.
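A minimal Python sketch of that highlighting idea follows. It assumes the original Solr highlighter with its default <em> markers, and that hl.fragsize=0 (no fragmenting) together with hl.preserveMulti=true returns one snippet per chapter_text value in index order. Both parameters should be checked against the documentation for the Solr version in use. The function name, the query, and the sample snippets are hypothetical and hand-written, not real Solr output.

```python
import re

# Default opening marker of the original highlighter (assumed unchanged).
EM_OPEN = re.compile(r"<em>")


def count_highlights_per_chapter(snippets):
    """Count highlighted terms in each snippet, one snippet per chapter."""
    return [len(EM_OPEN.findall(snippet)) for snippet in snippets]


# Request parameters one might send (assumptions, see the note above).
params = {
    "q": "chapter_text:whale",
    "hl": "true",
    "hl.fl": "chapter_text",
    "hl.fragsize": "0",          # whole field value, no fragmenting
    "hl.preserveMulti": "true",  # keep every value, in stored order
}

# Hand-written stand-in for the chapter_text highlighting entry of one book.
snippets = [
    "The <em>whale</em> surfaced. A second <em>whale</em> followed.",
    "No match in this chapter.",
    "One more <em>whale</em>.",
]
counts = count_highlights_per_chapter(snippets)
```

Because the snippets keep the stored order, counts[i] can be paired with the i-th chapter_title to show which chapters matched and how often.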