How to set up Solr for a one-to-many relationship

I am developing a search application with Solr that needs to search "books" divided into chapters. A book might look like this:

title: "book title"
author: "mr whoever"
chapters: [
    {
        title: "some chapter title"
        text: "blah blah blah"
    },
    {
        title: "some other title"
        text: "blah blah blah"
    },
    ... etc.
]

      

Search requirements:

  • The user is looking for books, not chapters, so the best results should be the books that are most relevant overall, given all the chapter text they contain.

  • The user should see which chapters of the book matched, information about those chapters, and how many matches each chapter had.

[Figure: results mockup]


Progress:

Multivalued fields

Solr supports multi-valued fields (i.e. multiple chapters per book), but a single value of a multi-valued field cannot carry two sub-fields (title and text) for each chapter in a book document.

Solr "Join"

I don't know if this is necessary. Each chapter will only belong to one book, so it seems like we could just fit them into one document without much repetition.

Dynamic fields

Use fields like "chapter1text_txt", "chapter1title_txt" and "chapter2text_txt", for example, and recombine the information about each chapter outside Solr, so Solr doesn't know that "chapter1text_txt" and "chapter1title_txt" belong to the same chapter ...

What is the correct way to set up schema.xml to index and search this type of document?


1 answer


Document structure

The best solution so far has been to use multi-valued fields for both chapter_title and chapter_text, and to ensure that these values are in the same sequential order in the documents being loaded, so the first chapter_title always matches the first chapter_text, etc.

Here's the schema.xml section:

<field name="report_title"
       type="text_en" indexed="true" stored="true"/>

<field name="chapter_title"
       type="text_en" indexed="true" stored="true" multiValued="true"/>

<field name="chapter_text"
       type="text_en" indexed="true" stored="true" multiValued="true"/>
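As a rough sketch of how a client might feed this schema, the Python below flattens the nested book from the question into two parallel lists and pairs them up again by position. The function names and sample data are made up for illustration, and the re-pairing assumes Solr returns stored values of a multi-valued field in the order they were indexed, which is what this approach depends on.

```python
import json


def to_solr_doc(book):
    """Flatten a nested book into parallel multi-valued fields.

    Both lists are built from the same iteration over the chapters,
    so index i in chapter_title and chapter_text is the same chapter.
    """
    chapters = book["chapters"]
    return {
        "report_title": book["title"],
        "chapter_title": [c["title"] for c in chapters],
        "chapter_text": [c["text"] for c in chapters],
    }


def chapters_from_doc(doc):
    """Re-pair titles and texts by position (hypothetical helper)."""
    return list(zip(doc["chapter_title"], doc["chapter_text"]))


# Sample data modelled on the structure in the question.
book = {
    "title": "book title",
    "chapters": [
        {"title": "some chapter title", "text": "blah blah blah"},
        {"title": "some other title", "text": "more blah"},
    ],
}

doc = to_solr_doc(book)
print(json.dumps(doc, indent=2))   # the JSON body a client could post to Solr
print(chapters_from_doc(doc))
```

If a chapter could ever lack a title or text, the loader would need to insert a placeholder value, otherwise the two lists drift out of alignment.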

      



This is a trade-off: the index does not know about the relationship between chapter_title and chapter_text, so it is not possible to query for "chapters with X in the title and Y in the text".
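To make the limitation concrete, here is a small Python simulation, not Solr itself. It uses plain substring matching in place of Solr's text analysis, and the sample document is invented. The first function mimics a document-level query on both fields; the second shows a client-side check on the returned values, which is one possible workaround and not something the approach above provides.

```python
def doc_matches(doc, title_term, text_term):
    """Mimic a query like chapter_title:X AND chapter_text:Y.

    Each field is checked independently, so the two terms may be
    satisfied by different chapters.
    """
    title_hit = any(title_term in t.lower() for t in doc["chapter_title"])
    text_hit = any(text_term in t.lower() for t in doc["chapter_text"])
    return title_hit and text_hit


def chapter_matches(doc, title_term, text_term):
    """Return positions of chapters where both terms hit the same chapter."""
    pairs = zip(doc["chapter_title"], doc["chapter_text"])
    return [
        i
        for i, (title, text) in enumerate(pairs)
        if title_term in title.lower() and text_term in text.lower()
    ]


# Invented two-chapter book in the parallel-list layout.
doc = {
    "chapter_title": ["Whales", "Ships"],
    "chapter_text": ["They swim in the ocean", "They carry harpoons"],
}

# The document matches even though no single chapter has both terms.
print(doc_matches(doc, "whales", "harpoons"))
print(chapter_matches(doc, "whales", "harpoons"))
print(chapter_matches(doc, "ships", "harpoons"))
```

The first call reports a match because "whales" is found in one chapter's title and "harpoons" in another chapter's text, which is exactly the false positive described above.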

Number of matches

I still haven't found a way to do this, but I'm considering requesting highlighting with one large fragment covering the entire document and then counting the number of highlighted terms.
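A minimal sketch of the counting step, in Python, might look like the following. It works on a hand-written sample shaped like the "highlighting" section of a Solr JSON response, so nothing here talks to a real server. The document id "book-1" and the snippet text are invented, and the sketch assumes the default <em> highlight tags and a request configured so that each chapter comes back as one whole snippet in index order (for example with hl.fragsize=0 and hl.preserveMulti=true on the original highlighter; check these against your Solr version).

```python
import re

# Matches one highlighted term wrapped in Solr's default <em> tags.
EM = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def count_highlights(snippets):
    """Count highlighted terms in each snippet, one count per snippet."""
    return [len(EM.findall(snippet)) for snippet in snippets]


# Hypothetical highlighting section: one entry per chapter, in order.
sample_highlighting = {
    "book-1": {
        "chapter_text": [
            "The <em>whale</em> surfaced, and the <em>whale</em> dived.",
            "No matches in this chapter.",
            "A single <em>whale</em> was seen.",
        ]
    }
}

per_chapter = count_highlights(sample_highlighting["book-1"]["chapter_text"])
print(per_chapter)  # [2, 0, 1]
```

Note that this counts highlighted terms rather than query hits, so a multi-word query may contribute several counts for what a user would see as one match.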
