Filtering search results with Lucene.NET

I am using Lucene.Net to build a website for searching books, articles, etc. stored in PDF format. I need to be able to filter search results, for example by author name. Can this be done with Lucene, or do I need a DB to store the filter fields for each document?

Also, what is the best way to index my documents? I'll have about 50 documents to start with, and occasionally I'll have to add a batch of documents to the index, maybe via a web form. Should I use a DB to store the document paths?

Thanks.



2 answers


Here's a list of what you need to do IMO:



  • Extract the text from the PDF - see this question, which recommends iTextSharp for this purpose.
  • For each PDF document, create a Lucene.net document with multiple fields: author, title, document text and anything else you want to search on. It is also a good idea to have a unique identifier field for each document. I suggest you also store a field with the path to the original PDF document.
  • After indexing all documents, you will have a Lucene index that you can search by field.
  • You can add new documents by repeating step 2. It's easier to do this offline - incremental updates are hard.
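
To make the idea of a fielded index concrete, here is a minimal sketch in plain Python. It is not the Lucene.NET API: `Doc`, `TinyIndex`, the field names and the sample paths are all made up for illustration. It only models the steps above, i.e. one record per PDF with author, title, text, a unique id and a stored path, plus a search that can be narrowed by author without a separate DB.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str   # unique identifier field
    author: str   # exact-match field, used for filtering
    title: str
    text: str     # full text extracted from the PDF
    path: str     # stored only, so a hit can point back to the PDF


class TinyIndex:
    """Toy in-memory index (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}
        self.text_terms = defaultdict(set)  # term -> ids of docs containing it
        self.by_author = defaultdict(set)   # author -> ids of docs by that author

    def add(self, doc):
        # Index title and text word by word; keep the author as one exact value.
        self.docs[doc.doc_id] = doc
        for term in (doc.title + " " + doc.text).lower().split():
            self.text_terms[term].add(doc.doc_id)
        self.by_author[doc.author.lower()].add(doc.doc_id)

    def search(self, term, author=None):
        # Full-text lookup, optionally intersected with the author filter.
        hits = set(self.text_terms.get(term.lower(), set()))
        if author is not None:
            hits &= self.by_author.get(author.lower(), set())
        return sorted(self.docs[i].path for i in hits)


index = TinyIndex()
index.add(Doc("1", "Jane Roe", "Search Basics",
              "how an inverted index works", "/pdfs/basics.pdf"))
index.add(Doc("2", "John Doe", "Index Tuning",
              "tuning an index for speed", "/pdfs/tuning.pdf"))

print(index.search("index"))                     # both documents match
print(index.search("index", author="John Doe"))  # only John Doe's document
```

In a real index the same split usually applies: the text field is analyzed into terms, while the author and id fields are kept as single exact values so they can be matched precisely.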


Lucene has several different analyzers that can strip out noise words and reduce words to their stems, which is useful for full-text search, but you still have to store the PDF itself somewhere. Lucene.Net is happy to create its index on the filesystem, and you can add a field, called "PATH" for example, to each document you index, holding the path to the PDF file.
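
As a rough illustration of what "stripping noise and stemming" means, here is a toy analyzer in plain Python. It is an assumption-laden sketch, not how any Lucene analyzer is implemented: the stop-word list, the suffix list and the function name `analyze` are invented for this example, and real stemmers are far more careful.

```python
# Hypothetical stop-word and suffix lists, chosen only for this example.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def analyze(text):
    """Lowercase, drop punctuation and stop words, then strip one suffix."""
    terms = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if not word or word in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Only strip if at least three characters remain.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        terms.append(word)
    return terms


print(analyze("Indexing the indexed documents."))  # ['index', 'index', 'document']
```

The same analysis has to be applied to both the indexed text and the query, otherwise a query for "indexing" would not match a document that was stored under the term "index".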


