Lucene.NET - indexing one large file (> 1 GB)
I have one XML file that I want to index using Lucene.NET. The file is basically a large collection of logs. The file itself is over 5 GB, and I am developing on a system with 2 GB of RAM. How can I do the indexing if I am not parsing the file and am creating no field other than "text", which should contain the file's data?
I am using some code from CodeClimber and am not sure at the moment what the best approach for indexing such a large single file would be.
Is there a way to pass the file data to the index in chunks? Below is the code that creates a document with a single text field holding the file data.
Document doc = new Document();
doc.Add(new Field("Body", text, Field.Store.YES, Field.Index.TOKENIZED));
writer.AddDocument(doc);
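As an aside on the "in chunks" part of the question: Lucene.NET also has a Field constructor that takes a System.IO.TextReader, so the content is streamed to the analyzer instead of being built up as one giant string (such a field is tokenized and indexed but never stored). A minimal sketch, assuming Lucene.NET 2.x-era APIs and a hypothetical file path:

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

class StreamingFieldExample
{
    static void Main()
    {
        // "index" and the log path are placeholder names for this sketch.
        var writer = new IndexWriter("index", new StandardAnalyzer(), true);

        // StreamReader feeds the analyzer incrementally, so the 5 GB file
        // is never held in memory all at once.
        using (TextReader reader = new StreamReader(@"C:\logs\huge.xml"))
        {
            var doc = new Document();
            // Reader-based fields are tokenized but not stored.
            doc.Add(new Field("Body", reader));
            writer.AddDocument(doc);
        }

        writer.Optimize();
        writer.Close();
    }
}
```

Note that this only solves the memory problem: the whole file is still a single Lucene document, which runs into the search-granularity issue the answer below describes.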
Thanks for the guidance
You should use something like a streaming reader, one that doesn't load all of the XML into memory. But indexing the entire XML file as a single document does not make sense anyway: every search would return either that one document or nothing (found or not found). So merely being able to pass the data in chunks wouldn't help you. Instead, while reading your XML file, you should split it into many documents (and fields) so that you can get reasonable search results.
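For example, assuming each log record sits in its own element (the element name "entry" below is a guess; substitute whatever your file actually uses), you could stream the file with System.Xml.XmlReader, which is forward-only and keeps memory use flat regardless of file size, and index one Lucene document per record:

```csharp
using System.Xml;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

class SplitIntoDocumentsExample
{
    static void Main()
    {
        // "index" and the log path are placeholder names for this sketch.
        var writer = new IndexWriter("index", new StandardAnalyzer(), true);

        using (XmlReader xml = XmlReader.Create(@"C:\logs\huge.xml"))
        {
            // ReadToFollowing advances the stream without buffering the file.
            while (xml.ReadToFollowing("entry"))
            {
                string text = xml.ReadElementContentAsString();

                // One searchable document per log entry, so queries return
                // individual entries rather than "found / not found".
                var doc = new Document();
                doc.Add(new Field("Body", text, Field.Store.YES,
                                  Field.Index.TOKENIZED));
                writer.AddDocument(doc);
            }
        }

        writer.Close();
    }
}
```

You could also add further fields per entry (timestamp, severity, and so on) if the XML exposes them, which makes the index far more useful for log searching.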
how can I do the indexing when I don't parse the file and create no fields besides "text", which should contain the file data
That would be a wonderful world indeed.