Is there an open source web search library that doesn't use a search index file?

I'm looking for an open source web search library that doesn't use a search index file. Do you know of any?

Thanks, Kenneth



3 answers


I don't think there is one (at least none popular enough for users to know about).



We ended up coding our own search engine.



The original poster clarified in a comment on this answer that what he is looking for is essentially a "grep-like search, but over HTTP", and mentioned that he needs something that uses little disk space when running on an embedded system.

I am not aware of any such project, but you could look at the HTML parsers and XQuery implementations available in the language of your choice. The former should cope with the mess of real-world HTML, and with the latter you can write a search that is about as detailed as you want.

I am assuming that you will be working with a set of URLs that is either provided or already stored locally, since actually crawling the entire web, discovering links and so on, is completely unrealistic on an embedded device.



Though with a good html / xquery implementation you have the tools to fetch all links.
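To make the "grep-like search over HTTP" idea concrete, here is a minimal sketch that keeps no index at all: it reads a fixed list of URLs from a file, fetches each page on every query, and prints the URLs whose body matches. The function name search_urls and the one-URL-per-line file format are my own assumptions, not something from the question, and it greps the raw HTML (tags included) rather than parsed text. It also assumes curl is available on the device.

```shell
#!/bin/sh
# search_urls PATTERN URL_LIST_FILE   (hypothetical helper, untested on real hardware)
# Prints every URL from URL_LIST_FILE whose fetched body matches PATTERN,
# where PATTERN is an extended regular expression. Nothing is cached or
# indexed: every call downloads every page again.
search_urls() {
    pattern=$1
    url_list=$2
    while IFS= read -r url; do
        # Skip blank lines and lines starting with '#'.
        case $url in
            ''|'#'*) continue ;;
        esac
        # -s: silent, -f: treat HTTP errors as failures, -L: follow redirects.
        # grep -q stops at the first match and reports only via exit status.
        if curl -sfL --max-time 10 "$url" | grep -Eq -e "$pattern"; then
            printf '%s\n' "$url"
        fi
    done < "$url_list"
}
```

Called as search_urls 'embedded' urls.txt, it costs one HTTP request per listed URL per query, which is the price of having no index; disk usage is just the URL list itself.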

My original answer, which was really a request for clarification:

Not sure what you mean. How do you picture a search without an index? Crawling the internet for every request? Passing the query through to Google? Or do you mean a specific type of search index file that you are trying to avoid?



Do you mean:

search.cgi

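#!/bin/sh
# Take the value of the "s" parameter from the CGI query string.
# Note: the value is used as-is, it is not URL-decoded.
arg=`echo "$QUERY_STRING" | sed -e 's/^s=//' -e 's/&.*$//'`
cd /var/www/httpd || exit 1
# List every file whose contents match, then wrap the names in an HTML list.
find . -type f -exec grep -El -e "$arg" {} + | awk 'BEGIN { 
        print "Content-type: text/html"; 
        print "";
        print "<HTML><HEAD><TITLE>Search Result</TITLE></HEAD>";
        print "<BODY><P>Here are your search results, sorry it took so long.</P>";
        print "<UL>";
    }
    { sub(/^\.\//, ""); print "<LI><A HREF=\"http://yourhost.com/" $0 "\">" $0 "</A></LI>"; }
    END {
        print "</UL></BODY></HTML>";
    }'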

      

Untested...
