Where can I find a search query corpus?

I want to build a question-answering system that depends on user search queries, but so far I have not found a suitable dataset. Have any research centers or industry labs collected and released search query corpora?

+3




2 answers


There are several such datasets:

Yahoo Webscope: http://webscope.sandbox.yahoo.com/catalog.php?datatype=l



Yandex datasets: https://www.kaggle.com/c/yandex-personalized-web-search-challenge/data This data is part of a Kaggle competition; you can register and download it.

There are also the AOL and MSN query logs, which were released for shared tasks over the past 10 years. I'm not sure whether they are still publicly available, but they are worth looking into.

+3




The Webscope and Kaggle datasets have certain limitations. I would suggest a TREC dataset instead, such as this dataset from 2009.



+1








