Below is a brief summary of this project:
- Implemented a large-scale, text-based search engine using the Lucene API over a corpus of 500,000+ documents.
- Supported multiple retrieval models (unranked and ranked Boolean, Okapi BM25, and Indri) and a range of query operators.
- Implemented pseudo-relevance feedback, which expands the original query with terms from the top-ranked documents to improve retrieval effectiveness.
- Applied machine learning to ranking by implementing learning-to-rank, with SVMrank used to train the retrieval model.
- Improved the diversity of search results through both implicit and explicit diversification, accounting for the multiple intents a query may have.
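To make one of the listed retrieval models concrete, here is a minimal Python sketch of Okapi BM25 scoring for a single tokenized document. It is not the project's Lucene-based implementation: the function name `bm25_score`, the toy corpus, and the parameter defaults (k1 = 1.2, b = 0.75, k3 = 0) are assumptions chosen for illustration. This variant drops the constant (k1 + 1) factor from the tf numerator, which scales every score equally and so does not change the ranking.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75, k3=0.0):
    """Okapi BM25 score of one tokenized document for a bag-of-words query."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term, qtf in Counter(query_terms).items():
        if tf[term] == 0:
            continue  # a term absent from this document contributes nothing
        df = doc_freq[term]
        # RSJ-style idf, floored at 0 so very common terms never lower the score
        idf = max(0.0, math.log((num_docs - df + 0.5) / (df + 0.5)))
        # Saturating tf weight with document-length normalization controlled by b
        tf_weight = tf[term] / (tf[term] + k1 * ((1 - b) + b * doc_len / avg_doc_len))
        # Query-term-frequency weight; equals 1 for every term when k3 = 0
        user_weight = (k3 + 1) * qtf / (k3 + qtf)
        score += idf * tf_weight * user_weight
    return score


# Hypothetical toy corpus standing in for the real Lucene index
docs = {
    "d1": "search engine ranking with bm25".split(),
    "d2": "cooking pasta at home".split(),
    "d3": "search search engine".split(),
    "d4": "gardening tips for spring".split(),
    "d5": "home engine repair".split(),
}
# Document frequency: number of documents containing each term
doc_freq = Counter(t for terms in docs.values() for t in set(terms))
avg_len = sum(len(terms) for terms in docs.values()) / len(docs)

query = "search engine".split()
scores = {d: bm25_score(query, terms, doc_freq, len(docs), avg_len)
          for d, terms in docs.items()}
# Rank documents by descending score (stable, so ties keep corpus order)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus "engine" occurs in 3 of the 5 documents, so its floored idf is 0 and only "search" affects the ranking: d3 (two occurrences in a short document) ranks above d1, and documents without "search" score 0.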