(Photo by Alina Grubnyak on Unsplash)
Built a mini search engine in Java, implementing the following components from scratch using standard Java libraries:
- Webserver Framework
- Key-Value Store
- MapReduce Engine
- Web Crawler
- Indexer
- PageRank
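
To illustrate the PageRank component listed above, here is a minimal single-machine sketch of the iterative computation. It is not the project's actual implementation, which ran on the MapReduce engine. The class name `PageRankSketch`, the method `rank`, and the in-memory adjacency map are all assumptions made for this example, and it assumes every link target also appears as a key in the map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and data layout are assumptions.
class PageRankSketch {

    // Runs a fixed number of PageRank iterations over an adjacency list.
    // Assumes every link target is also present as a key in `links`.
    static Map<String, Double> rank(Map<String, List<String>> links,
                                    double damping, int iterations) {
        int n = links.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0 / n); // start from a uniform distribution
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) {
                next.put(page, (1.0 - damping) / n); // random-jump term
            }

            double danglingMass = 0.0;
            for (Map.Entry<String, List<String>> entry : links.entrySet()) {
                double current = ranks.get(entry.getKey());
                List<String> outLinks = entry.getValue();
                if (outLinks.isEmpty()) {
                    danglingMass += current; // page with no outgoing links
                    continue;
                }
                // Split this page's damped rank evenly among its targets.
                double share = damping * current / outLinks.size();
                for (String target : outLinks) {
                    next.merge(target, share, Double::sum);
                }
            }

            // Spread the rank of dangling pages evenly so the total stays 1.
            double danglingShare = damping * danglingMass / n;
            for (String page : links.keySet()) {
                next.merge(page, danglingShare, Double::sum);
            }
            ranks = next;
        }
        return ranks;
    }
}
```

Calling `PageRankSketch.rank(graph, 0.85, 50)` on a small link graph returns a map from page to score whose values sum to 1. In a distributed setting, the inner loop over outgoing links would roughly correspond to a map step and the summing of shares to a reduce step.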
Deployed a cluster of master and worker nodes on AWS EC2 instances and parallelized crawling, indexing, and storage across the cluster. Used EBS volumes to store the crawled pages and the processed indexer and PageRank results.
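
The description above does not say how work was divided among the worker nodes. One common approach, shown here purely as an assumption, is to hash each key (for example a URL) to a worker index so that the same key always lands on the same node. The class `WorkerPartitioner` and its methods are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; the project's real partitioning scheme is not described.
class WorkerPartitioner {

    // Maps a key to a worker index in the range [0, workerCount).
    static int workerFor(String key, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), workerCount);
    }

    // Groups keys by the worker index they hash to.
    static Map<Integer, List<String>> partition(List<String> keys, int workerCount) {
        Map<Integer, List<String>> byWorker = new TreeMap<>();
        for (String key : keys) {
            byWorker.computeIfAbsent(workerFor(key, workerCount), w -> new ArrayList<>())
                    .add(key);
        }
        return byWorker;
    }
}
```

With this scheme a master node could call `WorkerPartitioner.partition(urls, workerCount)` and send each batch to the matching worker. A simple modulo hash reshuffles most keys when the worker count changes, so a cluster that adds or removes nodes often would likely need something like consistent hashing instead.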