Home
Andy Jackson edited this page Aug 18, 2015 · 20 revisions
The primary goal of this project is to provide full-text search for our web archives. To achieve this, the warc-indexer component parses the (W)ARC files and, for each resource, posts a record to one or more Apache Solr servers. Client-facing tools then allow researchers to query the Solr index and explore the collections.
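As a rough illustration of the last step, the sketch below shows how a client might query a Solr index over HTTP once records have been posted. It uses only Solr's standard `select` handler and JSON response format. The base URL, the core name `discovery`, and the helper names `build_select_url` and `search` are assumptions made for this example and are not taken from the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's standard /select handler.

    base_url and core are assumptions for this sketch; substitute the
    values for your own Solr installation.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def search(base_url, core, query, rows=10):
    """Run the query and return the list of matching documents.

    Requires a reachable Solr server; raises URLError otherwise.
    """
    url = build_select_url(base_url, core, query, rows)
    with urlopen(url, timeout=10) as response:
        data = json.load(response)
    # Solr's JSON response nests the hits under response.docs.
    return data["response"]["docs"]
```

With a local test Solr service running, a call such as `search("http://localhost:8983/solr", "discovery", "*:*", rows=5)` would return up to five indexed records. The fields present in each record depend on the Solr schema in use.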
- Quick Start
- Overview
- Source Code Project Structure
- Roadmap
- Features
- Configuration
- Front-ends
- Similar Systems
- Dataset Generation
- IIPC Solr Workshop (Jan 2014)
The schedule for this event is here.
- Getting started with webarchive-discovery (Quick Start)
- Setting up a test Solr service
- Indexing your ARCs or WARCs
- Browsing the results, basic queries, the schema browser, etc.
- Using the Solr UI:
- Installing SolrCloud (towards a production environment)
- Benchmarking and performance analysis