Alternatively, the WARC bolt could add the WARC file name, record offset, and record length to the metadata. An indexer (CDX or anything else) could then store these values directly, which removes the need to index the WARC files into CDX in a separate step.
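To make the idea concrete, here is a minimal sketch of a writer that tracks its byte position and puts the file name, offset, and length into a metadata map as each record is written. Everything in it is an assumption for illustration: the class `WarcOffsetSketch`, the metadata keys, and the space-separated index line are hypothetical and are not StormCrawler's API or the real CDX format. An in-memory buffer stands in for the WARC file, and with per-record gzip compression the offset and length would presumably refer to the compressed bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class WarcOffsetSketch {

    // Hypothetical metadata keys, chosen for this sketch only.
    public static final String KEY_FILE = "warc.file.name";
    public static final String KEY_OFFSET = "warc.record.offset";
    public static final String KEY_LENGTH = "warc.record.length";

    private final String fileName;
    // Stands in for the WARC file on disk.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Number of bytes written so far, i.e. the offset of the next record.
    private long position = 0;

    public WarcOffsetSketch(String fileName) {
        this.fileName = fileName;
    }

    /** Appends the record bytes and stores their location in the metadata map. */
    public Map<String, String> write(byte[] record, Map<String, String> metadata) {
        long offset = position;
        out.write(record, 0, record.length);
        position += record.length;
        metadata.put(KEY_FILE, fileName);
        metadata.put(KEY_OFFSET, Long.toString(offset));
        metadata.put(KEY_LENGTH, Integer.toString(record.length));
        return metadata;
    }

    /** Builds a simplified index line (url, file, offset, length); not real CDX. */
    public static String toIndexLine(String url, Map<String, String> metadata) {
        return String.join(" ", url, metadata.get(KEY_FILE),
                metadata.get(KEY_OFFSET), metadata.get(KEY_LENGTH));
    }

    public static void main(String[] args) {
        WarcOffsetSketch writer = new WarcOffsetSketch("crawl-0001.warc.gz");
        byte[] first = "record one".getBytes(StandardCharsets.UTF_8);
        byte[] second = "record two, longer".getBytes(StandardCharsets.UTF_8);

        writer.write(first, new LinkedHashMap<>());
        Map<String, String> md = writer.write(second, new LinkedHashMap<>());

        // A downstream indexer could store this line without re-reading the WARC file.
        System.out.println(toIndexLine("http://example.com/", md));
    }
}
```

With the location already in the metadata, an indexer bolt downstream only needs to read three keys, so the choice of index format (CDX, CDXJ, or a search index) stays independent of the WARC writer.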
StormCrawler can filter web pages and archive them into WARC files, as follows:
Would it be possible to create a CDX index (or CDXJ index) for the WARC archives at the same time?