module__org.bibliome.alvisnlp.modules.tika.TikaReader

Robert Bossy edited this page Jul 27, 2017 · 1 revision

# org.bibliome.alvisnlp.modules.tika.TikaReader

Synopsis

Reads PDF or DOC files and adds one document to the corpus for each file.

This module is experimental.
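
The one-document-per-file behavior described above can be pictured with a small conceptual sketch. This is not AlvisNLP code and does not use the module's API: `read_corpus`, `plain_text_extractor`, and the dictionary layout are hypothetical names chosen for illustration, the text extractor is a plain-text stand-in for a real PDF/DOC extractor, and recursing into subdirectories is an assumption rather than documented behavior. The default section name `text` follows the default listed in the parameters below.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical in-memory model of a document; not AlvisNLP's actual data structures.
Document = Dict[str, object]


def read_corpus(
    source: Path,
    extract_text: Callable[[Path], str],
    section_name: str = "text",
) -> List[Document]:
    """Create one document per file, each holding a single section with the whole contents."""
    if source.is_file():
        files = [source]
    else:
        # Assumption: directories are walked recursively, in sorted order.
        files = sorted(p for p in source.rglob("*") if p.is_file())

    corpus: List[Document] = []
    for path in files:
        corpus.append(
            {
                "id": str(path),
                # A single section contains the entire extracted text of the file.
                "sections": {section_name: extract_text(path)},
            }
        )
    return corpus


def plain_text_extractor(path: Path) -> str:
    # Stand-in for a real PDF/DOC text extractor.
    return path.read_text(encoding="utf-8", errors="replace")
```

Calling `read_corpus(Path("some_dir"), plain_text_extractor)` on a directory would return one dictionary per file. Passing the extractor as a function keeps the sketch independent of any particular extraction library.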

Description

Parameters

Optional

Type: SourceStream

Path to the source directory or source file.

Optional

Type: Mapping

UNDOCUMENTED

Optional

Type: Mapping

Constant features to add to each document created by this module.

Optional

Type: Mapping

Constant features to add to each section created by this module.

Default value: html

Type: String

Default value: text

Type: String

Name of the single section containing the whole contents of a file.

Default value: tag

Type: String