Renard (
Character networks (i.e., graphs where vertices represent characters and edges represent their relationships) extracted from narrative texts are useful in a number of applications, from visualization to literary analysis (

Renard is a fully configurable pipeline that can extract static and dynamic networks from narrative texts. We base Renard on the generic character network extraction framework highlighted by the survey of Labatut & Bost (
- Depending on the input text, the user can choose the most relevant series of steps and configure each of them as needed. Therefore, the pipeline can be specialized for different types of texts, allowing for better performance.
- The pipeline can easily incorporate new advances in NLP by simply implementing a new step when necessary.
- One can study the impact of the performance of each step on the quality of the extracted networks.
We intend for Renard to be used by digital humanities researchers as well as NLP researchers and practitioners. The former can use Renard to quickly extract character networks for literary analysis, while the latter can use it to easily represent textual content as networks, which can serve as inputs for downstream NLP tasks (classification, recommendation…).
Renard is centered around the concept of a
```python
from renard.pipeline import Pipeline
from renard.pipeline.tokenization import NLTKTokenizer
from renard.pipeline.ner import NLTKNamedEntityRecognizer
from renard.pipeline.character_unification import GraphRulesCharacterUnifier
from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor

with open("./my_doc.txt") as f:
    text = f.read()

pipeline = Pipeline(
    [
        NLTKTokenizer(),
        NLTKNamedEntityRecognizer(),
        GraphRulesCharacterUnifier(min_appearance=10),
        # users can pass 'dynamic=True' and specify the
        # 'dynamic_window' argument to extract a dynamic network
        # instead of a static one.
        CoOccurrencesGraphExtractor(
            co_occurrences_dist=10, dynamic=False
        )
    ]
)

out = pipeline(text)
```
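To make the idea behind the last step of this pipeline concrete, here is a plain-Python sketch of co-occurrence network extraction. It is not Renard's implementation. It assumes that character mentions are already available as hypothetical `(character, token_index)` pairs, that two mentions co-occur when their token indices differ by at most `co_occurrences_dist`, and, for the dynamic case, that the text is cut into consecutive, non-overlapping blocks of `dynamic_window` tokens. The function names and the windowing scheme are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input format: one (character, token index) pair per mention,
# as could be obtained after tokenization, NER and character unification.
Mention = Tuple[str, int]
EdgeWeights = Dict[FrozenSet[str], int]


def co_occurrence_graph(mentions: List[Mention], co_occurrences_dist: int) -> EdgeWeights:
    """Weight each pair of distinct characters by their number of co-occurrences.

    Assumed definition: two mentions co-occur when their token indices differ
    by at most `co_occurrences_dist`.
    """
    weights: Counter = Counter()
    ordered = sorted(mentions, key=lambda m: m[1])
    for i, (char_a, pos_a) in enumerate(ordered):
        for char_b, pos_b in ordered[i + 1:]:
            if pos_b - pos_a > co_occurrences_dist:
                break  # mentions are sorted, so all later ones are farther away
            if char_a != char_b:
                weights[frozenset((char_a, char_b))] += 1
    return dict(weights)


def dynamic_co_occurrence_graphs(
    mentions: List[Mention], co_occurrences_dist: int, dynamic_window: int
) -> List[EdgeWeights]:
    """Build one graph per consecutive block of `dynamic_window` tokens.

    Assumed windowing scheme: co-occurrences spanning two blocks are ignored.
    """
    if not mentions:
        return []
    last = max(pos for _, pos in mentions)
    return [
        co_occurrence_graph(
            [m for m in mentions if start <= m[1] < start + dynamic_window],
            co_occurrences_dist,
        )
        for start in range(0, last + 1, dynamic_window)
    ]


# Toy mentions with made-up token positions.
mentions = [("Elizabeth", 0), ("Darcy", 5), ("Elizabeth", 12), ("Jane", 40), ("Bingley", 44)]
static = co_occurrence_graph(mentions, co_occurrences_dist=10)
# static == {frozenset({"Elizabeth", "Darcy"}): 2, frozenset({"Jane", "Bingley"}): 1}
dynamic = dynamic_co_occurrence_graphs(mentions, co_occurrences_dist=10, dynamic_window=20)
# three graphs: Elizabeth/Darcy (weight 2), an empty one, Jane/Bingley (weight 1)
```

Edge weights here play the role of the "number of co-occurrences" used for edge thickness in the figure below; a real implementation would typically store them in a graph library object rather than a dictionary.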
Co-occurrence character network of Jane Austen’s “Pride and Prejudice”, extracted automatically using Renard. Vertex size and color denote degree, while edge thickness and color denote the number of co-occurrences between two characters.
As an example, Figure
To allow for custom needs, we designed Renard to be very flexible. If a step is not available in Renard, we encourage users to either:

- Externally perform the computation corresponding to the desired step and inject the results back into the pipeline at runtime, or
- Implement their own step to integrate their custom processing into Renard by subclassing the existing
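Both options can be illustrated with a toy pipeline. The sketch below does not use Renard's actual classes: `Step`, `run_pipeline`, the two example steps, and the state keys (`text`, `tokens`, `entities`) are hypothetical names chosen for illustration. Each step reads what it needs from a shared state dictionary and returns the new information it produces; externally computed results are injected by seeding the initial state, so the step that would normally produce them can be left out.

```python
from typing import Any, Dict, List


class Step:
    """Hypothetical base class: maps the shared state to new state entries."""

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class WhitespaceTokenizer(Step):
    """Custom step obtained by subclassing the base class."""

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        return {"tokens": state["text"].split()}


class CapitalizedWordNER(Step):
    """Toy NER step: tags capitalized tokens as entities (illustration only)."""

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        return {"entities": [t for t in state["tokens"] if t[:1].isupper()]}


def run_pipeline(steps: List[Step], **initial_state: Any) -> Dict[str, Any]:
    """Run steps in order, merging each step's output into the shared state.

    Keyword arguments seed the state: this is how results computed outside
    the pipeline (e.g. tokens from another library) can be injected.
    """
    state = dict(initial_state)
    for step in steps:
        state.update(step(state))
    return state


# Every step runs inside the pipeline.
full = run_pipeline([WhitespaceTokenizer(), CapitalizedWordNER()], text="Elizabeth met Darcy")
print(full["entities"])  # ['Elizabeth', 'Darcy']

# Tokens are computed externally and injected; the tokenizer step is skipped.
injected = run_pipeline([CapitalizedWordNER()], tokens=["Jane", "smiled"])
print(injected["entities"])  # ['Jane']
```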
The flexibility of this approach introduces the possibility of creating invalid pipelines, because steps often require information computed by previously run steps: for example, solving the NER task requires a tokenized version of the input text. To counteract this issue, each step declares its requirements and the new information it produces. This allows Renard to check whether a pipeline is valid and, if it is not, to explain to the user at runtime why.
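The validity check described above can be sketched as a single pass over the steps. This is an illustration under assumptions, not Renard's code: each hypothetical `StepSpec` declares a `name`, a `needs` set, and a `production` set of state keys, and the hypothetical `check_pipeline` tracks which keys are available, reporting the first step whose requirements are not met.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class StepSpec:
    """Hypothetical declaration of what a step requires and produces."""

    name: str
    needs: FrozenSet[str] = field(default_factory=frozenset)
    production: FrozenSet[str] = field(default_factory=frozenset)


def check_pipeline(steps: List[StepSpec], initial: Iterable[str] = ("text",)) -> Optional[str]:
    """Return None if the pipeline is valid, else an explanation of the first problem.

    `initial` lists the keys available before any step runs (the input text,
    plus any externally computed results injected by the user).
    """
    available = set(initial)
    for step in steps:
        missing = step.needs - available
        if missing:
            return (f"step '{step.name}' needs {sorted(missing)}, "
                    "which no previous step produces")
        available |= step.production
    return None


tokenizer = StepSpec("tokenizer", frozenset({"text"}), frozenset({"tokens"}))
ner = StepSpec("ner", frozenset({"tokens"}), frozenset({"entities"}))
print(check_pipeline([tokenizer, ner]))  # None
print(check_pipeline([ner, tokenizer]))
# step 'ner' needs ['tokens'], which no previous step produces
```

Because requirements are checked before anything runs, an invalid ordering is reported immediately rather than failing partway through a long extraction.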
Existing steps and their supported languages in Renard.

| Step | Supported Languages |
|---|---|
| | eng |
| | any |
| | eng, fra, rus, ita, spa… (12 others) |
| | any |
| | eng, rus |
| | eng, fra |
| | eng |
| | eng |
| | any |
| | eng, fra |
| | eng |
| | any |
| | any |
Renard lets users select the target language of their custom pipelines. A pipeline can be configured to run in any language, as long as each of its steps supports it. Table
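The language rule above amounts to a set intersection over the steps. The sketch below assumes each step's supported languages are given as a set of three-letter codes, as in the table, with the special value `any` marking language-independent steps; `pipeline_languages` and `supports` are hypothetical helpers, not part of Renard's API.

```python
from typing import Iterable, Optional, Set

ANY = "any"  # marker for language-independent steps


def pipeline_languages(steps_langs: Iterable[Set[str]]) -> Optional[Set[str]]:
    """Return the languages supported by every step of a pipeline.

    None means the pipeline is unrestricted (every step supports any language).
    """
    common: Optional[Set[str]] = None
    for langs in steps_langs:
        if ANY in langs:
            continue  # this step does not restrict the language
        common = set(langs) if common is None else common & langs
    return common


def supports(steps_langs: Iterable[Set[str]], lang: str) -> bool:
    """True if a pipeline made of these steps can be configured to run in `lang`."""
    common = pipeline_languages(steps_langs)
    return common is None or lang in common


steps = [{"eng", "fra"}, {ANY}, {"eng", "rus"}]
print(pipeline_languages(steps))  # {'eng'}
print(supports(steps, "fra"))  # False
```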
See
See