diff --git a/joss.06574/10.21105.joss.06574.crossref.xml b/joss.06574/10.21105.joss.06574.crossref.xml new file mode 100644 index 0000000000..98ff4f8efd --- /dev/null +++ b/joss.06574/10.21105.joss.06574.crossref.xml @@ -0,0 +1,224 @@ + + + + 20240620144142-a5b692e03c91b2ade10a23ece9eede46e017d654 + 20240620144142 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 06 + 2024 + + + 9 + + 98 + + + + Renard: A Modular Pipeline for Extracting Character +Networks from Narrative Texts + + + + Arthur + Amalvy + https://orcid.org/0000-0003-4629-0923 + + + Vincent + Labatut + https://orcid.org/0000-0002-2619-2835 + + + Richard + Dufour + https://orcid.org/0000-0003-1203-9108 + + + + 06 + 20 + 2024 + + + 6574 + + + 10.21105/joss.06574 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.12167900 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6574 + + + + 10.21105/joss.06574 + https://joss.theoj.org/papers/10.21105/joss.06574 + + + https://joss.theoj.org/papers/10.21105/joss.06574.pdf + + + + + + Analysis of a play by means of CHAPLIN, the +characters and places interaction network software + Sparavigna + International Journal of +Sciences + 3 + 4 + 10.18483/ijSci.662 + 2015 + Sparavigna, A. C., & Marazzato, +R. (2015). Analysis of a play by means of CHAPLIN, the characters and +places interaction network software. International Journal of Sciences, +4(3), 60–68. https://doi.org/10.18483/ijSci.662 + + + Charnetto + Métrailler + 2023 + Métrailler, C. (2023). Charnetto. +https://gitlab.com/maned_wolf/charnetto + + + Extraction and analysis of fictional +character networks : A survey + Labatut + ACM Computing Surveys + 52 + 10.1145/3344548 + 2019 + Labatut, V., & Bost, X. (2019). 
+Extraction and analysis of fictional character networks : A survey. ACM +Computing Surveys, 52, 89. +https://doi.org/10.1145/3344548 + + + Structural analysis on social network +constructed from characters in literature texts + Park + Journal of Computers + 8 + 10.4304/jcp.8.9.2442-2447 + 2013 + Park, G., Kim, S., & Cho, H. +(2013). Structural analysis on social network constructed from +characters in literature texts. Journal of Computers, 8. +https://doi.org/10.4304/jcp.8.9.2442-2447 + + + Complex system analysis of social networks +extracted from literary fictions + Park + International Journal of Machine Learning and +Computing + 10.7763/IJMLC.2013.V3.282 + 2013 + Park, G., Kim, S., Hwang, H., & +Cho, H. (2013). Complex system analysis of social networks extracted +from literary fictions. International Journal of Machine Learning and +Computing, 107–111. +https://doi.org/10.7763/IJMLC.2013.V3.282 + + + Les réseaux de personnages de science-fiction +: Échantillons de lectures intermédiaires + Rochat + ReS Futurae + 10 + 10.4000/resf.1183 + 2017 + Rochat, Y., & Triclot, M. (2017). +Les réseaux de personnages de science-fiction : Échantillons de lectures +intermédiaires. ReS Futurae, 10, 1183. +https://doi.org/10.4000/resf.1183 + + + Character networks and +centrality + Rochat + 2014 + Rochat, Y. (2014). Character networks +and centrality [PhD thesis, Université de Lausanne]. +https://serval.unil.ch/resource/serval:BIB_663137B68131.P001/REF.pdf + + + Character network analysis of Émile Zola’s +Les Rougon-Macquart + Rochat + Digital humanities 2015 + 2015 + Rochat, Y. (2015). Character network +analysis of Émile Zola’s Les Rougon-Macquart. Digital Humanities 2015. +https://infoscience.epfl.ch/record/210573?ln=en + + + Exploring network structure, dynamics and +function using NetworkX + Hagbert + 7th python in science conference +(SciPy2008) + 2008 + Hagbert, A. A., Schult, D. A., & +Swart, P. J. (2008). 
Exploring network structure, dynamics and function +using NetworkX. 7th Python in Science Conference (SciPy2008). +https://www.osti.gov/biblio/960616 + + + Mr. Bennet, his coachman, and the archbishop +walk into a bar but only one of them gets recognized: On the difficulty +of detecting characters in literary texts + Vala + Conference on empirical methods in natural +language processing + 10.18653/v1/D15-1088 + 2015 + Vala, H., Jurgens, D., Piper, A., +& Ruths, D. (2015). Mr. Bennet, his coachman, and the archbishop +walk into a bar but only one of them gets recognized: On the difficulty +of detecting characters in literary texts. Conference on Empirical +Methods in Natural Language Processing, 769–774. +https://doi.org/10.18653/v1/D15-1088 + + + + + + diff --git a/joss.06574/10.21105.joss.06574.pdf b/joss.06574/10.21105.joss.06574.pdf new file mode 100644 index 0000000000..592c0a7dcb Binary files /dev/null and b/joss.06574/10.21105.joss.06574.pdf differ diff --git a/joss.06574/paper.jats/10.21105.joss.06574.jats b/joss.06574/paper.jats/10.21105.joss.06574.jats new file mode 100644 index 0000000000..93ad23fe9e --- /dev/null +++ b/joss.06574/paper.jats/10.21105.joss.06574.jats @@ -0,0 +1,485 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6574 +10.21105/joss.06574 + +Renard: A Modular Pipeline for Extracting Character +Networks from Narrative Texts + + + +https://orcid.org/0000-0003-4629-0923 + +Amalvy +Arthur + + + + +https://orcid.org/0000-0002-2619-2835 + +Labatut +Vincent + + + + +https://orcid.org/0000-0003-1203-9108 + +Dufour +Richard + + + + + +Laboratoire Informatique d’Avignon, France + + + + +Laboratoire des Sciences du Numérique de Nantes, +France + + + + +29 +2 +2024 + +9 +98 +6574 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +Python +character networks +pipeline +nlp + + + + + + Summary +

Renard (Relationships Extraction from NARrative
    Documents) is a Python library that allows users to define
    custom natural language processing (NLP) pipelines to extract
    character networks from narrative texts. Unlike the few existing
    tools, Renard can extract dynamic networks as well
    as the more common static networks. Renard pipelines are modular:
    users can choose the implementation of each NLP subtask needed to
    extract a character network. This allows users to specialize pipelines
    to particular types of texts and to study the impact of each subtask
    on the extracted network.

+
+ + Statement of Need +

Character networks (i.e., graphs where vertices represent
    characters and edges represent their relationships) extracted from
    narrative texts are useful in a number of applications, from
    visualization to literary analysis
    (Labatut
    & Bost, 2019). There are different ways of modeling
    relationships (co-occurrences, conversations, actions, etc.), and
    networks can be static or dynamic (i.e., series of networks
    representing the evolution of relationships through time). This
    variety means one can extract different kinds of networks depending on
    the targeted applications. While some authors extract these networks
    by relying on manually annotated data
    (Park,
    Kim, & Cho, 2013;
    Park,
    Kim, Hwang, et al., 2013;
    Rochat,
    2014,
    2015;
    Rochat
    & Triclot, 2017), manual annotation is time-consuming, which makes
    the fully automatic extraction of these networks desirable.
    Unfortunately, only a few existing software packages and
    tools can extract character networks
    (Métrailler,
    2023;
    Sparavigna
    & Marazzato, 2015), and none of them can output dynamic
    networks. Furthermore, automatically extracting a character network
    requires solving several successive natural language processing tasks,
    such as named entity recognition (NER) or coreference resolution, and
    algorithms carrying out these tasks are bound to make errors. To our
    knowledge, the cascading impact of these errors on the quality of the
    extracted networks has yet to be studied extensively. This is an
    important issue, since knowing which tasks have more influence on the
    extracted networks would allow prioritizing research efforts.
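As a rough illustration of the co-occurrence relationships mentioned above, the following sketch builds a static, weighted co-occurrence network from character mentions. It assumes mentions have already been detected and unified, and represents them as hypothetical `(token_index, name)` pairs sorted by position; it uses only the Python standard library and is not Renard's implementation.

```python
from collections import Counter
from typing import List, Tuple

Mention = Tuple[int, str]  # (token index, character name)


def co_occurrence_network(mentions: List[Mention], max_dist: int) -> Counter:
    """Weighted co-occurrence edges between characters.

    Two mentions of different characters co-occur when they are at most
    `max_dist` tokens apart. Edges are unordered pairs (frozensets) and the
    weight is the number of co-occurrences. `mentions` must be sorted by index.
    """
    edges: Counter = Counter()
    for i, (pos_a, char_a) in enumerate(mentions):
        for pos_b, char_b in mentions[i + 1:]:
            if pos_b - pos_a > max_dist:
                break  # sorted input: every later mention is even farther away
            if char_a != char_b:
                edges[frozenset((char_a, char_b))] += 1
    return edges


# Hypothetical mentions, not taken from an actual text.
mentions = [(0, "Elizabeth"), (5, "Darcy"), (30, "Jane"), (33, "Bingley"), (36, "Elizabeth")]
network = co_occurrence_network(mentions, max_dist=10)
```

Here Darcy and Jane are never linked, because their mentions are 25 tokens apart; the distance threshold is the main knob controlling how dense such a network becomes.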

+

Renard is a fully configurable pipeline that can extract static and + dynamic networks from narrative texts. We base Renard on the generic + character network extraction framework highlighted by the survey of + Labatut & Bost + (2019). + We design it so that it is as modular as possible, which allows the + user to select the implementation of each extraction step as needed. + This has several advantages:

+ + +

Depending on the input text, the user can choose the most + relevant series of steps and configure each of them as needed. + Therefore, the pipeline can be specialized for different types of + texts, allowing for better performance.

+
+ +

The pipeline can easily incorporate new advances in NLP by
    implementing a new step when necessary.

+
+ +

One can study the impact of the performance of each step on the + quality of the extracted networks.

+
+
+
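To make the third point concrete, one simple way to measure the impact of a step (an assumption on our part, not a protocol prescribed by Renard) is to compare the edges of a network extracted with a given step implementation against a reference network, for instance one built from manual annotations. The hypothetical `edge_scores` helper below computes precision, recall and F1 over unordered character pairs, and assumes character names were unified identically in both networks.

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str]


def edge_scores(reference: Iterable[Edge], predicted: Iterable[Edge]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted edges w.r.t. reference edges.

    Edges are treated as unordered pairs, so ("A", "B") matches ("B", "A").
    Edge weights are ignored in this sketch.
    """
    ref = {frozenset(edge) for edge in reference}
    pred = {frozenset(edge) for edge in predicted}
    true_pos = len(ref & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical networks: one from gold annotations, one from an automatic NER step.
gold = [("Elizabeth", "Darcy"), ("Jane", "Bingley"), ("Elizabeth", "Jane")]
automatic = [("Darcy", "Elizabeth"), ("Jane", "Bingley"), ("Bingley", "Darcy")]
precision, recall, f1 = edge_scores(gold, automatic)
```

Swapping one step while keeping the others fixed, and recomputing such scores, is one way to attribute differences in network quality to that step.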

We intend for Renard to be used by digital humanities researchers + as well as NLP researchers and practitioners. The former category of + users can use Renard to quickly extract character networks for + literary analysis. Meanwhile, the latter can use Renard to easily + represent textual content using networks, which can be used as inputs + for downstream NLP tasks (classification, recommendation…).

+
+ + Design and Main Features +

Renard is centered around the concept of a
    pipeline. In Renard, a pipeline is a series of
    steps that are run one after the other to
    extract a character network from a text. When using Renard,
    the user describes this pipeline in Python by
    specifying its series of steps, and can then apply it to different
    texts. The following code block illustrates this approach:

+from renard.pipeline import Pipeline
+from renard.pipeline.tokenization import NLTKTokenizer
+from renard.pipeline.ner import NLTKNamedEntityRecognizer
+from renard.pipeline.character_unification import GraphRulesCharacterUnifier
+from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor
+
+with open("./my_doc.txt") as f:
+    text = f.read()
+
+pipeline = Pipeline(
+    [
+        NLTKTokenizer(),
+        NLTKNamedEntityRecognizer(),
+        GraphRulesCharacterUnifier(min_appearance=10),
+        # users can pass 'dynamic=True' and specify the
+        # 'dynamic_window' argument to extract a dynamic network
+        # instead of a static one.
+        CoOccurrencesGraphExtractor(
+            co_occurrences_dist=10, dynamic=False
+        )
+    ]
+)
+
+out = pipeline(text)
+

Co-occurrence character network of Jane Austen’s “Pride + and Prejudice”, extracted automatically using Renard. Vertex size + and color denote degree, while edge thickness and color denote the + number of co-occurrences between two characters.

+ +
+

As an example, Figure 1
    shows the co-occurrence character network of Jane Austen’s 1813 novel
    “Pride and Prejudice”, extracted using the Renard pipeline above.
    While this network is static, users can also extract a dynamic network
    by passing the dynamic=True argument to the
    last step of the pipeline and specifying the
    dynamic_window argument: in that case, Renard
    outputs a list of graphs corresponding to a dynamic network instead of
    a single network1. Renard uses
    the NetworkX Python library
    (Hagbert
    et al., 2008) to manipulate graphs, ensuring compatibility with
    a wide array of tools and formats.
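The sketch below illustrates the general idea of a dynamic network as a list of graphs: it cuts the text into consecutive blocks of `window` tokens and builds one co-occurrence network per block. The block-based segmentation, the `window` and `max_dist` parameters, and the use of `Counter` objects instead of NetworkX graphs are assumptions made to keep the example self-contained; they do not describe how Renard interprets its `dynamic_window` argument.

```python
from collections import Counter
from typing import List, Tuple

Mention = Tuple[int, str]  # (token index, character name)


def co_occurrences(mentions: List[Mention], max_dist: int) -> Counter:
    """Count co-occurrences of distinct characters at most `max_dist` tokens apart."""
    edges: Counter = Counter()
    for i, (pos_a, char_a) in enumerate(mentions):
        for pos_b, char_b in mentions[i + 1:]:
            if pos_b - pos_a > max_dist:
                break  # mentions are sorted by position
            if char_a != char_b:
                edges[frozenset((char_a, char_b))] += 1
    return edges


def dynamic_network(mentions: List[Mention], max_dist: int, window: int) -> List[Counter]:
    """One co-occurrence network per consecutive block of `window` tokens.

    Co-occurrences that straddle two blocks are ignored in this sketch.
    """
    if not mentions:
        return []
    last_pos = mentions[-1][0]
    graphs = []
    for start in range(0, last_pos + 1, window):
        block = [(p, c) for p, c in mentions if start <= p < start + window]
        graphs.append(co_occurrences(block, max_dist))
    return graphs


mentions = [(0, "Elizabeth"), (5, "Darcy"), (30, "Jane"), (33, "Bingley"), (36, "Elizabeth")]
snapshots = dynamic_network(mentions, max_dist=10, window=20)
```

Each element of `snapshots` describes the relationships within one portion of the text, so iterating over the list shows how relationships evolve; other segmentations (e.g., by chapter, or overlapping windows) are equally plausible choices.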

+

To allow for custom needs, we design Renard to be very flexible. If + a step is not available in Renard, we encourage users to either:

+ + +

Externally perform the computation corresponding to the desired + step, and inject the results back into the pipeline at + runtime,

+
+ +

Implement their own step to integrate their custom processing + into Renard by subclassing the existing + PipelineStep class. If necessary, this + PipelineStep can act as an adapter to an + external process that may or may not be written in Python.

+
+
+
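The two strategies above can be sketched with a deliberately tiny, hypothetical pipeline in which each step reads a shared state dictionary and returns the new entries it computes. None of the names below (`Step`, `run_pipeline`, `ExternalNERAdapter`, `capitalized_words`) belong to Renard's API; the real interface is Renard's `PipelineStep` class. Keyword arguments passed to `run_pipeline` play the role of externally computed results injected at runtime, and `ExternalNERAdapter` wraps an external tool, here simply a Python callable.

```python
from typing import Callable, Dict, List


class Step:
    """Hypothetical base class for a pipeline step (not Renard's PipelineStep)."""

    def __call__(self, state: Dict) -> Dict:
        raise NotImplementedError


class WhitespaceTokenizer(Step):
    """Toy tokenizer: splits the text on whitespace."""

    def __call__(self, state: Dict) -> Dict:
        return {"tokens": state["text"].split()}


class ExternalNERAdapter(Step):
    """Adapter delegating NER to an external tool.

    The external tool is any callable mapping tokens to entity strings;
    a real adapter could instead call a subprocess or a web service.
    """

    def __init__(self, tagger: Callable[[List[str]], List[str]]):
        self.tagger = tagger

    def __call__(self, state: Dict) -> Dict:
        return {"entities": self.tagger(state["tokens"])}


def run_pipeline(steps: List[Step], text: str, **precomputed) -> Dict:
    """Run steps in order; keyword arguments inject externally computed results."""
    state = {"text": text, **precomputed}
    for step in steps:
        state.update(step(state))
    return state


def capitalized_words(tokens: List[str]) -> List[str]:
    """Stand-in for an external NER system: keeps capitalized tokens."""
    return [t for t in tokens if t[:1].isupper()]


# Every step runs inside the pipeline.
full = run_pipeline([WhitespaceTokenizer(), ExternalNERAdapter(capitalized_words)], "Elizabeth met Darcy")
# Tokens computed elsewhere are injected, so no tokenizer step is needed.
partial = run_pipeline([ExternalNERAdapter(capitalized_words)], "Jane smiled", tokens=["Jane", "smiled"])
```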

This flexibility makes it possible to create invalid pipelines,
    because steps often require information
    computed by previously run steps: for example, solving the NER task
    requires a tokenized version of the input text. To prevent this,
    each step declares its requirements and the new
    information it produces, which allows Renard to check whether a
    pipeline is valid and, at runtime, to explain to the user why it may
    not be2.
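The validation described above can be sketched as a single forward pass: starting from the input text, keep a set of available information and flag every step whose requirements are not yet in that set. The `StepSpec` structure, the `needs`/`production` field names, and the `check_pipeline` helper are illustrative assumptions, not a description of Renard's interface.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class StepSpec:
    """What a step requires and what it adds (illustrative field names)."""
    name: str
    needs: Set[str] = field(default_factory=set)
    production: Set[str] = field(default_factory=set)


def check_pipeline(steps: List[StepSpec], initial: FrozenSet[str] = frozenset({"text"})) -> List[str]:
    """Return human-readable errors; an empty list means the pipeline is valid."""
    available = set(initial)
    errors = []
    for step in steps:
        missing = step.needs - available
        if missing:
            errors.append(f"{step.name} needs {sorted(missing)}, which no previous step produces")
        # Record the production even for a flagged step, to avoid cascading errors.
        available |= step.production
    return errors


tokenizer = StepSpec("tokenizer", needs={"text"}, production={"tokens"})
ner = StepSpec("ner", needs={"tokens"}, production={"entities"})
ok = check_pipeline([tokenizer, ner])
broken = check_pipeline([ner, tokenizer])
```

Because the check only inspects declarations, it can run before any (possibly slow) NLP model is loaded, and the error messages point directly at the misplaced step.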

+ + +

Existing steps and their supported languages in Renard. +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StepSupported Languages
StanfordCoreNLPPipelineeng
CustomSubstitutionPreprocessorany
NLTKTokenizereng, fra, rus, ita, spa… (12 others)
QuoteDetectorany
NLTKNamedEntityRecognizereng, rus
BertNamedEntityRecognizereng, fra
BertCoreferenceResolvereng
SpacyCorefereeCoreferenceResolvereng
NaiveCharacterUnifierany
GraphRulesCharacterUnifier
    (inspired by Vala et al.
    (2015))eng, fra
BertSpeakerDetectoreng
CoOccurrencesGraphExtractorany
ConversationalGraphExtractorany
+
+

Renard lets users select the target language of their custom
    pipelines. A pipeline can be configured to run in any language, as
    long as each of its steps supports it. Table
    1 shows all the currently
    available steps in Renard and their supported languages.
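Since a pipeline supports a language only if each of its steps does, the languages supported by a whole pipeline are the intersection of the languages of its steps, where "any" imposes no constraint. The sketch below encodes a few rows of Table 1 as plain sets; for `NLTKTokenizer` it lists only the five languages named in the table, and the helper name `pipeline_languages` is hypothetical.

```python
from typing import Iterable, Optional, Set

ANY = "any"


def pipeline_languages(step_languages: Iterable[Set[str]]) -> Set[str]:
    """Languages supported by every step of a pipeline.

    A step supporting ANY places no constraint. If all steps are
    language-agnostic, the result is {ANY}.
    """
    result: Optional[Set[str]] = None  # None means "no constraint seen yet"
    for langs in step_languages:
        if ANY in langs:
            continue
        result = set(langs) if result is None else result & set(langs)
    return {ANY} if result is None else result


# A few rows of Table 1 (NLTKTokenizer: only the languages named in the table).
table = {
    "NLTKTokenizer": {"eng", "fra", "rus", "ita", "spa"},
    "NLTKNamedEntityRecognizer": {"eng", "rus"},
    "BertNamedEntityRecognizer": {"eng", "fra"},
    "GraphRulesCharacterUnifier": {"eng", "fra"},
    "CoOccurrencesGraphExtractor": {ANY},
}
bert_pipeline = ["NLTKTokenizer", "BertNamedEntityRecognizer", "GraphRulesCharacterUnifier", "CoOccurrencesGraphExtractor"]
nltk_pipeline = ["NLTKTokenizer", "NLTKNamedEntityRecognizer", "GraphRulesCharacterUnifier", "CoOccurrencesGraphExtractor"]
bert_langs = pipeline_languages(table[s] for s in bert_pipeline)
nltk_langs = pipeline_languages(table[s] for s in nltk_pipeline)
```

With these rows, choosing the BERT-based NER step keeps French available, whereas the NLTK-based one restricts the pipeline to English, which is the kind of trade-off the table lets users anticipate.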

+
+ + + + + + + + SparavignaA. C. + MarazzatoR. + + Analysis of a play by means of CHAPLIN, the characters and places interaction network software + International Journal of Sciences + 2015 + 4 + 3 + 10.18483/ijSci.662 + 60 + 68 + + + + + + MétraillerC. + + Charnetto + 2023 + https://gitlab.com/maned_wolf/charnetto + + + + + + LabatutV. + BostX. + + Extraction and analysis of fictional character networks : A survey + ACM Computing Surveys + 2019 + 52 + 10.1145/3344548 + 89 + + + + + + + ParkG. + KimS. + ChoH. + + Structural analysis on social network constructed from characters in literature texts + Journal of Computers + 2013 + 8 + 10.4304/jcp.8.9.2442-2447 + + + + + + ParkG. + KimS. + HwangH. + ChoH. + + Complex system analysis of social networks extracted from literary fictions + International Journal of Machine Learning and Computing + 2013 + 10.7763/IJMLC.2013.V3.282 + 107 + 111 + + + + + + RochatY. + TriclotM. + + Les réseaux de personnages de science-fiction : Échantillons de lectures intermédiaires + ReS Futurae + 2017 + 10 + 10.4000/resf.1183 + 1183 + + + + + + + RochatY. + + Character networks and centrality + Université de Lausanne + 201412 + https://serval.unil.ch/resource/serval:BIB_663137B68131.P001/REF.pdf + + + + + + RochatY. + + Character network analysis of Émile Zola’s Les Rougon-Macquart + Digital humanities 2015 + 2015 + https://infoscience.epfl.ch/record/210573?ln=en + + + + + + HagbertA. A. + SchultD. A. + SwartP. J. + + Exploring network structure, dynamics and function using NetworkX + 7th python in science conference (SciPy2008) + 2008 + https://www.osti.gov/biblio/960616 + + + + + + ValaH. + JurgensD. + PiperA. + RuthsD. + + Mr. Bennet, his coachman, and the archbishop walk into a bar but only one of them gets recognized: On the difficulty of detecting characters in literary texts + Conference on empirical methods in natural language processing + 2015 + 10.18653/v1/D15-1088 + 769 + 774 + + + + + +

See + the + documentation on dynamic networks for more details.

+
+ +

See
    the
    documentation for more details on step requirements.

+
+
+
+
diff --git a/joss.06574/paper.jats/pp.pdf b/joss.06574/paper.jats/pp.pdf new file mode 100644 index 0000000000..746a7910db Binary files /dev/null and b/joss.06574/paper.jats/pp.pdf differ