Add docstrings to sentence splitters

flairNLP · Dec 4, 2024 · fd49827
1 parent 7cd659f
commit fd49827

Showing 2 changed files with 31 additions and 7 deletions.
diff --git a/docs/tutorial/tutorial-basics/entity-mention-linking.md b/docs/tutorial/tutorial-basics/entity-mention-linking.md
@@ -1,6 +1,6 @@
 # Using and creating entity mention linker
 
-As of Flair 0.14 we ship the [entity mention linker](#flair.models.EntityMentionLinker) - the core framework behind the [Hunflair BioNEN approach](https://huggingface.co/hunflair)]. 
+As of Flair 0.14 we ship the [entity mention linker](#flair.models.EntityMentionLinker) - the core framework behind the [Hunflair BioNEN approach](https://huggingface.co/hunflair). 
 You can read more at the [Hunflair2 tutorials](project:../tutorial-hunflair2/overview.md)
 
 ## Example 1: Printing Entity linking outputs to console
@@ -124,5 +124,11 @@ print(result_mentions)
 
 ```{note}
   If you need more than the extracted ids, you can use `nen_tagger.dictionary[span_data["nen_id"]]`
-  to look up the [`flair.data.EntityCandidate`](#flair.data.EntityCandidate) which contains further information.
-```
+  to look up the [`EntityCandidate`](#flair.data.EntityCandidate) which contains further information.
+```
+
+### Next
+
+Congrats, you learned how to link biomedical entities with Flair! 
+
+Next, let's discuss how to [predict part-of-speech tags with Flair](part-of-speech-tagging.md).
diff --git a/flair/splitter.py b/flair/splitter.py
@@ -16,16 +16,34 @@ class SentenceSplitter(ABC):
     r"""An abstract class representing a :class:`SentenceSplitter`.
 
     Sentence splitters are used to represent algorithms and models to split plain text into
-    sentences and individual tokens / words. All subclasses should overwrite :meth:`splits`,
-    which splits the given plain text into a sequence of sentences (:class:`Sentence`). The
-    individual sentences are in turn subdivided into tokens / words. In most cases, this can
-    be controlled by passing custom implementation of :class:`Tokenizer`.
+    sentences and individual tokens / words. All subclasses should overwrite :func:`split`,
+    which splits the given plain text into a list of :class:`flair.data.Sentence` objects. The
+    individual sentences are in turn subdivided into tokens. In most cases, this can
+    be controlled by passing a custom implementation of :class:`flair.tokenization.Tokenizer`.
 
     Moreover, subclasses may overwrite :meth:`name`, returning a unique identifier representing
     the sentence splitter's configuration.
+
+    The most common class in Flair that implements this base class is :class:`SegtokSentenceSplitter`.
     """
 
     def split(self, text: str, link_sentences: Optional[bool] = True) -> list[Sentence]:
+        """
+        Takes as input a text as a plain string and outputs a list of :class:`flair.data.Sentence` objects.
+
+        If link_sentences is set (which it is by default), the :class:`flair.data.Sentence` objects will include pointers
+        to the preceding and following sentences in the original text. This way, the original sequence information will
+        always be preserved.
+
+        Args:
+            text: The plain text to split.
+            link_sentences: If set to True, :class:`flair.data.Sentence` objects will include pointers
+                to the preceding and following sentences in the original text.
+
+        Returns:
+            A list of :class:`flair.data.Sentence` objects that each represent one sentence in the given text.
+
+        """
         sentences = self._perform_split(text)
         if not link_sentences:
             return sentences
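
The new `split` docstring says that, with `link_sentences` enabled, each returned sentence carries pointers to its preceding and following sentence. The hunk ends before the linking code, so the following is only a minimal, library-free sketch of that idea and not Flair's implementation. `ToySentence`, `toy_split`, the `previous`/`next` attribute names, and the regex-based splitting rule are all hypothetical stand-ins chosen for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ToySentence:
    """Hypothetical stand-in for flair.data.Sentence (not the real class)."""

    text: str
    # Pointers to the neighbouring sentences; attribute names are assumptions.
    previous: Optional["ToySentence"] = field(default=None, repr=False)
    next: Optional["ToySentence"] = field(default=None, repr=False)


def toy_split(text: str, link_sentences: bool = True) -> list[ToySentence]:
    """Split text naively and optionally link neighbouring sentences."""
    # Naive rule (assumption): break on whitespace that follows '.', '!' or '?'.
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    sentences = [ToySentence(p) for p in parts]
    if not link_sentences:
        return sentences
    # Walk over adjacent pairs and wire up the pointers in both directions.
    for prev, nxt in zip(sentences, sentences[1:]):
        prev.next = nxt
        nxt.previous = prev
    return sentences


# Usage: iterate over the result and peek at each sentence's left neighbour.
for sentence in toy_split("Flair is great. It splits text! Does it link?"):
    left = sentence.previous.text if sentence.previous else None
    print(repr(sentence.text), "preceded by", repr(left))
```

Linking at split time means a consumer that only holds one sentence can still walk to its neighbours, which is what the docstring refers to as preserving the original sequence information.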