Add docstrings to sentence splitters
alanakbik committed Dec 4, 2024
1 parent 7cd659f commit fd49827
Showing 2 changed files with 31 additions and 7 deletions.
12 changes: 9 additions & 3 deletions docs/tutorial/tutorial-basics/entity-mention-linking.md
@@ -1,6 +1,6 @@
 # Using and creating entity mention linker

-As of Flair 0.14 we ship the [entity mention linker](#flair.models.EntityMentionLinker) - the core framework behind the [Hunflair BioNEN approach](https://huggingface.co/hunflair)].
+As of Flair 0.14 we ship the [entity mention linker](#flair.models.EntityMentionLinker) - the core framework behind the [Hunflair BioNEN approach](https://huggingface.co/hunflair).
 You can read more at the [Hunflair2 tutorials](project:../tutorial-hunflair2/overview.md)

 ## Example 1: Printing Entity linking outputs to console
@@ -124,5 +124,11 @@ print(result_mentions)

 ```{note}
 If you need more than the extracted ids, you can use `nen_tagger.dictionary[span_data["nen_id"]]`
-to look up the [`flair.data.EntityCandidate`](#flair.data.EntityCandidate) which contains further information.
-```
+to look up the [`EntityCandidate`](#flair.data.EntityCandidate) which contains further information.
+```
+
+### Next
+
+Congrats, you learned how to link biomedical entities with Flair!
+
+Next, let's discuss how to [predict part-of-speech tags with Flair](part-of-speech-tagging.md).
26 changes: 22 additions & 4 deletions flair/splitter.py
@@ -16,16 +16,34 @@ class SentenceSplitter(ABC):
     r"""An abstract class representing a :class:`SentenceSplitter`.
     Sentence splitters are used to represent algorithms and models to split plain text into
-    sentences and individual tokens / words. All subclasses should overwrite :meth:`splits`,
-    which splits the given plain text into a sequence of sentences (:class:`Sentence`). The
-    individual sentences are in turn subdivided into tokens / words. In most cases, this can
-    be controlled by passing custom implementation of :class:`Tokenizer`.
+    sentences and individual tokens / words. All subclasses should overwrite :func:`split`,
+    which splits the given plain text into a list of :class:`flair.data.Sentence` objects. The
+    individual sentences are in turn subdivided into tokens. In most cases, this can
+    be controlled by passing custom implementation of :class:`flair.tokenization.Tokenizer`.
     Moreover, subclasses may overwrite :meth:`name`, returning a unique identifier representing
     the sentence splitter's configuration.
+    The most common class in Flair that implements this base class is :class:`SegtokSentenceSplitter`.
     """

     def split(self, text: str, link_sentences: Optional[bool] = True) -> list[Sentence]:
+        """
+        Takes as input a text as a plain string and outputs a list of :class:`flair.data.Sentence` objects.
+        If link_sentences is set (by default, it is). The :class:`flair.data.Sentence` objects will include pointers
+        to the preceding and following sentences in the original text. This way, the original sequence information will
+        always be preserved.
+        Args:
+            text: The plain text to split.
+            link_sentences: If set to True, :class:`flair.data.Sentence` objects will include pointers
+                to the preceding and following sentences in the original text.
+        Returns:
+            A list of :class:`flair.data.Sentence` objects that each represent one sentence in the given text.
+        """
         sentences = self._perform_split(text)
         if not link_sentences:
             return sentences
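
The new docstring describes `split` as returning a list of sentences that, when `link_sentences` is set, carry pointers to their preceding and following sentences. The hunk is cut off before the linking step itself, so the following is only a minimal standard-library sketch of that idea, not Flair's implementation. `ToySentence`, `ToySplitter`, the `previous` / `next` attribute names, and the regex-based splitting rule are all hypothetical and chosen for illustration only.

```python
import re
from typing import Optional


class ToySentence:
    """Hypothetical stand-in for flair.data.Sentence (not the real class)."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Neighbour pointers; these attribute names are assumptions.
        self.previous: Optional["ToySentence"] = None
        self.next: Optional["ToySentence"] = None


class ToySplitter:
    """Hypothetical splitter that mirrors the split / _perform_split layout."""

    def _perform_split(self, text: str) -> list[ToySentence]:
        # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
        return [ToySentence(part) for part in parts if part]

    def split(self, text: str, link_sentences: bool = True) -> list[ToySentence]:
        sentences = self._perform_split(text)
        if not link_sentences:
            return sentences
        # Give every sentence a pointer to its left and right neighbour.
        for left, right in zip(sentences, sentences[1:]):
            left.next = right
            right.previous = left
        return sentences


linked = ToySplitter().split("Flair is fun. It splits text. Then it tags it.")
unlinked = ToySplitter().split("Flair is fun. It splits text.", link_sentences=False)

print([s.text for s in linked])
print(linked[1].previous.text, "|", linked[1].next.text)
print(unlinked[0].next)  # None, because linking was skipped
```

With linking enabled, the middle sentence can reach both neighbours, which is what lets the original sequence be recovered from any single sentence. With `link_sentences=False`, the same sentences come back without those pointers.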