Related #16 | Add Tree schema and CONSTITUENCY_PARSING task #295

Merged: 4 commits, merged Jan 9, 2024
Changes from 2 commits
6 changes: 6 additions & 0 deletions seacrowd/utils/constants.py
@@ -5,6 +5,7 @@
from seacrowd.utils.schemas import (
    image_text_features,
    kb_features,
    tree_features,
    pairs_features,
    pairs_features_score,
    pairs_multi_features,
@@ -45,6 +46,9 @@ class Tasks(Enum):
    COREFERENCE_RESOLUTION = "COREF"
    SPAN_BASED_ABSA = "SPAN_ABSA"

    # Tree
    CONSTITUENCY_PARSING = "CONST_PAR"

    # Single Text Classification
    ASPECT_BASED_SENTIMENT_ANALYSIS = "ABSA"
    EMOTION_CLASSIFICATION = "EC"
@@ -202,6 +206,7 @@ class Licenses(Enum):

TASK_TO_SCHEMA = {
    Tasks.DEPENDENCY_PARSING: "KB",
    Tasks.CONSTITUENCY_PARSING: "TREE",
    Tasks.WORD_SENSE_DISAMBIGUATION: "T2T",
    Tasks.WORD_ANALOGY: "T2T",
    Tasks.KEYWORD_EXTRACTION: "SEQ_LABEL",
@@ -268,6 +273,7 @@ class Licenses(Enum):

SCHEMA_TO_FEATURES = {
    "KB": kb_features,
    "TREE": tree_features,
    "QA": qa_features,
    "T2T": text2text_features,
    "TEXT": text_features(),
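In the diff above, `TASK_TO_SCHEMA` maps a task to a schema name and `SCHEMA_TO_FEATURES` maps that name to its features, so `Tasks.CONSTITUENCY_PARSING` resolves to `tree_features` in two steps. A minimal sketch of that lookup, runnable without SeaCrowd or `datasets` installed: the enum holds only the member added in this PR, the feature objects are placeholder dicts, and `features_for_task` is a hypothetical helper, not part of the codebase.

```python
from enum import Enum


class Tasks(Enum):
    # Only the member added in this PR; the real enum has many more.
    CONSTITUENCY_PARSING = "CONST_PAR"


# Placeholder stand-ins for the real datasets.Features objects (hypothetical).
kb_features = {"schema": "kb"}
tree_features = {"schema": "tree"}

TASK_TO_SCHEMA = {
    Tasks.CONSTITUENCY_PARSING: "TREE",
}

SCHEMA_TO_FEATURES = {
    "KB": kb_features,
    "TREE": tree_features,
}


def features_for_task(task: Tasks):
    """Hypothetical helper: task -> schema name -> features."""
    schema = TASK_TO_SCHEMA[task]
    return schema, SCHEMA_TO_FEATURES[schema]


schema, feats = features_for_task(Tasks.CONSTITUENCY_PARSING)
print(schema, feats)
```

Keeping the two dictionaries separate lets several tasks share one schema (as `WORD_SENSE_DISAMBIGUATION` and `WORD_ANALOGY` both map to `"T2T"`) while each schema's features are defined once.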
2 changes: 2 additions & 0 deletions seacrowd/utils/schemas/__init__.py
@@ -1,5 +1,6 @@
from .image_text import features as image_text_features
from .kb import features as kb_features
from .tree import features as tree_features
from .pairs import features as pairs_features
from .pairs import features_with_continuous_label as pairs_features_score
from .pairs_multilabel import features as pairs_multi_features
@@ -19,6 +20,7 @@
__all__ = [
    "image_text_features",
    "kb_features",
    "tree_features",
    "pairs_features",
    "pairs_features_score",
    "pairs_multi_features",
35 changes: 35 additions & 0 deletions seacrowd/utils/schemas/tree.py
@@ -0,0 +1,35 @@
"""
Tree Schema

This schema assumes a document with subnode elements
arranged in a tree hierarchy.

For example:
        NODE1 .....
      //
ROOT - NODE2 .....
      \\
        NODE3 .....
"""
import datasets
@sabilmakbar (Collaborator), Jan 7, 2024

Hi @MJonibek, would you like to add an example of this schema to the docstring, using one compound sentence? It could follow the ToD Schema (#237). This schema is fairly complex and less commonly used than the others, which could make implementations that use it harder.

Collaborator Author

Hi, ok I will do it today.

Collaborator Author

Done. I used a small example because all sentences in the alt_burmese_tree_bank dataset are long. I think it is now clear how to use this schema.

Collaborator

Oh, so dataset construction from the example sentence follows a top-down approach; I thought it was bottom-up. Looks fine to me. Thanks a lot for proposing and implementing this, @MJonibek!
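One possible reading of the top-down construction discussed in this thread: walk the parse tree from the root, append each node before its children (so the root gets id "0" and ids follow pre-order), then fill in each node's text and character offsets once its token span is known. This is a sketch under stated assumptions, not the PR's loader: the input is a nested `(label, children)` tuple with leaves as bare token strings, tokens are joined by single spaces, and offsets are `[start, end)` character positions in the passage. `build_record`, the passage id/type values, and the English example sentence are hypothetical and not taken from the PR or from alt_burmese_tree_bank.

```python
def build_record(tree, record_id="0"):
    """Flatten a nested (label, children) parse tree into a tree-schema record.

    Assumptions (not from the PR): leaves are plain token strings, tokens are
    joined by single spaces, every node covers at least one token, and
    offsets are [start, end) character positions in the passage text.
    """
    # Pass 1: collect tokens left to right to build the passage text.
    tokens = []

    def collect(node):
        _label, children = node
        for child in children:
            if isinstance(child, str):
                tokens.append(child)
            else:
                collect(child)

    collect(tree)
    text = " ".join(tokens)
    starts, pos = [], 0
    for tok in tokens:
        starts.append(pos)
        pos += len(tok) + 1  # +1 for the joining space

    # Pass 2: top-down; a parent is appended before its children (pre-order ids).
    nodes = []
    next_token = [0]  # index of the next unconsumed token

    def visit(node):
        label, children = node
        entry = {
            "id": str(len(nodes)),
            "type": label,
            "text": "",
            "offsets": [],
            "subnodes": [],
        }
        nodes.append(entry)
        first = next_token[0]
        for child in children:
            if isinstance(child, str):
                next_token[0] += 1
            else:
                entry["subnodes"].append(visit(child))
        last = next_token[0] - 1
        start, end = starts[first], starts[last] + len(tokens[last])
        entry["text"] = text[start:end]
        entry["offsets"] = [start, end]
        return entry["id"]

    visit(tree)
    return {
        "id": record_id,
        "passage": {
            "id": f"{record_id}_passage",  # hypothetical id convention
            "type": "sentence",  # hypothetical type value
            "text": [text],
            "offsets": [0, len(text)],
        },
        "nodes": nodes,
    }


# Hypothetical example: (S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))
tree = ("S", [("NP", [("DT", ["the"]), ("NN", ["cat"])]), ("VP", [("VBZ", ["sleeps"])])])
record = build_record(tree)
for n in record["nodes"]:
    print(n["id"], n["type"], n["offsets"], n["subnodes"])
```

For this input the root `S` becomes node "0" with subnodes `["1", "4"]` (the `NP` and `VP`), mirroring the ROOT-first layout of the docstring diagram.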


features = datasets.Features(
    {
        "id": datasets.Value("string"),
        "passage": {
            "id": datasets.Value("string"),
            "type": datasets.Value("string"),
            "text": datasets.Sequence(datasets.Value("string")),
            "offsets": datasets.Sequence(datasets.Value("int32")),
        },
        "nodes": [
            {
                "id": datasets.Value("string"),
                "type": datasets.Value("string"),
                "text": datasets.Value("string"),
                "offsets": datasets.Sequence(datasets.Value("int32")),
                "subnodes": datasets.Sequence(datasets.Value("string")),  # ids of subnodes
            }
        ],
    }
)
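To consume records in this schema, a loader can find the root (the only node that no other node lists in `subnodes`) and follow `subnodes` recursively. The sketch below does that to rebuild a bracketed parse string and, along the way, checks that each node's `offsets` slice the passage text to the node's `text`. It assumes a single-string passage with node offsets given as `[start, end)` into it, and treats nodes without subnodes as pre-terminals; `to_bracketed` and the sample record are hypothetical illustrations, not part of the PR.

```python
def to_bracketed(record):
    """Rebuild a bracketed parse string from a tree-schema record.

    Assumptions (not from the PR): exactly one root, passage text is a
    single string, and node offsets are [start, end) into that string.
    """
    nodes = {n["id"]: n for n in record["nodes"]}
    referenced = {cid for n in record["nodes"] for cid in n["subnodes"]}
    roots = [nid for nid in nodes if nid not in referenced]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    passage = record["passage"]["text"][0]

    def render(nid):
        node = nodes[nid]
        start, end = node["offsets"]
        if passage[start:end] != node["text"]:
            raise ValueError(f"node {nid}: offsets do not match text")
        if not node["subnodes"]:
            # Pre-terminal: print its label and covered text.
            return f"({node['type']} {node['text']})"
        inner = " ".join(render(cid) for cid in node["subnodes"])
        return f"({node['type']} {inner})"

    return render(roots[0])


# Hypothetical record for "the cat sleeps".
record = {
    "id": "0",
    "passage": {"id": "0_passage", "type": "sentence", "text": ["the cat sleeps"], "offsets": [0, 14]},
    "nodes": [
        {"id": "0", "type": "S", "text": "the cat sleeps", "offsets": [0, 14], "subnodes": ["1", "4"]},
        {"id": "1", "type": "NP", "text": "the cat", "offsets": [0, 7], "subnodes": ["2", "3"]},
        {"id": "2", "type": "DT", "text": "the", "offsets": [0, 3], "subnodes": []},
        {"id": "3", "type": "NN", "text": "cat", "offsets": [4, 7], "subnodes": []},
        {"id": "4", "type": "VP", "text": "sleeps", "offsets": [8, 14], "subnodes": ["5"]},
        {"id": "5", "type": "VBZ", "text": "sleeps", "offsets": [8, 14], "subnodes": []},
    ],
}

print(to_bracketed(record))
```

Because children are referenced by id rather than nested, the order of `nodes` in the list does not matter to this reader; only the `subnodes` lists define the tree.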