Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AM dataset - CDCP #15

Merged
merged 14 commits into from
Nov 9, 2023
Merged

Add AM dataset - CDCP #15

merged 14 commits into from
Nov 9, 2023

Conversation

idalr
Copy link
Collaborator

@idalr idalr commented Nov 2, 2023

As described in #10, we work on CDCP, an argumentation mining dataset.

Specifically for the CDCP dataset, we add:

  • PIE dataset builder: pie/cdcp.py
  • dataset cards: HF/README.md and pie/README.md files. EDIT: the HF dataset cards lives here
  • test file: tests/dataset_builders/pie/test_cdcp.py

Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early feedback

EDIT: Please also remove dataset_builders/hf/cdcp again, this is not so important for now. But please still create the Huggingface dataset card for DFKI-SLT/cdcp at the Huggingface Hub.

tests/dataset_builders/pie/test_cdcp.py Outdated Show resolved Hide resolved
tests/dataset_builders/pie/test_cdcp.py Outdated Show resolved Hide resolved
tests/dataset_builders/pie/test_cdcp.py Outdated Show resolved Hide resolved
Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more feedback. There are still several files in the diff that shouldn't be listed there:

  • the poetry.lock should now be again the original
  • please remove the __init__.py files anywhere in dataset_builders
  • document/types.py: most of the classes are not used at all. For classes, that are just used in the tests, define them just there.

Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas for improvement and how-to increase test coverage

Background:

  • the test coverage should be as high as possible
  • source code lines that are new in this PR, but not covered by any test execution are marked by the Codecov Bot (box with yellow and named "Codecov / codecov/patch") in the diff view

dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Merging #15 (4f74a05) into main (d887d18) will increase coverage by 0.39%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
+ Coverage   92.53%   92.93%   +0.39%     
==========================================
  Files          16       17       +1     
  Lines        1206     1274      +68     
==========================================
+ Hits         1116     1184      +68     
  Misses         90       90              
Files Coverage Δ
dataset_builders/pie/cdcp/cdcp.py 100.00% <100.00%> (ø)
src/pie_datasets/document/types.py 100.00% <100.00%> (ø)

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a review just for the code for now. I will review the readmes separately.

dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
dataset_builders/pie/cdcp/cdcp.py Outdated Show resolved Hide resolved
src/pie_datasets/document/types.py Outdated Show resolved Hide resolved
tests/dataset_builders/pie/test_cdcp.py Outdated Show resolved Hide resolved
Copy link
Owner

@ArneBinder ArneBinder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback for the readmes.

dataset_builders/hf/README.md Outdated Show resolved Hide resolved
dataset_builders/hf/README.md Outdated Show resolved Hide resolved
dataset_builders/hf/README.md Outdated Show resolved Hide resolved
@ArneBinder ArneBinder merged commit fd0c954 into main Nov 9, 2023
4 checks passed
@ArneBinder ArneBinder deleted the add_dataset_cdcp branch November 9, 2023 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants