test ingestion pipeline
knmnyn committed Jul 5, 2024
1 parent daaf7c5 commit a8b372d
Showing 2 changed files with 28 additions and 2 deletions.
3 changes: 1 addition & 2 deletions content/authors/alumnus/_index.md
@@ -26,6 +26,5 @@ highlight_name: false
user_groups:
- Alumni

---

---xw
An alumnus
27 changes: 27 additions & 0 deletions publications.bib
@@ -0,0 +1,27 @@
@inproceedings{huang-etal-2022-lightweight,
title = "Lightweight Contextual Logical Structure Recovery",
author = "Huang, Po-Wei and
Ramesh Kashyap, Abhinav and
Qin, Yanxia and
Yang, Yajing and
Kan, Min-Yen",
editor = "Cohan, Arman and
Feigenblat, Guy and
Freitag, Dayne and
Ghosal, Tirthankar and
Herrmannova, Drahomira and
Knoth, Petr and
Lo, Kyle and
Mayr, Philipp and
Shmueli-Scheuer, Michal and
de Waard, Anita and
Wang, Lucy Lu",
booktitle = "Proceedings of the Third Workshop on Scholarly Document Processing",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.sdp-1.5",
pages = "37--48",
abstract = "Logical structure recovery in scientific articles associates text with a semantic section of the article. Although previous work has disregarded the surrounding context of a line, we model this important information by employing line-level attention on top of a transformer-based scientific document processing pipeline. With the addition of loss function engineering and data augmentation techniques with semi-supervised learning, our method improves classification performance by 10{\%} compared to a recent state-of-the-art model. Our parsimonious, text-only method achieves a performance comparable to that of other works that use rich document features such as font and spatial position, using less data without sacrificing performance, resulting in a lightweight training pipeline.",
}
