Commit
Merge pull request #7 from bill-baumgartner/master
Updated coreference resolution annotations
Showing 138 changed files with 634,501 additions and 1,233,315 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,19 @@ | ||
# The Colorado Richly Annotated Full-Text (CRAFT) Corpus | ||
|
||
The contents of this repository consist of the v3.1 release of the CRAFT Corpus. This release consists of 67 articles from the PubMed Central Open Access subset, each of which has been annotated along a number of different axes. Please see the [CRAFT Wiki](https://github.com/UCDenver-ccp/CRAFT/wiki) for further details on the corpus distribution. | ||
This repository contains the CRAFT corpus, a collection of 67 articles from the PubMed Central Open Access subset, each of which has been annotated along a number of different axes spanning structural, coreference, and concept annotation. | ||
|
||
### Concept annotation | ||
Concepts mentioned in these articles have been mapped (“normalized”) to specific ontology classes, relying on ten Open Biomedical Ontologies. For additional details see this [README](https://github.com/UCDenver-ccp/CRAFT/blob/master/concept-annotation/README.md). | ||
### Citing CRAFT | ||
To cite the CRAFT corpus, please see the [CRAFT Reference](https://github.com/UCDenver-ccp/CRAFT/wiki/Primary-references-for-the-CRAFT-corpus) wiki page. | ||
|
||
_For details of the concept annotations and citation, please see:_<br/> | ||
Bada, M., Eckert, M., Evans, D., Garcia, K., Shipley, K., Sitnikov, D., Baumgartner Jr., W. A., Cohen, K. B., Verspoor, K., Blake, J. A., and Hunter, L. E. (2012) Concept annotation in the CRAFT corpus. _BMC Bioinformatics_ 13:161. | ||
[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-161)] | ||
### Using CRAFT | ||
For installation and other usage instructions, please see the [CRAFT Wiki](https://github.com/UCDenver-ccp/CRAFT/wiki). | ||
|
||
_For an overview of the concept annotation guidelines, please see:_<br/> | ||
Bada, M., Eckert, M., Palmer, M., and Hunter, L.E. (2010) An overview of the CRAFT annotation guidelines. _Proceedings of the Fourth Linguistic Annotation Workshop_, ACL 2010, pp. 207-211. | ||
[[link](http://www.aclweb.org/anthology/W10-1833)] | ||
### Stable releases | ||
For stable releases, please download from the [CRAFT Releases](https://github.com/UCDenver-ccp/CRAFT/releases) page. If you are participating in the [CRAFT Shared Task](https://sites.google.com/view/craft-shared-task-2019/home), please download [CRAFT v3.1](https://github.com/UCDenver-ccp/CRAFT/releases/tag/3.1). | ||
|
||
_For details of the Uberon anatomical annotations, please see:_<br/> | ||
Bada, M., Vasilevsky, N., Baumgartner Jr., W.A., Haendel, M., and Hunter, L.E. (2017) Gold-standard ontology-based anatomical annotation in the CRAFT Corpus. _Database_, Volume 2017, bax087. | ||
[[link](https://academic.oup.com/database/article/doi/10.1093/database/bax087/4780291)] | ||
### Creating alternative file formats | ||
The distribution has been streamlined to include only a single file format for each annotation type. In place of multiple file formats for each annotation type, the CRAFT corpus is distributed with a script which can convert annotations from the native file format into a variety of other file formats. Please see the [Creating alternative annotation file formats](https://github.com/UCDenver-ccp/CRAFT/wiki/Alternative-annotation-file-formats) wiki page for details. | ||
|
||
_For evaluation of concept recognition tools on the concept annotations, please see:_<br/> | ||
Funk, C., Baumgartner, W.A., Garcia, B., Roeder, C., Bada, M., Cohen, K.B., Hunter, L.E., and Verspoor, K. (2014) Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. _BMC Bioinformatics_ 15:59. | ||
[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-59)] | ||
|
||
|
||
### Coreference annotation | ||
The corpus has been annotated with coreference relations, including identity and appositives, for all coreferring base noun phrases. | ||
|
||
_For details of the coreference annotations, please see:_<br/> | ||
Cohen, K.B., Lanfranchi, A., Choi, M.J., Bada, M., Baumgartner Jr., W.A., Panteleyeva, N., Verspoor, K., Palmer, M., and Hunter, L.E. (2017) Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles. _BMC Bioinformatics_ 18:372. | ||
[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1775-9)] | ||
|
||
|
||
### Structural annotation | ||
All sentences have been marked up with respect to sentence segmentation, tokenization, part-of-speech tags, grammatical dependency, and treebanking. Document section boundaries and typography (e.g., italics, boldface, subscript, superscript) have also been extracted from the source document files. | ||
|
||
_The following article explores syntactic tool performance over CRAFT:_<br/> | ||
Verspoor, K., Cohen, K.B., Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D., Funk, C., Malenkiy, Y., Eckert, M., Xue, N., Baumgartner Jr., W.A., Bada, M., Palmer, M., and Hunter, L.E. (2012) A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. _BMC Bioinformatics_ 13:207. | ||
[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-207)] | ||
|
||
|
||
## Feedback | ||
### Feedback | ||
|
||
Please direct comments, questions, and suggestions to the Issues section of the CRAFT GitHub page, or send e-mail to Mike Bada at [email protected]. |