Merge pull request #7 from bill-baumgartner/master
Updated coreference resolution annotations
bill-baumgartner authored Apr 8, 2019
2 parents 1e806c2 + 03b7300 commit 83c2012
Showing 138 changed files with 634,501 additions and 1,233,315 deletions.
CHANGES.md (2 changes: 1 addition & 1 deletion)
@@ -17,7 +17,7 @@

* Some erroneous relations were removed from a single knowtator-2 annotation file for the CL+extension concepts

-* Coreference annotations in the Knowtator-2 file format have been removed due to an issue with their representation of APPOS relations. These files can be regenerated with the Clojure Boot script (referenced above) if desired.
+* The coreference annotations have been revised to resolve instances of identity chains sharing mentions. The original knowtator files have been removed and replaced with knowtator-2 format files that contain the revised annotations. For details on the changes to the coreference annotations, please see this [README](https://github.com/UCDenver-ccp/CRAFT/blob/master/coreference-annotation/README.md).

* The distribution now includes XSD files for the knowtator and knowtator-2 XML file formats. See the **schema/** directory

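The CHANGES.md entry refers to "identity chains sharing mentions", meaning the same mention appears in more than one identity chain. The following is a minimal illustrative sketch of how such overlaps could be detected. It assumes a hypothetical, simplified representation in which each chain is a mapping from a chain ID to a list of mention IDs; this is not the knowtator-2 XML structure, and `find_shared_mentions` is a made-up helper. It only flags the problem condition and does not reproduce how the CRAFT maintainers actually revised the chains (see the coreference README for that).

```python
from collections import defaultdict
from itertools import combinations


def find_shared_mentions(chains):
    """Return {(chain_a, chain_b): shared mention ids} for overlapping chains.

    `chains` maps a chain id to an iterable of mention ids. This is a
    hypothetical simplified representation, not the knowtator-2 schema.
    """
    # Index each mention id to the set of chains that contain it.
    owners = defaultdict(set)
    for chain_id, mentions in chains.items():
        for mention in mentions:
            owners[mention].add(chain_id)

    # Any mention owned by two or more chains is a shared mention;
    # record it under every pair of chains that contain it.
    shared = defaultdict(set)
    for mention, chain_ids in owners.items():
        for a, b in combinations(sorted(chain_ids), 2):
            shared[(a, b)].add(mention)
    return dict(shared)
```

For example, with chains `{"chain1": ["m1", "m2", "m3"], "chain2": ["m3", "m4"], "chain3": ["m5"]}`, the function returns `{("chain1", "chain2"): {"m3"}}`, since `m3` is the only mention that belongs to two chains.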
README.md (44 changes: 10 additions & 34 deletions)
@@ -1,43 +1,19 @@
# The Colorado Richly Annotated Full-Text (CRAFT) Corpus

-The contents of this repository consist of the v3.1 release of the CRAFT Corpus. This release consists of 67 articles from the PubMed Central Open Access subset, each of which has been annotated along a number of different axes. Please see the [CRAFT Wiki](https://github.com/UCDenver-ccp/CRAFT/wiki) for further details on the corpus distribution.
+This repository contains the CRAFT corpus, a collection of 67 articles from the PubMed Central Open Access subset, each of which has been annotated along a number of different axes spanning structural, coreference, and concept annotation.

-### Concept annotation
-Concepts mentioned in these articles have been mapped (“normalized”) to specific ontology classes, relying on ten Open Biomedical Ontologies. For additional details see this [README](https://github.com/UCDenver-ccp/CRAFT/blob/master/concept-annotation/README.md).
+### Citing CRAFT
+To cite the CRAFT corpus, please see the [CRAFT Reference](https://github.com/UCDenver-ccp/CRAFT/wiki/Primary-references-for-the-CRAFT-corpus) wiki page.

-_For details of the concept annotations and citation, please see:_<br/>
-Bada, M., Eckert, M., Evans, D., Garcia, K., Shipley, K., Sitnikov, D., Baumgartner Jr., W. A., Cohen, K. B., Verspoor, K., Blake, J. A., and Hunter, L. E. (2012) Concept annotation in the CRAFT corpus. _BMC Bioinformatics_ 12:161.
-[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-161)]
+### Using CRAFT
+For installation and other usage instructions, please see the [CRAFT Wiki](https://github.com/UCDenver-ccp/CRAFT/wiki).

-_For an overview of the concept annotation guidelines, please see:_<br/>
-Bada, M., Eckert, M., Palmer, M., and Hunter, L.E. (2010) An overview of the CRAFT annotation guidelines. _Proceedings of the Fourth Linguistic Annotation Workshop_, ACL 2010, pp. 207-211.
-[[link](http://www.aclweb.org/anthology/W10-1833)]
+### Stable releases
+For stable releases, please download from the [CRAFT Releases](https://github.com/UCDenver-ccp/CRAFT/releases) page. If you are participating in the [CRAFT Shared Task](https://sites.google.com/view/craft-shared-task-2019/home), please download [CRAFT v3.1](https://github.com/UCDenver-ccp/CRAFT/releases/tag/3.1).

-_For details of the Uberon anatomical annotations, please see:_<br/>
-Bada, M., Vasilevsky, N., Baumgartner Jr., W.A., Haendel, M., and Hunter, L.E. (2017) Gold-standard ontology-based anatomical annotation in the CRAFT Corpus. _Database_, Volume 2017, bax087.
-[[link](https://academic.oup.com/database/article/doi/10.1093/database/bax087/4780291)]
+### Creating alternative file formats
+The distribution has been streamlined to include only a single file format for each annotation type. In place of multiple file formats for each annotation type, the CRAFT corpus is distributed with a script which can convert annotations from the native file format into a variety of other file formats. Please see the [Creating alternative annotation file formats](https://github.com/UCDenver-ccp/CRAFT/wiki/Alternative-annotation-file-formats) wiki page for details.

-_For evaluation of concept recognition tools on the concept annotations, please see:_<br/>
-Funk, C., Baumgartner, W.A., Garcia, B., Roeder, C., Bada, M., Cohen, K.B., Hunter, L.E., and Verspoor, K. (2014) Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. _BMC Bioinformatics_ 15:59.
-[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-59)]
-
-
-### Coreference annotation
-The corpus has been annotated with coreference relations, including identity and appositives, for all coreferring base noun phrases.
-
-_For details of the coreference annotations, please see:_<br/>
-Cohen, K.B., Lanfranchi, A., Choi, M.J., Bada, M., Baumgartner Jr., W.A., Panteleyeva, N., Verspoor, K., Palmer, M., and Hunter, L.E. (2017) Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles. _BMC Bioinformatics_ 18:372.
-[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1775-9)]
-
-
-### Structural annotation
-All sentences have been marked up with respect to sentence segmentation, tokenization, part-of-speech tags, grammatical dependency, and treebanking. Document section boundaries and typography (e.g., italics, boldface, subscript, superscript) have also been extracted from the source document files.
-
-_The following article explores syntactic tool performance over CRAFT:_<br/>
-Verspoor, K., Cohen, K.B., Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D., Funk, C., Malenkiy, Y., Eckert, M., Xue, N., Baumgartner Jr., W.A., Bada, M., Palmer, M., Hunter L.E. (2012) A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. _BMC Bioinformatics_ 13:207.
-[[link](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-207)]
-
-
-## Feedback
+### Feedback

Please direct comments, questions, and suggestions to the Issues section of the CRAFT GitHub page, or send e-mail to Mike Bada at [email protected].