bill-baumgartner edited this page Mar 21, 2019 · 16 revisions

The Colorado Richly Annotated Full-Text Corpus

Important note about character encoding

UTF-8 encoding is used throughout the CRAFT project, so please default to UTF-8 when using CRAFT resources. If you use an encoding other than UTF-8, the character offsets in the stand-off annotation files will not align with the expected document text. Annotation offsets included in this distribution are relative to the plain-text versions of the articles available in the articles/txt/ directory.
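The effect of the encoding on offsets can be illustrated with a small sketch. It is not part of the CRAFT tooling: the helper `span_text`, the sample sentence, and the offsets are made up for illustration, and it assumes that annotation offsets count Unicode characters (code points) in the decoded text, which is how Python indexes strings.

```python
def span_text(raw_bytes, start, end, encoding="utf-8"):
    """Decode raw document bytes and return the text covered by [start, end).

    Hypothetical helper for illustration; offsets are assumed to be
    character offsets into the decoded text.
    """
    text = raw_bytes.decode(encoding)
    return text[start:end]


# Made-up sample standing in for the bytes of a plain-text article file.
sample = "β-catenin binds Tcf4".encode("utf-8")

# Decoded as UTF-8, the offsets 10..15 cover the intended word.
utf8_span = span_text(sample, 10, 15)

# Decoded as Latin-1, the two-byte "β" becomes two characters, so every
# later offset is shifted by one and the same span no longer lines up.
latin1_span = span_text(sample, 10, 15, encoding="latin-1")

print(repr(utf8_span))
print(repr(latin1_span))
```

When reading an actual article from articles/txt/, the same idea applies: pass the encoding explicitly, for example `open(path, encoding="utf-8")`, instead of relying on the platform default.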

Getting Started

Installation Instructions
Understanding the CRAFT distribution directory structure
Creating alternative annotation file formats

Important notes

Dependency parse derivation from treebank data

Extending/Visualizing CRAFT annotations

Starting your own annotation project using Knowtator2
Visualizing CRAFT annotations

References

Primary references for the CRAFT corpus

Feedback

Please direct comments, questions, and suggestions to the Issues section of the CRAFT GitHub page, or send e-mail to Mike Bada at [email protected].