Releases: UCDenver-ccp/file-conversion
v0.3.1
v0.2.2
Changes include:
- Updated all Document Readers to validate annotations as they are imported. Two types of validation are implemented. 1) Discontinuous spans are validated in two ways. First, if a discontinuous span contains adjacent component spans, e.g. [35..43][44..52], or component spans that are separated only by whitespace, the component spans are combined, e.g. [35..52]. Second, if a discontinuous span contains a component span that is nested in another component span, e.g. [78..92][88..92], the nested span is removed, e.g. [78..92]. 2) Coreference identity chain annotations are checked for redundant annotations within a single chain and for annotations that are members of multiple chains. In all cases, the validation throws an Exception with an appropriate error message so that the issues can be easily addressed.
- Revised the CoNLLCoref Document Writer to exclude two annotation types that appear in the CRAFT coreference annotations but do not belong in the CoNLL-Coref 2011/12 file format: 'nonreferential pronoun' and 'partonymy relation'.
- Added discontinuous span validation to the CoNLLCorefDocumentWriter. Mapping spans to token boundaries can produce nested component spans within a discontinuous span, so the discontinuous span validation code was added to the CoNLL-Coref document writer. For example, in 16628246.xml (coreference annotations), "7.5 dbc embryos" was annotated as "7" .. "5 dbc embryos". Both the "7" and the "5" map to the "7.5" token, so the resulting annotation contained two instances of the "7.5" token span. The original "7" .. "5" split may itself be an annotation error, but it is present in the data, so the writer needed a fix.
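The two validation steps described above can be sketched in Java as follows. This is a minimal sketch, not the project's implementation: the `Span` and `AnnotationValidation` classes are hypothetical, offsets are assumed to be half-open `[start, end)` character offsets into the document text, chains are assumed to be a map from a chain id to its member annotation ids, and merging partially overlapping components is an extra assumption the notes do not mention. The span step returns the normalized spans rather than throwing, so it does not model how the readers report these cases; the chain step collects every problem into a single `IllegalStateException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical component span with half-open character offsets [start, end). */
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + ".." + end + "]";
    }
}

final class AnnotationValidation {

    /** Drops nested/duplicate components; combines overlapping, touching, or whitespace-separated ones. */
    static List<Span> normalizeSpans(List<Span> components, String documentText) {
        List<Span> sorted = new ArrayList<>(components);
        // Order by start offset; for equal starts, the longer span comes first.
        sorted.sort(Comparator.comparingInt((Span s) -> s.start).thenComparingInt(s -> -s.end));
        List<Span> result = new ArrayList<>();
        Span current = null;
        for (Span next : sorted) {
            if (current == null) {
                current = next;
            } else if (next.end <= current.end) {
                // next lies inside (or duplicates) current: drop it.
            } else if (next.start <= current.end
                    || documentText.substring(current.end, next.start).trim().isEmpty()) {
                // Overlapping, touching, or separated only by whitespace: combine.
                current = new Span(current.start, next.end);
            } else {
                result.add(current);
                current = next;
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }

    /** Throws if a chain lists a member twice or a member belongs to more than one chain. */
    static void validateChains(Map<String, List<String>> chains) {
        List<String> problems = new ArrayList<>();
        Map<String, String> owner = new HashMap<>(); // member id -> first chain containing it
        for (Map.Entry<String, List<String>> chain : chains.entrySet()) {
            Set<String> seen = new HashSet<>();
            for (String member : chain.getValue()) {
                if (!seen.add(member)) {
                    problems.add("redundant member " + member + " in chain " + chain.getKey());
                    continue;
                }
                String other = owner.putIfAbsent(member, chain.getKey());
                if (other != null) {
                    problems.add(member + " is in chains " + other + " and " + chain.getKey());
                }
            }
        }
        if (!problems.isEmpty()) {
            throw new IllegalStateException(String.join("; ", problems));
        }
    }

    public static void main(String[] args) {
        // Hypothetical token-level components for "7" .. "5 dbc embryos" after token mapping:
        // both pieces hit the "7.5" token [0..3]; then "dbc" [4..7] and "embryos" [8..15].
        String text = "7.5 dbc embryos";
        List<Span> mapped = Arrays.asList(
                new Span(0, 3), new Span(0, 3), new Span(4, 7), new Span(8, 15));
        System.out.println(normalizeSpans(mapped, text)); // prints [[0..15]]

        Map<String, List<String>> chains = new LinkedHashMap<>();
        chains.put("chain1", Arrays.asList("T1", "T2"));
        chains.put("chain2", Arrays.asList("T2", "T3"));
        try {
            validateChains(chains);
        } catch (IllegalStateException e) {
            // prints: Invalid chains: T2 is in chains chain1 and chain2
            System.out.println("Invalid chains: " + e.getMessage());
        }
    }
}
```

Under the half-open assumption, `[35..43][44..52]` is combined into `[35..52]` only if the single character at offset 43 is whitespace; with inclusive end offsets the gap test would shift by one. Sorting by start offset, with longer spans first on ties, is what lets a single pass drop nested and duplicate components such as the repeated "7.5" token span.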
v0.2.1
Changes in this release include the following:
- Revised treebank-to-dependency conversion to output CoNLL-X format
  - Instead of producing incorrect CoNLL-U files, the conversion process now produces CoNLL-X formatted files.
- Added dependencies required for JDK versions >= 9.0
  - When using JDK 9.0 or later, some dependencies that earlier JDKs bundled, e.g. the JAXB dependencies used in this project, must now be added explicitly.
- Removed some unit tests with hard-coded paths that had been committed by mistake
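For reference, a CoNLL-X line has ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `_` for unavailable values and HEAD `0` marking the root. The hypothetical `ConllXLine.format` helper below only illustrates that layout; it is not part of the project, and the tags and dependency labels in the example are made-up values.

```java
import java.util.StringJoiner;

/** Hypothetical helper that renders one token as a CoNLL-X dependency line. */
final class ConllXLine {

    /**
     * CoNLL-X uses ten tab-separated columns:
     * ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL.
     * Unavailable values are written as "_"; HEAD 0 marks the root.
     */
    static String format(int id, String form, String lemma, String cpostag,
                         String postag, String feats, int head, String deprel) {
        StringJoiner line = new StringJoiner("\t");
        line.add(Integer.toString(id)).add(form).add(lemma).add(cpostag)
            .add(postag).add(feats).add(Integer.toString(head)).add(deprel)
            .add("_").add("_"); // PHEAD and PDEPREL left unspecified
        return line.toString();
    }

    public static void main(String[] args) {
        // A two-token sentence, "Embryos developed", with "developed" as the root.
        System.out.println(format(1, "Embryos", "embryo", "NNS", "NNS", "_", 2, "nsubj"));
        System.out.println(format(2, "developed", "develop", "VBD", "VBD", "_", 0, "root"));
    }
}
```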
release v0.2
Added generation of an uber jar to avoid issues with HTTP repositories and with usage from Clojure Boot.
release of file-conversion v0.1
Updated datasource dependency to 0.7.1