Skip to content

Releases: UCDenver-ccp/file-conversion

v0.3.1

25 Jul 22:23
f9b6b28
Compare
Choose a tag to compare
Merge pull request #8 from UCDenver-ccp/dependabot/maven/com.google.c…

v0.2.2

03 Jul 23:01
Compare
Choose a tag to compare

Changes include:

  • Updated all Document Readers to validate annotations as they are imported. Two types of validation are implemented. 1) Discontinuous spans are validated in two ways. One, if a discontinuous span contains adjacent component spans, e.g. [35..43][44..52], or component spans are are separated by only whitespace, then the component spans are combined, e.g. [35..52]. Second, if the discontinuous span contains a component span that is nested in another component span, e.g. [78..92][88..92], then the nested span is removed, e.g. [78..92]. And 2), coreference identity chain annotations are checked for redundant annotations in a single chain, and for annotations that are members of multiple chains. In all cases, an Exception is thrown by the validation with an appropriate error message so that the issues can be easily addressed.
  • Revised the CoNLLCoref Document Writer to exclude two annotation types that are included in the CRAFT coreference annotations, but that should not be included in the CoNLL-Coref 2011/12 file format, namely 'nonreferential pronoun' and 'partonymy relation'.
  • Added discontinuous span validation for the CoNLLCorefDocumentWriter. Mapping spans to token boundaries can cause instances of nested discontinuous spans, so the validation code for discontinuous spans was added to the CoNLL-Coref document writer. There was a case in 16628246.xml (coreference annotations) where "7.5 dbc embryos" was annotated as "7" .. "5 dbc embryos". In this case the "7" maps to the "7.5" token and the "5" also maps to the "7.5" token, so the final annotation had two instances of the "7.5" token span. Seems like the original annotation might be faulty, i.e. the "7" .. "5" split, but that's the way it is, so a fix was required.

v0.2.1

16 May 00:25
Compare
Choose a tag to compare

Changes in this release include the following:

  • Revised treebank-to-dependency conversion to output CoNLL-X format
    • Instead of producing incorrect CoNLL-U files, the conversion process now produces CoNLL-X formatted files.
  • Added dependencies required for JDK versions >= 9.0
    • When using JDKs >= 9.0, some dependencies that were previously included must now be added explicitly, e.g. the JAXB dependencies used in this project.
  • Removed some unit tests with hard-coded paths that were mistakenly committed previously

release v0.2

08 Apr 19:07
Compare
Choose a tag to compare

Added generation of an uber jar to avoid issues with http repositories and usage with Clojure Boot.

release of file-conversion v0.1

18 Mar 22:43
Compare
Choose a tag to compare
Updated datasource dependency to 0.7.1