.
-
-----
-
- Apache License
- Version 2.0, January 2004
- http://www.apache.org/licenses/
-
- TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
- 1. Definitions.
-
- "License" shall mean the terms and conditions for use, reproduction,
- and distribution as defined by Sections 1 through 9 of this document.
-
- "Licensor" shall mean the copyright owner or entity authorized by
- the copyright owner that is granting the License.
-
- "Legal Entity" shall mean the union of the acting entity and all
- other entities that control, are controlled by, or are under common
- control with that entity. For the purposes of this definition,
- "control" means (i) the power, direct or indirect, to cause the
- direction or management of such entity, whether by contract or
- otherwise, or (ii) ownership of fifty percent (50%) or more of the
- outstanding shares, or (iii) beneficial ownership of such entity.
-
- "You" (or "Your") shall mean an individual or Legal Entity
- exercising permissions granted by this License.
-
- "Source" form shall mean the preferred form for making modifications,
- including but not limited to software source code, documentation
- source, and configuration files.
-
- "Object" form shall mean any form resulting from mechanical
- transformation or translation of a Source form, including but
- not limited to compiled object code, generated documentation,
- and conversions to other media types.
-
- "Work" shall mean the work of authorship, whether in Source or
- Object form, made available under the License, as indicated by a
- copyright notice that is included in or attached to the work
- (an example is provided in the Appendix below).
-
- "Derivative Works" shall mean any work, whether in Source or Object
- form, that is based on (or derived from) the Work and for which the
- editorial revisions, annotations, elaborations, or other modifications
- represent, as a whole, an original work of authorship. For the purposes
- of this License, Derivative Works shall not include works that remain
- separable from, or merely link (or bind by name) to the interfaces of,
- the Work and Derivative Works thereof.
-
- "Contribution" shall mean any work of authorship, including
- the original version of the Work and any modifications or additions
- to that Work or Derivative Works thereof, that is intentionally
- submitted to Licensor for inclusion in the Work by the copyright owner
- or by an individual or Legal Entity authorized to submit on behalf of
- the copyright owner. For the purposes of this definition, "submitted"
- means any form of electronic, verbal, or written communication sent
- to the Licensor or its representatives, including but not limited to
- communication on electronic mailing lists, source code control systems,
- and issue tracking systems that are managed by, or on behalf of, the
- Licensor for the purpose of discussing and improving the Work, but
- excluding communication that is conspicuously marked or otherwise
- designated in writing by the copyright owner as "Not a Contribution."
-
- "Contributor" shall mean Licensor and any individual or Legal Entity
- on behalf of whom a Contribution has been received by Licensor and
- subsequently incorporated within the Work.
-
- 2. Grant of Copyright License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- copyright license to reproduce, prepare Derivative Works of,
- publicly display, publicly perform, sublicense, and distribute the
- Work and such Derivative Works in Source or Object form.
-
- 3. Grant of Patent License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- (except as stated in this section) patent license to make, have made,
- use, offer to sell, sell, import, and otherwise transfer the Work,
- where such license applies only to those patent claims licensable
- by such Contributor that are necessarily infringed by their
- Contribution(s) alone or by combination of their Contribution(s)
- with the Work to which such Contribution(s) was submitted. If You
- institute patent litigation against any entity (including a
- cross-claim or counterclaim in a lawsuit) alleging that the Work
- or a Contribution incorporated within the Work constitutes direct
- or contributory patent infringement, then any patent licenses
- granted to You under this License for that Work shall terminate
- as of the date such litigation is filed.
-
- 4. Redistribution. You may reproduce and distribute copies of the
- Work or Derivative Works thereof in any medium, with or without
- modifications, and in Source or Object form, provided that You
- meet the following conditions:
-
- (a) You must give any other recipients of the Work or
- Derivative Works a copy of this License; and
-
- (b) You must cause any modified files to carry prominent notices
- stating that You changed the files; and
-
- (c) You must retain, in the Source form of any Derivative Works
- that You distribute, all copyright, patent, trademark, and
- attribution notices from the Source form of the Work,
- excluding those notices that do not pertain to any part of
- the Derivative Works; and
-
- (d) If the Work includes a "NOTICE" text file as part of its
- distribution, then any Derivative Works that You distribute must
- include a readable copy of the attribution notices contained
- within such NOTICE file, excluding those notices that do not
- pertain to any part of the Derivative Works, in at least one
- of the following places: within a NOTICE text file distributed
- as part of the Derivative Works; within the Source form or
- documentation, if provided along with the Derivative Works; or,
- within a display generated by the Derivative Works, if and
- wherever such third-party notices normally appear. The contents
- of the NOTICE file are for informational purposes only and
- do not modify the License. You may add Your own attribution
- notices within Derivative Works that You distribute, alongside
- or as an addendum to the NOTICE text from the Work, provided
- that such additional attribution notices cannot be construed
- as modifying the License.
-
- You may add Your own copyright statement to Your modifications and
- may provide additional or different license terms and conditions
- for use, reproduction, or distribution of Your modifications, or
- for any such Derivative Works as a whole, provided Your use,
- reproduction, and distribution of the Work otherwise complies with
- the conditions stated in this License.
-
- 5. Submission of Contributions. Unless You explicitly state otherwise,
- any Contribution intentionally submitted for inclusion in the Work
- by You to the Licensor shall be under the terms and conditions of
- this License, without any additional terms or conditions.
- Notwithstanding the above, nothing herein shall supersede or modify
- the terms of any separate license agreement you may have executed
- with Licensor regarding such Contributions.
-
- 6. Trademarks. This License does not grant permission to use the trade
- names, trademarks, service marks, or product names of the Licensor,
- except as required for reasonable and customary use in describing the
- origin of the Work and reproducing the content of the NOTICE file.
-
- 7. Disclaimer of Warranty. Unless required by applicable law or
- agreed to in writing, Licensor provides the Work (and each
- Contributor provides its Contributions) on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
- implied, including, without limitation, any warranties or conditions
- of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
- PARTICULAR PURPOSE. You are solely responsible for determining the
- appropriateness of using or redistributing the Work and assume any
- risks associated with Your exercise of permissions under this License.
-
- 8. Limitation of Liability. In no event and under no legal theory,
- whether in tort (including negligence), contract, or otherwise,
- unless required by applicable law (such as deliberate and grossly
- negligent acts) or agreed to in writing, shall any Contributor be
- liable to You for damages, including any direct, indirect, special,
- incidental, or consequential damages of any character arising as a
- result of this License or out of the use or inability to use the
- Work (including but not limited to damages for loss of goodwill,
- work stoppage, computer failure or malfunction, or any and all
- other commercial damages or losses), even if such Contributor
- has been advised of the possibility of such damages.
-
- 9. Accepting Warranty or Additional Liability. While redistributing
- the Work or Derivative Works thereof, You may choose to offer,
- and charge a fee for, acceptance of support, warranty, indemnity,
- or other liability obligations and/or rights consistent with this
- License. However, in accepting such obligations, You may act only
- on Your own behalf and on Your sole responsibility, not on behalf
- of any other Contributor, and only if You agree to indemnify,
- defend, and hold each Contributor harmless for any liability
- incurred by, or claims asserted against, such Contributor by reason
- of your accepting any such warranty or additional liability.
-
- END OF TERMS AND CONDITIONS
-
- APPENDIX: How to apply the Apache License to your work.
-
- To apply the Apache License to your work, attach the following
- boilerplate notice, with the fields enclosed by brackets "[]"
- replaced with your own identifying information. (Don't include
- the brackets!) The text should be enclosed in the appropriate
- comment syntax for the file format. We also recommend that a
- file or class name and description of purpose be included on the
- same "printed page" as the copyright notice for easier
- identification within third-party archives.
-
- Copyright [yyyy] [name of copyright owner]
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-
diff --git a/dkpro-core-berkeleyparser-gpl/pom.xml b/dkpro-core-berkeleyparser-gpl/pom.xml
deleted file mode 100644
index a82fdee941..0000000000
--- a/dkpro-core-berkeleyparser-gpl/pom.xml
+++ /dev/null
@@ -1,201 +0,0 @@
-
-
- 4.0.0
-
-
- org.dkpro.core
- dkpro-core-gpl
- 3.0.0-SNAPSHOT
- ../dkpro-core-gpl
-
-
- dkpro-core-berkeleyparser-gpl
- jar
- DKPro Core GPL - Berkeley Parser
- https://dkpro.github.io/dkpro-core/
-
-
-
- org.apache.uima
- uimaj-core
-
-
- org.apache.uima
- uimafit-core
-
-
- org.apache.commons
- commons-lang3
-
-
- edu.berkeley.nlp
- berkeleyparser
- r32
-
-
- org.dkpro.core
- dkpro-core-api-metadata-asl
- ${project.version}
-
-
- org.dkpro.core
- dkpro-core-api-resources-asl
- ${project.version}
-
-
- org.dkpro.core
- dkpro-core-api-lexmorph-asl
- ${project.version}
-
-
- org.dkpro.core
- dkpro-core-api-syntax-asl
- ${project.version}
-
-
- org.dkpro.core
- dkpro-core-api-segmentation-asl
- ${project.version}
-
-
- org.dkpro.core
- dkpro-core-api-parameter-asl
- ${project.version}
-
-
- eu.openminted.share.annotations
- omtd-share-annotations-api
-
-
- org.dkpro.core
- dkpro-core-testing-asl
- ${project.version}
- test
-
-
- org.dkpro.core
- dkpro-core-opennlp-asl
- ${project.version}
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-ar-sm5
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-bg-sm5
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-de-sm5
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-en-sm6
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-fr-sm5
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-zh-sm5
- test
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.opennlp-model-tagger-en-maxent
- test
-
-
-
-
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-ar-sm5
- 20090917.1
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-bg-sm5
- 20090917.1
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-de-sm5
- 20090917.1
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-en-sm6
- 20100819.1
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-fr-sm5
- 20090917.1
-
-
- de.tudarmstadt.ukp.dkpro.core
- de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-zh-sm5
- 20090917.1
-
-
- org.dkpro.core
- dkpro-core-opennlp-asl
- ${project.version}
- pom
- import
-
-
-
-
-
-
-
-
- org.apache.maven.plugins
- maven-dependency-plugin
-
-
-
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-ar-sm5
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-bg-sm5
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-de-sm5
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-en-sm6
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-fr-sm5
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.berkeleyparser-model-parser-zh-sm5
- de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.opennlp-model-tagger-en-maxent
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/dkpro-core-berkeleyparser-gpl/src/main/java/org/dkpro/core/berkeleyparser/BerkeleyParser.java b/dkpro-core-berkeleyparser-gpl/src/main/java/org/dkpro/core/berkeleyparser/BerkeleyParser.java
deleted file mode 100644
index 4dc844b0e6..0000000000
--- a/dkpro-core-berkeleyparser-gpl/src/main/java/org/dkpro/core/berkeleyparser/BerkeleyParser.java
+++ /dev/null
@@ -1,442 +0,0 @@
-/*
- * Copyright 2007-2024
- * Ubiquitous Knowledge Processing (UKP) Lab
- * Technische Universität Darmstadt
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see http://www.gnu.org/licenses/.
- */
-package org.dkpro.core.berkeleyparser;
-
-import static org.apache.uima.fit.util.JCasUtil.select;
-import static org.apache.uima.fit.util.JCasUtil.selectCovered;
-import static org.apache.uima.util.Level.INFO;
-import static org.dkpro.core.api.parameter.ComponentParameters.DEFAULT_MAPPING_ENABLED;
-import static org.dkpro.core.api.resources.MappingProviderFactory.createConstituentMappingProvider;
-import static org.dkpro.core.api.resources.MappingProviderFactory.createPosMappingProvider;
-
-import java.io.IOException;
-import java.io.ObjectInputStream;
-import java.net.URL;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Properties;
-import java.util.stream.Collectors;
-import java.util.zip.GZIPInputStream;
-
-import org.apache.commons.lang3.mutable.MutableInt;
-import org.apache.uima.UimaContext;
-import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
-import org.apache.uima.cas.CAS;
-import org.apache.uima.cas.Type;
-import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
-import org.apache.uima.fit.descriptor.ConfigurationParameter;
-import org.apache.uima.fit.descriptor.OperationalProperties;
-import org.apache.uima.fit.descriptor.ResourceMetaData;
-import org.apache.uima.fit.descriptor.TypeCapability;
-import org.apache.uima.fit.util.FSCollectionFactory;
-import org.apache.uima.jcas.JCas;
-import org.apache.uima.jcas.cas.FSArray;
-import org.apache.uima.jcas.tcas.Annotation;
-import org.apache.uima.resource.ResourceInitializationException;
-import org.dkpro.core.api.lexmorph.pos.POSUtils;
-import org.dkpro.core.api.metadata.SingletonTagset;
-import org.dkpro.core.api.parameter.ComponentParameters;
-import org.dkpro.core.api.resources.CasConfigurableProviderBase;
-import org.dkpro.core.api.resources.MappingProvider;
-import org.dkpro.core.api.resources.ModelProviderBase;
-
-import de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS;
-import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence;
-import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token;
-import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.PennTree;
-import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.Constituent;
-import edu.berkeley.nlp.PCFGLA.CoarseToFineMaxRuleParser;
-import edu.berkeley.nlp.PCFGLA.Grammar;
-import edu.berkeley.nlp.PCFGLA.Lexicon;
-import edu.berkeley.nlp.PCFGLA.ParserData;
-import edu.berkeley.nlp.PCFGLA.TreeAnnotations;
-import edu.berkeley.nlp.syntax.Tree;
-import edu.berkeley.nlp.util.Numberer;
-import eu.openminted.share.annotations.api.Component;
-import eu.openminted.share.annotations.api.DocumentationResource;
-import eu.openminted.share.annotations.api.constants.OperationType;
-
-/**
- * Berkeley Parser annotator. Requires {@link Sentence}s to be annotated before.
- *
- * @see CoarseToFineMaxRuleParser
- */
-@Component(OperationType.CONSTITUENCY_PARSER)
-@ResourceMetaData(name = "Berkeley Parser")
-@DocumentationResource("${docbase}/component-reference.html#engine-${shortClassName}")
-@OperationalProperties(multipleDeploymentAllowed = false)
-@TypeCapability(inputs = { "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token",
- "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence" }, outputs = {
- "de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.Constituent",
- "de.tudarmstadt.ukp.dkpro.core.api.syntax.type.PennTree" })
-public class BerkeleyParser
- extends JCasAnnotator_ImplBase
-{
- /**
- * Use this language instead of the language set in the CAS to locate the model.
- */
- public static final String PARAM_LANGUAGE = ComponentParameters.PARAM_LANGUAGE;
- @ConfigurationParameter(name = PARAM_LANGUAGE, mandatory = false)
- protected String language;
-
- /**
- * Override the default variant used to locate the model.
- */
- public static final String PARAM_VARIANT = ComponentParameters.PARAM_VARIANT;
- @ConfigurationParameter(name = PARAM_VARIANT, mandatory = false)
- protected String variant;
-
- /**
- * URI of the model artifact. This can be used to override the default model resolving mechanism
- * and directly address a particular model.
- *
- *
- * The URI format is {@code mvn:${groupId}:${artifactId}:${version}}. Remember to set the
- * variant parameter to match the artifact. If the artifact contains the model in a non-default
- * location, you also have to specify the model location parameter, e.g.
- * {@code classpath:/model/path/in/artifact/model.bin}.
- *
- */
- public static final String PARAM_MODEL_ARTIFACT_URI = ComponentParameters.PARAM_MODEL_ARTIFACT_URI;
- @ConfigurationParameter(name = PARAM_MODEL_ARTIFACT_URI, mandatory = false)
- protected String modelArtifactUri;
-
- /**
- * Load the model from this location instead of locating the model automatically.
- */
- public static final String PARAM_MODEL_LOCATION = ComponentParameters.PARAM_MODEL_LOCATION;
- @ConfigurationParameter(name = PARAM_MODEL_LOCATION, mandatory = false)
- protected String modelLocation;
-
- /**
- * Enable/disable type mapping.
- */
- public static final String PARAM_MAPPING_ENABLED = ComponentParameters.PARAM_MAPPING_ENABLED;
- @ConfigurationParameter(name = PARAM_MAPPING_ENABLED, defaultValue = DEFAULT_MAPPING_ENABLED)
- protected boolean mappingEnabled;
-
- /**
- * Location of the mapping file for part-of-speech tags to UIMA types.
- */
- public static final String PARAM_POS_MAPPING_LOCATION = ComponentParameters.PARAM_POS_MAPPING_LOCATION;
- @ConfigurationParameter(name = PARAM_POS_MAPPING_LOCATION, mandatory = false)
- protected String posMappingLocation;
-
- /**
- * Location of the mapping file for constituent tags to UIMA types.
- */
- public static final String PARAM_CONSTITUENT_MAPPING_LOCATION = //
- ComponentParameters.PARAM_CONSTITUENT_MAPPING_LOCATION;
- @ConfigurationParameter(name = PARAM_CONSTITUENT_MAPPING_LOCATION, mandatory = false)
- protected String constituentMappingLocation;
-
- /**
- * Log the tag set(s) when a model is loaded.
- */
- public static final String PARAM_PRINT_TAGSET = ComponentParameters.PARAM_PRINT_TAGSET;
- @ConfigurationParameter(name = PARAM_PRINT_TAGSET, mandatory = true, defaultValue = "false")
- protected boolean printTagSet;
-
- /**
- * Sets whether to use or not to use already existing POS tags from another annotator for the
- * parsing process.
- */
- public static final String PARAM_READ_POS = ComponentParameters.PARAM_READ_POS;
- @ConfigurationParameter(name = PARAM_READ_POS, mandatory = true, defaultValue = "true")
- private boolean readPos;
-
- /**
- * Sets whether to create or not to create POS tags. The creation of constituent tags must be
- * turned on for this to work.
- */
- public static final String PARAM_WRITE_POS = ComponentParameters.PARAM_WRITE_POS;
- @ConfigurationParameter(name = PARAM_WRITE_POS, mandatory = true, defaultValue = "false")
- private boolean writePos;
-
- /**
- * If this parameter is set to true, each sentence is annotated with a PennTree-Annotation,
- * containing the whole parse tree in Penn Treebank style format.
- */
- public static final String PARAM_WRITE_PENN_TREE = ComponentParameters.PARAM_WRITE_PENN_TREE;
- @ConfigurationParameter(name = PARAM_WRITE_PENN_TREE, mandatory = true, defaultValue = "false")
- private boolean writePennTree;
-
- /**
- * Compute Viterbi derivation instead of max-rule tree.
- */
- public static final String PARAM_VITERBI = "viterbi";
- @ConfigurationParameter(name = PARAM_VITERBI, mandatory = true, defaultValue = "false")
- private boolean viterbi;
-
- /**
- * Output sub-categories (only for binarized Viterbi trees).
- */
- public static final String PARAM_SUBSTATES = "substates";
- @ConfigurationParameter(name = PARAM_SUBSTATES, mandatory = true, defaultValue = "false")
- private boolean substates;
-
- /**
- * Output inside scores (only for binarized Viterbi trees).
- */
- public static final String PARAM_SCORES = "scores";
- @ConfigurationParameter(name = PARAM_SCORES, mandatory = true, defaultValue = "false")
- private boolean scores;
-
- /**
- * Set thresholds for accuracy instead of efficiency.
- */
- public static final String PARAM_ACCURATE = "accurate";
- @ConfigurationParameter(name = PARAM_ACCURATE, mandatory = true, defaultValue = "false")
- private boolean accurate;
-
- /**
- * Use variational rule score approximation instead of max-rule.
- */
- public static final String PARAM_VARIATIONAL = "variational";
- @ConfigurationParameter(name = PARAM_VARIATIONAL, mandatory = true, defaultValue = "false")
- private boolean variational;
-
- /**
- * Retain predicted function labels. Model must have been trained with function labels.
- */
- public static final String PARAM_KEEP_FUNCTION_LABELS = "keepFunctionLabels";
- @ConfigurationParameter(name = PARAM_KEEP_FUNCTION_LABELS, mandatory = true, defaultValue = "false")
- private boolean keepFunctionLabels;
-
- /**
- * Output binarized trees.
- */
- public static final String PARAM_BINARIZE = "binarize";
- @ConfigurationParameter(name = PARAM_BINARIZE, mandatory = true, defaultValue = "false")
- private boolean binarize;
-
- private CasConfigurableProviderBase<CoarseToFineMaxRuleParser> modelProvider;
- private MappingProvider posMappingProvider;
- private MappingProvider constituentMappingProvider;
-
- @Override
- public void initialize(UimaContext aContext) throws ResourceInitializationException
- {
- super.initialize(aContext);
-
- modelProvider = new BerkeleyParserModelProvider();
-
- if (writePos) {
- posMappingProvider = createPosMappingProvider(this, posMappingLocation, language,
- modelProvider);
- }
-
- constituentMappingProvider = createConstituentMappingProvider(this,
- constituentMappingLocation, language, modelProvider);
- }
-
- @Override
- public void process(JCas aJCas) throws AnalysisEngineProcessException
- {
- CAS cas = aJCas.getCas();
-
- modelProvider.configure(cas);
- if (writePos) {
- posMappingProvider.configure(cas);
- }
- constituentMappingProvider.configure(cas);
-
- for (Sentence sentence : select(aJCas, Sentence.class)) {
- List<Token> tokens = selectCovered(aJCas, Token.class, sentence);
- List<String> tokenText = tokens.stream().map(t -> t.getText())
- .collect(Collectors.toList());
-
- List<String> posTags = null;
- if (readPos) {
- posTags = new ArrayList<String>(tokens.size());
- for (Token t : tokens) {
- posTags.add(t.getPos().getPosValue());
- }
- }
-
- Tree<String> parseOutput = modelProvider.getResource()
- .getBestConstrainedParse(tokenText, posTags, false);
-
- // Check if the sentence could be parsed or not
- if (parseOutput.getChildren().isEmpty()) {
- getLogger().warn("Unable to parse sentence: [" + sentence.getCoveredText() + "]");
- continue;
- }
-
- if (!binarize) {
- parseOutput = TreeAnnotations.unAnnotateTree(parseOutput, keepFunctionLabels);
- }
-
- createConstituentAnnotationFromTree(aJCas, parseOutput, null, tokens,
- new MutableInt(0));
-
- if (writePennTree) {
- PennTree pTree = new PennTree(aJCas, sentence.getBegin(), sentence.getEnd());
- pTree.setPennTree(parseOutput.toString());
- pTree.addToIndexes();
- }
- }
- }
-
- /**
- * Creates linked constituent annotations + POS annotations
- *
- * @param aNode
- * the source tree
- * @return the child-structure (needed for recursive call only)
- */
- private Annotation createConstituentAnnotationFromTree(JCas aJCas, Tree<String> aNode,
- Annotation aParentFS, List<Token> aTokens, MutableInt aIndex)
- {
- // If the node is a word-level constituent node (== POS):
- // create parent link on token and (if not turned off) create POS tag
- if (aNode.isPreTerminal()) {
- Token token = aTokens.get(aIndex.intValue());
-
- // link token to its parent constituent
- if (aParentFS != null) {
- token.setParent(aParentFS);
- }
-
- // only add POS to index if we want POS-tagging
- if (writePos) {
- String typeName = aNode.getLabel();
- Type posTag = posMappingProvider.getTagType(typeName);
- POS posAnno = (POS) aJCas.getCas().createAnnotation(posTag, token.getBegin(),
- token.getEnd());
- posAnno.setPosValue(typeName != null ? typeName.intern() : null);
- POSUtils.assignCoarseValue(posAnno);
- posAnno.addToIndexes();
- token.setPos(posAnno);
- }
-
- aIndex.add(1);
-
- return token;
- }
- // Check if node is a constituent node on sentence or phrase-level
- else {
- String typeName = aNode.getLabel();
-
- // create the necessary objects and methods
- Type constType = constituentMappingProvider.getTagType(typeName);
-
- Constituent constAnno = (Constituent) aJCas.getCas().createAnnotation(constType, 0, 0);
- constAnno.setConstituentType(typeName);
-
- // link to parent
- if (aParentFS != null) {
- constAnno.setParent(aParentFS);
- }
-
- // Do we have any children?
- List<Annotation> childAnnotations = new ArrayList<Annotation>();
- for (Tree<String> child : aNode.getChildren()) {
- Annotation childAnnotation = createConstituentAnnotationFromTree(aJCas, child,
- constAnno, aTokens, aIndex);
- if (childAnnotation != null) {
- childAnnotations.add(childAnnotation);
- }
- }
-
- constAnno.setBegin(childAnnotations.get(0).getBegin());
- constAnno.setEnd(childAnnotations.get(childAnnotations.size() - 1).getEnd());
-
- // Now that we know how many children we have, link annotation of
- // current node with its children
- FSArray childArray = FSCollectionFactory.createFSArray(aJCas, childAnnotations);
- constAnno.setChildren(childArray);
-
- // write annotation for current node to index
- aJCas.addFsToIndexes(constAnno);
-
- return constAnno;
- }
- }
-
- private class BerkeleyParserModelProvider
- extends ModelProviderBase<CoarseToFineMaxRuleParser>
- {
- {
- setContextObject(BerkeleyParser.this);
-
- setDefault(GROUP_ID, "de.tudarmstadt.ukp.dkpro.core");
- setDefault(ARTIFACT_ID,
- "${groupId}.berkeleyparser-model-parser-${language}-${variant}");
- setDefault(LOCATION,
- "classpath:/de/tudarmstadt/ukp/dkpro/core/berkeleyparser/lib/parser-${language}-${variant}.bin");
- setDefaultVariantsLocation("${package}/lib/parser-default-variants.map");
-
- setOverride(LOCATION, modelLocation);
- setOverride(LANGUAGE, language);
- setOverride(VARIANT, variant);
- }
-
- @Override
- protected CoarseToFineMaxRuleParser produceResource(URL aUrl) throws IOException
- {
- try (ObjectInputStream is = new ObjectInputStream(
- new GZIPInputStream(aUrl.openStream()))) {
- ParserData pData = (ParserData) is.readObject();
-
- Grammar grammar = pData.getGrammar();
- Lexicon lexicon = pData.getLexicon();
- Numberer.setNumberers(pData.getNumbs());
-
- double threshold = 1.0;
-
- Properties metadata = getResourceMetaData();
- SingletonTagset posTags = new SingletonTagset(POS.class,
- metadata.getProperty("pos.tagset"));
- SingletonTagset constTags = new SingletonTagset(Constituent.class,
- metadata.getProperty("constituent.tagset"));
-
- Numberer tagNumberer = (Numberer) pData.getNumbs().get("tags");
- for (int i = 0; i < tagNumberer.size(); i++) {
- String tag = (String) tagNumberer.object(i);
- if (!binarize && tag.startsWith("@")) {
- continue; // Only show aux. binarization tags if it is enabled.
- }
- if (tag.endsWith("^g")) {
- constTags.add(tag.substring(0, tag.length() - 2));
- }
- else if ("ROOT".equals(tag)) {
- constTags.add(tag);
- }
- else {
- posTags.add(tag);
- }
- }
-
- addTagset(posTags, writePos);
- addTagset(constTags);
-
- if (printTagSet) {
- getContext().getLogger().log(INFO, getTagset().toString());
- }
-
- return new CoarseToFineMaxRuleParser(grammar, lexicon, threshold, -1, viterbi,
- substates, scores, accurate, variational, true, true);
- }
- catch (ClassNotFoundException e) {
- throw new IOException(e);
- }
- }
- };
-}
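Review note: the tag-classification loop in `produceResource` above is the least obvious part of the removed class, so here is a minimal standalone sketch of the same rule. It assumes the tags arrive as a plain list of strings rather than from the Berkeley `Numberer`; the class `TagsetSplitSketch` and its `Tagsets` holder are hypothetical names used only for illustration and are not part of DKPro Core.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TagsetSplitSketch {
    /** Hypothetical result holder; the removed code used SingletonTagset instead. */
    public static final class Tagsets {
        public final Set<String> pos = new TreeSet<>();
        public final Set<String> constituents = new TreeSet<>();
    }

    /**
     * Splits model tags into POS and constituent tags, following the rule in the
     * removed produceResource method:
     * - tags starting with "@" are auxiliary binarization tags and are skipped
     *   unless binarize is true;
     * - tags ending in "^g" are constituent tags (the suffix is stripped);
     * - "ROOT" is a constituent tag;
     * - everything else is treated as a POS tag.
     */
    public static Tagsets split(List<String> tags, boolean binarize) {
        Tagsets result = new Tagsets();
        for (String tag : tags) {
            if (!binarize && tag.startsWith("@")) {
                continue;
            }
            if (tag.endsWith("^g")) {
                result.constituents.add(tag.substring(0, tag.length() - 2));
            }
            else if ("ROOT".equals(tag)) {
                result.constituents.add(tag);
            }
            else {
                result.pos.add(tag);
            }
        }
        return result;
    }
}
```

With `binarize` set to `false`, a tag such as `@NP^g` is dropped entirely; with `binarize` set to `true`, it is recorded as the constituent tag `@NP`.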
diff --git a/dkpro-core-berkeleyparser-gpl/src/main/java/org/dkpro/core/berkeleyparser/package-info.java b/dkpro-core-berkeleyparser-gpl/src/main/java/org/dkpro/core/berkeleyparser/package-info.java
deleted file mode 100644
index c6de85fcdb..0000000000
--- a/dkpro-core-berkeleyparser-gpl/src/main/java/org/dkpro/core/berkeleyparser/package-info.java
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
- * Copyright 2007-2024
- * Ubiquitous Knowledge Processing (UKP) Lab
- * Technische Universität Darmstadt
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see http://www.gnu.org/licenses/.
- */
-/**
- * Integration of the Berkeley Parser.
- *
- * @since 1.5.0
- */
-package org.dkpro.core.berkeleyparser;
diff --git a/dkpro-core-berkeleyparser-gpl/src/main/resources/org/dkpro/core/berkeleyparser/lib/parser-default-variants.map b/dkpro-core-berkeleyparser-gpl/src/main/resources/org/dkpro/core/berkeleyparser/lib/parser-default-variants.map
deleted file mode 100644
index cec5d34ba1..0000000000
--- a/dkpro-core-berkeleyparser-gpl/src/main/resources/org/dkpro/core/berkeleyparser/lib/parser-default-variants.map
+++ /dev/null
@@ -1,6 +0,0 @@
-ar=sm5
-bg=sm5
-de=sm5
-en=sm6
-fr=sm5
-zh=sm5
\ No newline at end of file
diff --git a/dkpro-core-berkeleyparser-gpl/src/scripts/build.xml b/dkpro-core-berkeleyparser-gpl/src/scripts/build.xml
deleted file mode 100644
index 3e84cfb6a3..0000000000
--- a/dkpro-core-berkeleyparser-gpl/src/scripts/build.xml
+++ /dev/null
@@ -1,211 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/dkpro-core-berkeleyparser-gpl/src/test/java/org/dkpro/core/berkeleyparser/BerkeleyParserTest.java b/dkpro-core-berkeleyparser-gpl/src/test/java/org/dkpro/core/berkeleyparser/BerkeleyParserTest.java
deleted file mode 100644
index 5171a35fec..0000000000
--- a/dkpro-core-berkeleyparser-gpl/src/test/java/org/dkpro/core/berkeleyparser/BerkeleyParserTest.java
+++ /dev/null
@@ -1,455 +0,0 @@
-/*
- * Copyright 2007-2024
- * Ubiquitous Knowledge Processing (UKP) Lab
- * Technische Universität Darmstadt
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see http://www.gnu.org/licenses/.
- */
-package org.dkpro.core.berkeleyparser;
-
-import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
-import static org.apache.uima.fit.util.JCasUtil.select;
-import static org.apache.uima.fit.util.JCasUtil.selectSingle;
-import static org.dkpro.core.testing.AssertAnnotations.assertConstituents;
-import static org.dkpro.core.testing.AssertAnnotations.assertPOS;
-import static org.dkpro.core.testing.AssertAnnotations.assertPennTree;
-import static org.dkpro.core.testing.AssertAnnotations.assertTagset;
-import static org.dkpro.core.testing.AssertAnnotations.assertTagsetMapping;
-
-import java.util.ArrayList;
-import java.util.List;
-
-import org.apache.commons.lang3.ArrayUtils;
-import org.apache.uima.fit.factory.AggregateBuilder;
-import org.apache.uima.jcas.JCas;
-import org.dkpro.core.opennlp.OpenNlpPosTagger;
-import org.dkpro.core.testing.AssertAnnotations;
-import org.dkpro.core.testing.TestRunner;
-import org.junit.jupiter.api.Test;
-
-import de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS;
-import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.PennTree;
-import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.Constituent;
-import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency;
-
-public class BerkeleyParserTest
-{
- static final String documentEnglish = "We need a very complicated example sentence , which " +
- "contains as many constituents and dependencies as possible .";
-
- @Test
- public void testArabic()
- throws Exception
- {
- JCas jcas = runTest("ar",
- "نحتاج مثالا معقدا جدا ل جملة تحتوي على أكبر قدر ممكن من العناصر و الروابط .");
-
- String[] constituentMapped = { "ROOT 0,75", "X 0,75" };
-
- String[] constituentOriginal = { "ROOT 0,75", "X 0,75" };
-
- String[] dependencies = {};
-
- String pennTree = "(ROOT (ROOT (X (PUNC نحتاج) (PUNC مثالا) (NN معقدا) (NN جدا) (NN ل) (NN جملة) "
- + "(NN تحتوي) (NN على) (NN أكبر) (NN قدر) (NN ممكن) (NN من) (NN العناصر) (NN و) (NN الروابط) "
- + "(PUNC .))))";
-
- String[] posMapped = { "POS_PUNCT", "POS_PUNCT", "POS_NOUN", "POS_NOUN", "POS_NOUN",
- "POS_NOUN", "POS_NOUN", "POS_NOUN", "POS_NOUN", "POS_NOUN", "POS_NOUN", "POS_NOUN",
- "POS_NOUN", "POS_NOUN", "POS_NOUN", "POS_PUNCT" };
-
- String[] posOriginal = { "PUNC", "PUNC", "NN", "NN", "NN", "NN", "NN", "NN", "NN", "NN",
- "NN", "NN", "NN", "NN", "NN", "PUNC" };
-
- String[] posTags = { "CC", "CD", "DEM", "DT", "IN", "JJ", "NN", "NNP", "NNPS", "NNS",
- "NOFUNC", "NUMCOMMA", "PRP", "PRP$", "PUNC", "RB", "RP", "UH", "VB", "VBD", "VBN",
- "VBP", "VERB", "WP", "WRB" };
-
- String[] constituentTags = { "ADJP", "ADVP", "CONJP", "FRAG", "INTJ", "LST", "NAC", "NP",
- "NX", "PP", "PRN", "PRT", "QP", "ROOT", "S", "SBAR", "SBARQ", "SINV", "SQ", "UCP",
- "VP", "WHADJP", "WHADVP", "WHNP", "WHPP", "X" };
-
- String[] unmappedPos = { "DEM", "NOFUNC", "NUMCOMMA", "PRP$", "VERB" };
-
- String[] unmappedConst = { "LST", "SINV" };
-
- AssertAnnotations.assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- AssertAnnotations.assertPennTree(pennTree, selectSingle(jcas, PennTree.class));
- AssertAnnotations.assertConstituents(constituentMapped, constituentOriginal,
- select(jcas, Constituent.class));
- AssertAnnotations.assertDependencies(dependencies, select(jcas, Dependency.class));
- AssertAnnotations.assertTagset(POS.class, "atb", posTags, jcas);
- AssertAnnotations.assertTagsetMapping(POS.class, "atb", unmappedPos, jcas);
- AssertAnnotations.assertTagset(Constituent.class, "atb", constituentTags, jcas);
- AssertAnnotations.assertTagsetMapping(Constituent.class, "atb", unmappedConst, jcas);
- }
-
- @Test
- public void testBulgarian()
- throws Exception
- {
- JCas jcas = runTest("bg", "Имаме нужда от един много сложен пример изречение , " +
- "което съдържа най-много съставки и зависимости, колкото е възможно .");
-
- String[] constituentMapped = { "ROOT 0,120", "X 0,118", "X 0,120", "X 0,5", "X 100,107",
- "X 100,118", "X 108,109", "X 108,118", "X 110,118", "X 12,118", "X 12,14",
- "X 15,118", "X 15,19", "X 15,39", "X 20,25", "X 20,32", "X 20,39", "X 26,32",
- "X 33,39", "X 40,118", "X 40,49", "X 40,84", "X 50,84", "X 52,57", "X 52,84",
- "X 58,65", "X 58,84", "X 6,11", "X 6,118", "X 66,75", "X 66,84", "X 76,84",
- "X 85,86", "X 87,118", "X 87,99" };
-
- String[] constituentOriginal = { "A 26,32", "APA 20,32", "Adv 100,107", "Adv 110,118",
- "Adv 20,25", "Adv 66,75", "Adv 87,99", "AdvPA 87,118", "C 85,86", "CL 100,118",
- "CLR 50,84", "Conj 85,86", "ConjArg 40,84", "ConjArg 87,118", "CoordP 40,118",
- "M 15,19", "N 33,39", "N 40,49", "N 6,11", "N 76,84", "NPA 15,118", "NPA 15,39",
- "NPA 20,39", "NPA 40,84", "NPA 6,118", "NPA 66,84", "PP 12,118", "Prep 12,14",
- "Pron 52,57", "ROOT 0,120", "S 0,120", "V 0,5", "V 108,109", "V 58,65",
- "VPA 100,118", "VPC 0,118", "VPC 108,118", "VPC 58,84", "VPS 52,84" };
-
- String[] posMapped = { "POS", "POS", "POS", "POS", "POS", "POS", "POS", "POS", "POS",
- "POS", "POS", "POS", "POS", "POS", "POS", "POS", "POS", "POS", "POS" };
-
- String[] posOriginal = { "Vpitf", "Ncfsi", "R", "Mcmsi", "Md", "Amsi", "Ncmsi", "Ncnsi",
- "pt", "Pre", "Vpitf", "Md", "Ncmpi", "Cp", "Dm", "Prq", "Vxitf", "Dd", "pt" };
-
- String pennTree = "(ROOT (ROOT (S (VPC (V (Vpitf Имаме)) (NPA (N (Ncfsi нужда)) (PP "
- + "(Prep (R от)) (NPA (NPA (M (Mcmsi един)) (NPA (APA (Adv (Md много)) (A "
- + "(Amsi сложен))) (N (Ncmsi пример)))) (CoordP (ConjArg (NPA (N "
- + "(Ncnsi изречение)) (CLR (pt ,) (VPS (Pron (Pre което)) (VPC (V "
- + "(Vpitf съдържа)) (NPA (Adv (Md най-много)) (N (Ncmpi съставки)))))))) "
- + "(Conj (C (Cp и))) (ConjArg (AdvPA (Adv (Dm зависимости,)) (CL (VPA (Adv "
- + "(Prq колкото)) (VPC (V (Vxitf е)) (Adv (Dd възможно)))))))))))) (pt .))))";
-
- String[] posTags = { "A", "Afsd", "Afsi", "Ams", "Amsf", "Amsh", "Amsi",
- "Ansd", "Ansi", "Cc", "Cp", "Cr", "Cs", "Dd", "Dl", "Dm", "Dq", "Dt", "Hfsi",
- "Hmsf", "I", "Mc", "Mcf", "Mcfpd", "Mcfpi", "Mcfsd", "Mcfsi", "Mcm", "Mcmpd",
- "Mcmpi", "Mcmsf", "Mcmsi", "Mcn", "Mcnpd", "Mcnpi", "Mcnsd", "Mcnsi", "Md", "Mo",
- "Mofsd", "Mofsi", "Momsf", "Momsh", "Momsi", "Monsd", "Monsi", "My", "Nc", "Ncfpd",
- "Ncfpi", "Ncfs", "Ncfsd", "Ncfsi", "Ncmpd", "Ncmpi", "Ncms", "Ncmsd", "Ncmsf",
- "Ncmsh", "Ncmsi", "Ncmt", "Ncnpd", "Ncnpi", "Ncnsd", "Ncnsi", "Npfsi", "Npnsi",
- "Pca", "Pce", "Pcl", "Pcq", "Pct", "Pda", "Pde", "Pdl", "Pdm", "Pdq", "Pds", "Pdt",
- "Pfa", "Pfe", "Pfl", "Pfm", "Pfp", "Pfq", "Pft", "Pfy", "Pia", "Pic", "Pie", "Pil",
- "Pim", "Pip", "Piq", "Pit", "Pna", "Pne", "Pnl", "Pnm", "Pnp", "Pnt", "Ppe",
- "Ppelap1", "Ppelap2", "Ppelap3", "Ppelas1", "Ppelas2", "Ppelas3f", "Ppelas3m",
- "Ppelas3n", "Ppeldp1", "Ppelds1", "Ppelds2", "Ppelds3m", "Ppetap1", "Ppetap2",
- "Ppetap3", "Ppetas1", "Ppetas2", "Ppetas3f", "Ppetas3m", "Ppetas3n", "Ppetdp1",
- "Ppetdp2", "Ppetdp3", "Ppetds1", "Ppetds2", "Ppetds3f", "Ppetds3m", "Ppetds3n",
- "Ppetsp1", "Ppetsp2", "Ppetsp3", "Ppetss1", "Ppetss2", "Ppetss3f", "Ppetss3m",
- "Pph", "Pphlas2", "Pphtas2", "Pphtds2", "Pphtss2", "Ppxta", "Ppxtd", "Ppxts",
- "Pra", "Pre", "Prl", "Prm", "Prp", "Prq", "Prs", "Prt", "Pshl", "Psht", "Psol",
- "Psot", "Psxlop", "Psxlos", "Psxto", "Pszl", "Pszt", "R", "Ta", "Te", "Tg", "Ti",
- "Tm", "Tn", "Tv", "Tx", "Viitf", "Vniicam", "Vniicao", "Vniif", "Vnitcam",
- "Vnitcao", "Vnitf", "Vnpicao", "Vnpif", "Vnptcao", "Vnptf", "Vpiicam", "Vpiicao",
- "Vpiicar", "Vpiif", "Vpiig", "Vpiiz", "Vpitcam", "Vpitcao", "Vpitcar", "Vpitcv",
- "Vpitf", "Vpitg", "Vpitz", "Vppicam", "Vppicao", "Vppif", "Vppiz", "Vpptcam",
- "Vpptcao", "Vpptcv", "Vpptf", "Vpptz", "Vxitcat", "Vxitf", "Vxitu", "Vyptf",
- "Vyptz", "abbr", "foreign", "mw", "name", "pt", "w" };
-
- String[] constituentTags = { "A", "APA", "APC", "Adv", "AdvPA", "AdvPC", "C",
- "CL", "CLCHE", "CLDA", "CLQ", "CLR", "CLZADA", "Conj", "ConjArg", "CoordP",
- "Gerund", "H", "M", "N", "NPA", "NPC", "PP", "Participle", "Prep", "Pron", "ROOT",
- "S", "T", "V", "VPA", "VPC", "VPF", "VPS", "Verbalised" };
-
- String[] unmappedConstituents = { "Conj", "ConjArg", "Verbalised" };
-
- assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- assertPennTree(pennTree, selectSingle(jcas, PennTree.class));
- assertConstituents(constituentMapped, constituentOriginal, select(jcas, Constituent.class));
- assertTagset(POS.class, "btb", posTags, jcas);
- // FIXME assertTagsetMapping(POS.class, "btb", new String[] {}, jcas);
- assertTagset(Constituent.class, "btb", constituentTags, jcas);
- assertTagsetMapping(Constituent.class, "btb", unmappedConstituents, jcas);
- }
-
- @Test
- public void testChinese()
- throws Exception
- {
- JCas jcas = runTest("zh",
- "我们 需要 一个 非常 复杂 的 句子 例如 其中 包含 许多 成分 和 尽可能 的 依赖 。");
-
- String[] constituentMapped = { "ADVP 20,22", "ADVP 9,11", "NP 0,2", "NP 17,19", "NP 23,25",
- "NP 23,34", "NP 32,34", "NP 37,40", "NP 37,45", "NP 43,45", "NP 6,34", "NP 6,45",
- "NP 6,8", "PARN 20,34", "QP 29,31", "ROOT 0,47", "VP 12,14", "VP 26,28", "VP 3,45",
- "VP 9,14", "X 0,47", "X 23,28", "X 37,42", "X 6,14", "X 6,16" };
-
- String[] constituentOriginal = { "ADVP 20,22", "ADVP 9,11", "CP 6,16", "DNP 37,42",
- "IP 0,47", "IP 23,28", "IP 6,14", "NP 0,2", "NP 17,19", "NP 23,25", "NP 23,34",
- "NP 32,34", "NP 37,40", "NP 37,45", "NP 43,45", "NP 6,34", "NP 6,45", "NP 6,8",
- "PRN 20,34", "QP 29,31", "ROOT 0,47", "VP 12,14", "VP 26,28", "VP 3,45",
- "VP 9,14" };
-
- String[] posMapped = { "POS_PRON", "POS_VERB", "POS_NOUN", "POS_ADJ", "POS_VERB",
- "POS_PART", "POS_NOUN", "POS_ADJ", "POS_NOUN", "POS_VERB", "POS_NUM", "POS_NOUN",
- "POS_CONJ", "POS_NOUN", "POS_PART", "POS_NOUN", "POS_PUNCT" };
-
- String[] posOriginal = { "PN", "VV", "NN", "AD", "VA", "DEC", "NN", "AD", "NN", "VV", "CD",
- "NN", "CC", "NN", "DEG", "NN", "PU" };
-
- String pennTree = "(ROOT (IP (NP (PN 我们)) (VP (VV 需要) (NP (NP (CP (IP (NP (NN 一个)) "
- + "(VP (ADVP (AD 非常)) (VP (VA 复杂)))) (DEC 的)) (NP (NN 句子)) (PRN (ADVP "
- + "(AD 例如)) (NP (IP (NP (NN 其中)) (VP (VV 包含))) (QP (CD 许多)) (NP "
- + "(NN 成分))))) (CC 和) (NP (DNP (NP (NN 尽可能)) (DEG 的)) (NP (NN 依赖))))) "
- + "(PU 。)))";
-
- String[] posTags = { "AD", "AS", "BA", "CC", "CD", "CS", "DEC", "DEG", "DER", "DEV", "DT",
- "ETC", "FW", "IJ", "JJ", "LB", "LC", "M", "MSP", "NN", "NP", "NR", "NT", "OD", "P",
- "PN", "PU", "SB", "SP", "VA", "VC", "VE", "VP", "VV", "X" };
-
- String[] constituentTags = { "ADJP", "ADVP", "CLP", "CP", "DNP", "DP", "DVP", "FRAG",
- "INTJ", "IP", "LCP", "LST", "MSP", "NN", "NP", "PP", "PRN", "QP", "ROOT", "UCP",
- "VCD", "VCP", "VNV", "VP", "VPT", "VRD", "VSB" };
-
- String[] unmappedPos = { "NP", "VP" };
-
- assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- List<PennTree> trees = new ArrayList<>(select(jcas, PennTree.class));
- assertPennTree(pennTree, trees.get(0));
- assertConstituents(constituentMapped, constituentOriginal, select(jcas, Constituent.class));
- assertTagset(POS.class, "ctb", posTags, jcas);
- assertTagsetMapping(POS.class, "ctb", unmappedPos, jcas);
- assertTagset(Constituent.class, "ctb", constituentTags, jcas);
- // FIXME assertTagsetMapping(Constituent.class, "ctb", new String[] {}, jcas);
- }
-
- @Test
- public void testEnglish()
- throws Exception
- {
- JCas jcas = runTest("en", documentEnglish);
-
- String[] constituentMapped = { "ADJP 10,26", "ADJP 102,110", "ADJP 61,68", "NP 0,2",
- "NP 61,98", "NP 8,110", "NP 8,43", "PP 99,110", "ROOT 0,112", "S 0,112",
- "S 52,110", "SBAR 46,110", "VP 3,110", "VP 52,110", "WHNP 46,51" };
-
- String[] constituentOriginal = { "ADJP 10,26", "ADJP 102,110", "ADJP 61,68", "NP 0,2",
- "NP 61,98", "NP 8,110", "NP 8,43", "PP 99,110", "ROOT 0,112", "S 0,112",
- "S 52,110", "SBAR 46,110", "VP 3,110", "VP 52,110", "WHNP 46,51" };
-
- String[] posMapped = { "POS_PRON", "POS_VERB", "POS_DET", "POS_ADV", "POS_ADJ", "POS_NOUN",
- "POS_NOUN", "POS_PUNCT", "POS_DET", "POS_VERB", "POS_ADP", "POS_ADJ", "POS_NOUN",
- "POS_CONJ", "POS_NOUN", "POS_ADP", "POS_ADJ", "POS_PUNCT" };
-
- String[] posOriginal = { "PRP", "VBP", "DT", "RB", "JJ", "NN", "NN", ",",
- "WDT", "VBZ", "IN", "JJ", "NNS", "CC", "NNS", "IN", "JJ", "." };
-
- String pennTree = "(ROOT (S (NP (PRP We)) (VP (VBP need) (NP (NP (DT a) (ADJP (RB very) " +
- "(JJ complicated)) (NN example) (NN sentence)) (, ,) (SBAR (WHNP (WDT which)) (S " +
- "(VP (VBZ contains) (NP (ADJP (IN as) (JJ many)) (NNS constituents) (CC and) " +
- "(NNS dependencies)) (PP (IN as) (ADJP (JJ possible)))))))) (. .)))";
-
- String[] posTags = { "#", "$", "''", ",", "-LRB-", "-RRB-", ".", ":", "CC",
- "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN", "NNP", "NNPS",
- "NNS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR", "RBS", "RP", "SYM", "TO", "UH",
- "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WP$", "WRB", "``" };
-
- String[] constituentTags = { "ADJP", "ADVP", "CONJP", "FRAG", "INTJ", "LST",
- "NAC", "NP", "NX", "PP", "PRN", "PRT", "PRT|ADVP", "QP", "ROOT", "RRC", "S", "SBAR",
- "SBARQ", "SINV", "SQ", "UCP", "VP", "WHADJP", "WHADVP", "WHNP", "WHPP", "X" };
-
- String[] unmappedPos = {};
-
- String[] unmappedConst = {};
-
- assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- assertPennTree(pennTree, selectSingle(jcas, PennTree.class));
- assertConstituents(constituentMapped, constituentOriginal, select(jcas, Constituent.class));
- assertTagset(POS.class, "ptb", posTags, jcas);
- assertTagsetMapping(POS.class, "ptb", unmappedPos, jcas);
- assertTagset(Constituent.class, "ptb", constituentTags, jcas);
- // FIXME assertTagsetMapping(Constituent.class, "ptb", unmappedConst, jcas);
- }
-
- @Test
- public void testEnglishPreTagged()
- throws Exception
- {
- JCas jcas = runTest("en", null, documentEnglish, true);
-
- String[] constituentMapped = { "ADJP 10,26", "ADJP 102,110", "NP 0,2", "NP 64,110",
- "NP 64,98", "NP 8,110", "NP 8,43", "PP 61,110", "PP 99,110", "ROOT 0,112",
- "S 0,112", "S 52,110", "SBAR 46,110", "VP 3,110", "VP 52,110", "WHNP 46,51" };
-
- String[] constituentOriginal = { "ADJP 10,26", "ADJP 102,110", "NP 0,2", "NP 64,110",
- "NP 64,98", "NP 8,110", "NP 8,43", "PP 61,110", "PP 99,110", "ROOT 0,112",
- "S 0,112", "S 52,110", "SBAR 46,110", "VP 3,110", "VP 52,110", "WHNP 46,51" };
-
- String[] posMapped = { "POS_PRON", "POS_VERB", "POS_DET", "POS_ADV", "POS_ADJ", "POS_NOUN",
- "POS_NOUN", "POS_PUNCT", "POS_DET", "POS_VERB", "POS_ADP", "POS_ADJ", "POS_NOUN",
- "POS_CONJ", "POS_NOUN", "POS_ADP", "POS_ADJ", "POS_PUNCT" };
-
- String[] posOriginal = { "PRP", "VBP", "DT", "RB", "JJ", "NN", "NN", ",", "WDT", "VBZ",
- "IN", "JJ", "NNS", "CC", "NNS", "IN", "JJ", "." };
-
- String pennTree = "(ROOT (S (NP (PRP We)) (VP (VBP need) (NP (NP (DT a) (ADJP "
- + "(RB very) (JJ complicated)) (NN example) (NN sentence)) (, ,) (SBAR (WHNP "
- + "(WDT which)) (S (VP (VBZ contains) (PP (IN as) (NP (NP (JJ many) "
- + "(NNS constituents) (CC and) (NNS dependencies)) (PP (IN as) (ADJP "
- + "(JJ possible)))))))))) (. .)))";
-
- String[] posTags = { "#", "$", "''", ",", "-LRB-", "-RRB-", ".", ":", "CC", "CD", "DT",
- "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN", "NNP", "NNPS", "NNS",
- "PDT", "POS", "PRP", "PRP$", "RB", "RBR", "RBS", "RP", "SYM", "TO", "UH", "VB",
- "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WP$", "WRB", "``" };
-
- String[] constituentTags = { "ADJP", "ADVP", "CONJP", "FRAG", "INTJ", "LST", "NAC", "NP",
- "NX", "PP", "PRN", "PRT", "PRT|ADVP", "QP", "ROOT", "RRC", "S", "SBAR", "SBARQ",
- "SINV", "SQ", "UCP", "VP", "WHADJP", "WHADVP", "WHNP", "WHPP", "X" };
-
- String[] unmappedPos = {};
-
- String[] unmappedConst = {};
-
- assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- assertPennTree(pennTree, selectSingle(jcas, PennTree.class));
- assertConstituents(constituentMapped, constituentOriginal,
- select(jcas, Constituent.class));
- assertTagset(POS.class, "ptb", posTags, jcas);
- assertTagsetMapping(POS.class, "ptb", unmappedPos, jcas);
- assertTagset(Constituent.class, "ptb", constituentTags, jcas);
- // FIXME assertTagsetMapping(Constituent.class, "ptb", unmappedConst,
- // jcas);
- }
-
- @Test
- public void testGerman()
- throws Exception
- {
- JCas jcas = runTest("de", "Wir brauchen ein sehr kompliziertes Beispiel , welches " +
- "möglichst viele Konstituenten und Dependenzen beinhaltet .");
-
- String[] constituentMapped = { "ADJP 17,35", "Constituent 0,113", "NP 13,111", "NP 55,100",
- "NP 71,100", "ROOT 0,113", "S 0,111", "S 47,111" };
-
- String[] constituentOriginal = { "AP 17,35", "CNP 71,100", "NP 13,111", "NP 55,100",
- "PSEUDO 0,113", "ROOT 0,113", "S 0,111", "S 47,111" };
-
- String[] posOriginal = { "PPER", "VVFIN", "ART", "ADV", "ADJA", "NN", "$,", "PRELS", "ADV",
- "PIDAT", "NN", "KON", "NN", "VVFIN", "$." };
-
- String[] posMapped = { "POS_PRON", "POS_VERB", "POS_DET", "POS_ADV", "POS_ADJ", "POS_NOUN", "POS_PUNCT", "POS_PRON", "POS_ADV",
- "POS_PRON", "POS_NOUN", "POS_CONJ", "POS_NOUN", "POS_VERB", "POS_PUNCT" };
-
- String pennTree = "(ROOT (PSEUDO (S (PPER Wir) (VVFIN brauchen) (NP (ART ein) (AP " +
- "(ADV sehr) (ADJA kompliziertes)) (NN Beispiel) ($, ,) (S (PRELS welches) (NP " +
- "(ADV möglichst) (PIDAT viele) (CNP (NN Konstituenten) (KON und) " +
- "(NN Dependenzen))) (VVFIN beinhaltet)))) ($. .)))";
-
- String[] posTags = { "$*LRB*", "$,", "$.", "*T1*", "*T2*", "*T3*", "*T4*",
- "*T5*", "*T6*", "*T7*", "*T8*", "--", "ADJA", "ADJD", "ADV", "APPO", "APPR",
- "APPRART", "APZR", "ART", "CARD", "FM", "ITJ", "KOKOM", "KON", "KOUI", "KOUS",
- "NE", "NN", "PDAT", "PDS", "PIAT", "PIDAT", "PIS", "PPER", "PPOSAT", "PPOSS",
- "PRELAT", "PRELS", "PRF", "PROAV", "PTKA", "PTKANT", "PTKNEG", "PTKVZ", "PTKZU",
- "PWAT", "PWAV", "PWS", "TRUNC", "VAFIN", "VAIMP", "VAINF", "VAPP", "VMFIN",
- "VMINF", "VMPP", "VVFIN", "VVIMP", "VVINF", "VVIZU", "VVPP", "XY" };
-
- String[] constituentTags = { "---CJ", "AA", "AP", "AVP", "CAC", "CAP", "CAVP",
- "CCP", "CH", "CNP", "CO", "CPP", "CS", "CVP", "CVZ", "DL", "ISU", "MPN", "MTA",
- "NM", "NP", "PP", "PSEUDO", "QL", "ROOT", "S", "VP", "VZ" };
-
- String[] unmappedPos = { "$*LRB*", "*T1*", "*T2*", "*T3*", "*T4*", "*T5*",
- "*T6*", "*T7*", "*T8*", "--" };
-
- String[] unmappedConst = { "---CJ", "PSEUDO" };
-
- assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- assertPennTree(pennTree, selectSingle(jcas, PennTree.class));
- assertConstituents(constituentMapped, constituentOriginal, select(jcas, Constituent.class));
- assertTagset(POS.class, "stts", posTags, jcas);
- assertTagsetMapping(POS.class, "stts", unmappedPos, jcas);
- assertTagset(Constituent.class, "negra", constituentTags, jcas);
- assertTagsetMapping(Constituent.class, "negra", unmappedConst, jcas);
- }
-
- @Test
- public void testFrench()
- throws Exception
- {
- JCas jcas = runTest("fr", "Nous avons besoin d' une phrase par exemple très " +
- "compliqué , qui contient des constituants que de nombreuses dépendances et que " +
- "possible .");
-
- String[] constituentMapped = { "ADJP 44,58", "NP 21,90", "NP 36,43", "NP 61,64",
- "NP 74,90", "NP 95,120", "PP 18,90", "PP 32,43", "ROOT 0,138", "S 0,138",
- "SBAR 124,136", "SBAR 61,90", "SBAR 91,120", "VP 0,17", "VP 65,73" };
-
- String[] constituentOriginal = { "AP 44,58", "NP 21,90", "NP 36,43", "NP 61,64",
- "NP 74,90", "NP 95,120", "PP 18,90", "PP 32,43", "ROOT 0,138", "SENT 0,138",
- "Srel 61,90", "Ssub 124,136", "Ssub 91,120", "VN 0,17", "VN 65,73" };
-
- String[] posMapped = { "POS_PRON", "POS_VERB", "POS_VERB", "POS_ADP", "POS_DET", "POS_NOUN", "POS_ADP", "POS_NOUN", "POS_ADV",
- "POS_ADJ", "POS_PUNCT", "POS_PRON", "POS_VERB", "POS_DET", "POS_NOUN", "POS_CONJ", "POS_DET", "POS_ADJ", "POS_NOUN", "POS_CONJ",
- "POS_CONJ", "POS_ADJ", "POS_PUNCT" };
-
- String[] posOriginal = { "CL", "V", "V", "P", "D", "N", "P", "N", "ADV", "A",
- ",", "PRO", "V", "D", "N", "C", "D", "A", "N", "C", "C", "A", "." };
-
- String pennTree = "(ROOT (ROOT (SENT (VN (CL Nous) (V avons) (V besoin)) (PP (P d') (NP "
- + "(D une) (N phrase) (PP (P par) (NP (N exemple))) (AP (ADV très) (A compliqué)) "
- + "(, ,) (Srel (NP (PRO qui)) (VN (V contient)) (NP (D des) (N constituants))))) "
- + "(Ssub (C que) (NP (D de) (A nombreuses) (N dépendances))) (C et) (Ssub (C que) "
- + "(A possible)) (. .))))";
-
- String[] posTags = { "\"", ",", "-LRB-", "-RRB-", ".", ":", "A", "ADV",
- "ADVP", "Afs", "C", "CC", "CL", "CS", "D", "Dmp", "ET", "I", "N", "ND", "P", "PC",
- "PREF", "PRO", "S", "V", "X", "_unknown_", "p", "près" };
-
- String[] constituentTags = { "AP", "AdP", "NP", "PP", "ROOT", "SENT", "Sint",
- "Srel", "Ssub", "VN", "VPinf", "VPpart" };
-
- String[] unmappedPos = { "\"", "-LRB-", "-RRB-", "ADVP", "Afs", "CC",
- "CS", "Dmp", "ND", "PC", "S", "X", "_unknown_", "p", "près" };
-
- assertPOS(posMapped, posOriginal, select(jcas, POS.class));
- assertPennTree(pennTree, selectSingle(jcas, PennTree.class));
- assertConstituents(constituentMapped, constituentOriginal, select(jcas, Constituent.class));
- assertTagset(POS.class, "ftb", posTags, jcas);
- assertTagsetMapping(POS.class, "ftb", unmappedPos, jcas);
- assertTagset(Constituent.class, "ftb", constituentTags, jcas);
- assertTagsetMapping(Constituent.class, "ftb", new String[] {}, jcas);
- }
-
- /**
- * Setup CAS to test parser for the English language (is only called once if
- * an English test is run)
- */
- private JCas runTest(String aLanguage, String aText)
- throws Exception
- {
- return runTest(aLanguage, null, aText, false);
- }
-
-
- private JCas runTest(String aLanguage, String aVariant, String aText, boolean aGoldPos,
- Object... aExtraParams)
- throws Exception
- {
- AggregateBuilder aggregate = new AggregateBuilder();
-
- if (aGoldPos) {
- aggregate.add(createEngineDescription(OpenNlpPosTagger.class));
- }
-
- Object[] params = new Object[] {
- BerkeleyParser.PARAM_VARIANT, aVariant,
- BerkeleyParser.PARAM_PRINT_TAGSET, true,
- BerkeleyParser.PARAM_WRITE_PENN_TREE, true,
- BerkeleyParser.PARAM_WRITE_POS, !aGoldPos,
- BerkeleyParser.PARAM_READ_POS, aGoldPos};
- params = ArrayUtils.addAll(params, aExtraParams);
- aggregate.add(createEngineDescription(BerkeleyParser.class, params));
-
- return TestRunner.runTest(aggregate.createAggregateDescription(), aLanguage, aText);
- }
-}
diff --git a/dkpro-core-berkeleyparser-gpl/src/test/resources/log4j2.xml b/dkpro-core-berkeleyparser-gpl/src/test/resources/log4j2.xml
deleted file mode 100644
index 19bf03b585..0000000000
--- a/dkpro-core-berkeleyparser-gpl/src/test/resources/log4j2.xml
+++ /dev/null
@@ -1,15 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
diff --git a/dkpro-core-bom-gpl/pom.xml b/dkpro-core-bom-gpl/pom.xml
index 2a6e9fa3ca..58b01c8ce3 100644
--- a/dkpro-core-bom-gpl/pom.xml
+++ b/dkpro-core-bom-gpl/pom.xml
@@ -49,11 +49,6 @@
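       <type>pom</type>
       <scope>import</scope>
     </dependency>
-      <dependency>
-        <groupId>org.dkpro.core</groupId>
-        <artifactId>dkpro-core-berkeleyparser-gpl</artifactId>
-        <version>3.0.0-SNAPSHOT</version>
-      </dependency>
       <dependency>
         <groupId>org.dkpro.core</groupId>
         <artifactId>dkpro-core-corenlp-gpl</artifactId>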
diff --git a/dkpro-core-gpl/pom.xml b/dkpro-core-gpl/pom.xml
index 5b16432719..979eecacbe 100644
--- a/dkpro-core-gpl/pom.xml
+++ b/dkpro-core-gpl/pom.xml
@@ -41,7 +41,6 @@
-    <module>../dkpro-core-berkeleyparser-gpl</module>
     <module>../dkpro-core-corenlp-gpl</module>
     <module>../dkpro-core-lingpipe-gpl</module>
     <module>../dkpro-core-matetools-gpl</module>