Mousetom sk/v3 #109

Open
wants to merge 74 commits into
base: develop
07e1349
Fixing and extending OCEL, CELOE, Rho, Multiheuristic, and CWR
mousetom-sk Feb 27, 2023
3ec54fd
parcel integration, useNegation support
mousetom-sk Feb 28, 2023
fc7f4a8
parcel operator integration, useNegation support enhanced, test evalu…
mousetom-sk Mar 1, 2023
b99b4c4
reasoner cloning
mousetom-sk Mar 1, 2023
3cc9e7e
parcel heuristic accuracy reward back to default, reasoner cloning co…
mousetom-sk Mar 2, 2023
4afb94f
rho upward refinement takes into account (almost all) config options
mousetom-sk Mar 2, 2023
b24a487
rho useDisjunction when refining upwards and accuracy speedup
mousetom-sk Mar 3, 2023
6c9e996
accuracy speedup reverted
mousetom-sk Mar 3, 2023
bd4fd60
rho useDisjunction when refining upwards and accuracy speedup
mousetom-sk Mar 3, 2023
a23688d
minor refactoring, search tree initialization
mousetom-sk Mar 3, 2023
24af23d
rho top refinements correction, CWR min cardinality hasType corrected…
mousetom-sk Mar 3, 2023
f964671
rho top refinements correction, CWR min cardinality hasType corrected…
mousetom-sk Mar 3, 2023
c952cdd
celoe evaluated descriptions not computed from scratch
mousetom-sk Mar 3, 2023
b9ddecd
rho memento, upward refining enhanced, isSomeOnlySatisfied improved
mousetom-sk Mar 3, 2023
23832cf
rho memento, upward refining enhanced, isSomeOnlySatisfied improved
mousetom-sk Mar 3, 2023
91fb6b8
isSomeOnlySatisfied corrected, maxCardinalityLimit option added
mousetom-sk Mar 3, 2023
9b68c2c
isSomeOnlySatisfied corrected, maxCardinalityLimit option added
mousetom-sk Mar 3, 2023
9a80253
concurrent cwr added
mousetom-sk Mar 3, 2023
c2affc8
reversing the order of elements in search tree
mousetom-sk Mar 4, 2023
a04d5fd
partial-definition-found message reformatted
mousetom-sk Mar 4, 2023
fc81d15
partial-definition-found message reformatted
mousetom-sk Mar 4, 2023
9d0e428
partial-definition-found message reformatted
mousetom-sk Mar 4, 2023
4614622
maxCardinalityLimit in upward refinements, splits for numeric datatyp…
mousetom-sk Mar 4, 2023
4cde5fe
maxCardinalityLimit in upward refinements, splits for numeric datatyp…
mousetom-sk Mar 4, 2023
993abac
maxCardinalityLimit in upward refinements, splits for numeric datatyp…
mousetom-sk Mar 4, 2023
362571d
maxCardinalityLimit in upward refinements, splits for numeric datatyp…
mousetom-sk Mar 4, 2023
960384a
negations must not be disjoint with the current domain
mousetom-sk Mar 4, 2023
1b51716
shallow copying set of knowledge sources
mousetom-sk Mar 5, 2023
1646ade
useAllConstructor, useExistsConstructor, applyAllFilter, applyExistsF…
mousetom-sk Mar 5, 2023
e0dada1
useAllConstructor, useExistsConstructor, applyAllFilter, applyExistsF…
mousetom-sk Mar 5, 2023
8206d9a
useAllConstructor, useExistsConstructor, applyAllFilter, applyExistsF…
mousetom-sk Mar 5, 2023
b76ad47
useAllConstructor, useExistsConstructor, applyAllFilter, applyExistsF…
mousetom-sk Mar 5, 2023
bbd9144
useAllConstructor, useExistsConstructor, applyAllFilter, applyExistsF…
mousetom-sk Mar 5, 2023
23e517f
useDisjunction checked on conjunction creation
mousetom-sk Mar 5, 2023
3e2f118
useDisjunction checked on conjunction creation
mousetom-sk Mar 5, 2023
b5cb17c
useDisjunction checked on conjunction creation
mousetom-sk Mar 5, 2023
4bfffdf
useDisjunction checked on conjunction creation
mousetom-sk Mar 5, 2023
e642ba2
useDisjunction checked on conjunction creation
mousetom-sk Mar 5, 2023
dacb774
Merge branch 'concurrent-cwr' into develop
mousetom-sk Mar 5, 2023
3e0a784
removing commented-out code in workers
mousetom-sk Mar 5, 2023
15f33ef
Merge branch 'concurrent-cwr' into accuracy-speedup
mousetom-sk Mar 5, 2023
c4edb8c
Merge branch 'rho-speedup' into develop
mousetom-sk Mar 5, 2023
999cf7b
Merge branch 'parallel-alg-speedup' into develop
mousetom-sk Mar 5, 2023
a2494a1
Merge branch 'rho-speedup' into accuracy-speedup
mousetom-sk Mar 5, 2023
8d85c21
Merge branch 'parallel-alg-speedup' into accuracy-speedup
mousetom-sk Mar 5, 2023
631581c
Merge branch 'accuracy-speedup' into v2
mousetom-sk Mar 6, 2023
e80aa0d
s/parcel always asking for refinements of horizExp length
mousetom-sk Mar 6, 2023
ea56edc
accuracy/coverage calculation improved in s/parcel
mousetom-sk Mar 6, 2023
76ada64
s/parcel - first return nodes, then partial definitions
mousetom-sk Mar 6, 2023
c723fcc
ocel/celoe coverage info added
mousetom-sk Mar 6, 2023
f64f1db
removing todos
mousetom-sk Mar 6, 2023
f63804e
printing total execution time
mousetom-sk Mar 7, 2023
6408838
isSomeOnlySatisfied corrected, improved, and used instead of isCombin…
mousetom-sk Mar 7, 2023
1ee2f44
printing timestamps
mousetom-sk Mar 7, 2023
d4de70f
isSomeOnlySatisfied refactoring
mousetom-sk Mar 8, 2023
862298c
useRestrictedDisjunction option added
mousetom-sk Mar 8, 2023
6dcf0c7
custom numericValuesSplitter supported
mousetom-sk Mar 8, 2023
ad0bef2
isSomeOnlySatisfied considers min restrictions
mousetom-sk Mar 9, 2023
654809d
CELOE accuracy speedup reverted
mousetom-sk Mar 9, 2023
0a9e377
OSBean for measuring CPU time declared as a class property
mousetom-sk Mar 9, 2023
e85e0bf
accuracy synchronization, parcel discarding unpromising concepts, and…
mousetom-sk Mar 11, 2023
ce8f500
parcelex accuracy calculation without additional checks
mousetom-sk Mar 11, 2023
a20ce89
test accuracy computation moved to PosNegLPStandard
mousetom-sk Mar 12, 2023
dd28ff0
cwr corrections
mousetom-sk Mar 12, 2023
30840e0
rho maxCardinalityLimit copied
mousetom-sk Mar 12, 2023
99dedd2
compact coverage representation even leaner
mousetom-sk Mar 12, 2023
ebcdd96
isSomeOnlySatisfied checked after refining
mousetom-sk Mar 13, 2023
59d0260
allFilter improved
mousetom-sk Mar 13, 2023
d605676
compact coverage leaner again
mousetom-sk Mar 13, 2023
3d7678c
parcel accepting only the partial definitions which contribute to the …
mousetom-sk Mar 13, 2023
74d4681
OENode getting size of covered positives/negatives improved, ParcelEx…
mousetom-sk Mar 13, 2023
dbb15e7
ParcelExV2 partial definition resulting from combination has correct …
mousetom-sk Mar 13, 2023
ea57b03
minor fix in corrected data
mousetom-sk Mar 13, 2023
d08592e
removing code commented out
mousetom-sk Mar 13, 2023
5 changes: 5 additions & 0 deletions components-core/pom.xml
@@ -519,6 +519,11 @@
<artifactId>commons-math3</artifactId>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
</dependency>

<dependency>
<groupId>org.jgrapht</groupId>
<artifactId>jgrapht-core</artifactId>
@@ -0,0 +1,63 @@
/**
* Copyright (C) 2007 - 2016, Jens Lehmann
*
* This file is part of DL-Learner.
*
* DL-Learner is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or
* (at your option) any later version.
*
* DL-Learner is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
package org.dllearner.algorithms.celoe;

import org.dllearner.utilities.owl.OWLClassExpressionLengthMetric;
import org.dllearner.utilities.owl.OWLClassExpressionUtils;

import java.util.Comparator;

public class AccuracyBasedComparator implements Comparator<OENode> {

private final OWLClassExpressionLengthMetric lengthMetric;

public AccuracyBasedComparator(OWLClassExpressionLengthMetric lengthMetric) {
this.lengthMetric = lengthMetric;
}

@Override
public int compare(OENode node1, OENode node2) {
int result = compareByAccuracy(node1, node2);

if (result != 0) {
return result;
}

return compareByLength(node1, node2);
}

private int compareByAccuracy(OENode node1, OENode node2) {
double node1Accuracy = node1.getAccuracy();
double node2Accuracy = node2.getAccuracy();

return Double.compare(node1Accuracy, node2Accuracy);
}

private int compareByLength(OENode node1, OENode node2) {
int node1Length = OWLClassExpressionUtils.getLength(node1.getDescription(), lengthMetric);
int node2Length = OWLClassExpressionUtils.getLength(node2.getDescription(), lengthMetric);

return Integer.compare(node2Length, node1Length);
}

@Override
public boolean equals(Object o) {
return (o instanceof AccuracyBasedComparator);
}
}
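To make the ordering produced by this comparator concrete, here is a minimal sketch. It assumes a hypothetical `SketchNode` stand-in for `OENode` with a precomputed length instead of an `OWLClassExpressionLengthMetric`; none of these names come from DL-Learner. Accuracy sorts ascending and length sorts descending, so the most accurate and shortest expression ends up last in a `TreeMap`.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical stand-in for OENode: a label, an accuracy, and a precomputed length.
final class SketchNode {
    final String label;
    final double accuracy;
    final int length;

    SketchNode(String label, double accuracy, int length) {
        this.label = label;
        this.accuracy = accuracy;
        this.length = length;
    }
}

final class ComparatorSketch {
    // Same idea as AccuracyBasedComparator: accuracy ascending, then length descending.
    static final Comparator<SketchNode> ORDER = (a, b) -> {
        int byAccuracy = Double.compare(a.accuracy, b.accuracy);
        return byAccuracy != 0 ? byAccuracy : Integer.compare(b.length, a.length);
    };

    static TreeMap<SketchNode, Double> demo() {
        TreeMap<SketchNode, Double> candidates = new TreeMap<>(ORDER);
        candidates.put(new SketchNode("A", 0.90, 5), 1.0);
        candidates.put(new SketchNode("B", 0.95, 7), 2.0);
        candidates.put(new SketchNode("C", 0.95, 3), 3.0);
        // D compares equal to C (same accuracy and length), so the map keeps
        // key C and only overwrites its value.
        candidates.put(new SketchNode("D", 0.95, 3), 4.0);
        return candidates;
    }

    public static void main(String[] args) {
        TreeMap<SketchNode, Double> candidates = demo();
        System.out.println("weakest: " + candidates.firstKey().label);
        System.out.println("best:    " + candidates.lastKey().label);
        System.out.println("size:    " + candidates.size());
    }
}
```

One consequence worth keeping in mind when a comparator like this backs a sorted map: two different expressions with identical accuracy and length count as the same key, so only one of them is retained.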
157 changes: 138 additions & 19 deletions components-core/src/main/java/org/dllearner/algorithms/celoe/CELOE.java
@@ -27,10 +27,7 @@
import org.dllearner.core.owl.DatatypePropertyHierarchy;
import org.dllearner.core.owl.ObjectPropertyHierarchy;
import org.dllearner.kb.OWLAPIOntology;
import org.dllearner.learningproblems.ClassAsInstanceLearningProblem;
import org.dllearner.learningproblems.ClassLearningProblem;
import org.dllearner.learningproblems.PosNegLP;
import org.dllearner.learningproblems.PosOnlyLP;
import org.dllearner.learningproblems.*;
import org.dllearner.reasoning.ClosedWorldReasoner;
import org.dllearner.reasoning.OWLAPIReasoner;
import org.dllearner.reasoning.ReasonerImplementation;
@@ -50,6 +47,8 @@
import uk.ac.manchester.cs.owl.owlapi.OWLDataFactoryImpl;

import java.io.File;
import java.text.DecimalFormat;
import java.text.SimpleDateFormat;
import java.util.*;
import java.util.concurrent.TimeUnit;

@@ -171,6 +170,22 @@ public class CELOE extends AbstractCELA implements Cloneable{
private boolean stopOnFirstDefinition = false;

private int expressionTestCountLastImprovement;

OWLClassExpressionLengthMetric lengthMetric = OWLClassExpressionLengthMetric.getDefaultMetric();

private TreeMap<OENode, Double> solutionCandidates;
private final double solutionCandidatesMinAccuracyDiff = 0.0001;

@ConfigOption(defaultValue = "0.0", description = "determines a lower bound on noisiness of an expression with respect to noisePercentage " +
"in order to be considered a reasonable solution candidate (must be non-negative), e.g. for noisePercentage = 15 and noisePercentageMargin = 5, " +
"the algorithm will suggest expressions with the number of misclassified positives less than or equal to 20% of all examples " +
"as solution candidates as well; note: difference between accuracies of any two candidates must be at least 0.01% to ensure diversity")
private double noisePercentageMargin = 0.0;

@ConfigOption(defaultValue = "20", description = "the number of solution candidates within margin to be presented, sorted in descending order by accuracy")
private int maxNrOfResultsWithinMargin = 20;

private double noiseWithMargin;


@SuppressWarnings("unused")
@@ -228,6 +243,9 @@ public CELOE(CELOE celoe){

setWriteSearchTree(celoe.writeSearchTree);
setReplaceSearchTree(celoe.replaceSearchTree);

setMaxNrOfResultsWithinMargin(celoe.maxNrOfResultsWithinMargin);
setNoisePercentageMargin(celoe.noisePercentageMargin);
}

public CELOE(AbstractClassExpressionLearningProblem problem, AbstractReasonerComponent reasoner) {
@@ -313,6 +331,17 @@ public void init() throws ComponentInitException {

if (!((AbstractRefinementOperator) operator).isInitialized())
operator.init();

operator.setLengthMetric(lengthMetric);

AccuracyBasedComparator solutionComparator = new AccuracyBasedComparator(lengthMetric);
solutionCandidates = new TreeMap<>(solutionComparator);

if (noisePercentageMargin < 0) {
noisePercentageMargin = 0.0;
}

noiseWithMargin = (noisePercentage + noisePercentageMargin) / 100.0;

initialized = true;
}
@@ -327,6 +356,8 @@ public void start() {
currentHighestAccuracy = 0.0;
OENode nextNode;

String timeStamp = new SimpleDateFormat("HH.mm.ss").format(new Date());
logger.info("Time " + getCurrentCpuMillis() / 1000.0 + "s; " + timeStamp);
logger.info("start class:" + startClass);
addNode(startClass, null);

@@ -372,9 +403,17 @@ public void start() {

// print some stats
printAlgorithmRunStats();

printSolutionCandidates();

// print solution(s)
logger.info("solutions:\n" + getSolutionString());

if (learningProblem instanceof PosNegLP) {
((PosNegLP) learningProblem).printTestEvaluation(bestEvaluatedDescriptions.getBest().getDescription());
}

printBestConceptsTimesAndAccuracies();

isRunning = false;
}
@@ -506,7 +545,7 @@ private TreeSet<OWLClassExpression> refineNode(OENode node) {
MonitorFactory.getTimeMonitor("refineNode").stop();
return refinements;
}

/**
* Add node to search tree if it is not too weak.
* @return TRUE if node was added and FALSE otherwise
@@ -528,27 +567,27 @@ private boolean addNode(OWLClassExpression description, OENode parentNode) {
logger.trace(sparql_debug, sparql_debug_out + "NOT ALLOWED");
return false;
}

// quality of class expression (return if too weak)
Monitor mon = MonitorFactory.start("lp");
logger.trace(sparql_debug, sparql_debug_out);
double accuracy = learningProblem.getAccuracyOrTooWeak(description, noise);
logger.trace(sparql_debug, "`acc:"+accuracy);
mon.stop();

// throw if accuracy is neither within [0, 1] nor equal to -1 (too weak)
if(accuracy > 1.0 || (accuracy < 0.0 && accuracy != -1)) {
throw new RuntimeException("Invalid accuracy value " + accuracy + " for class expression " + description +
". This could be caused by a bug in the heuristic measure and should be reported to the DL-Learner bug tracker.");
}

expressionTests++;

// return FALSE if 'too weak'
if(accuracy == -1) {
return false;
}

OENode node = new OENode(description, accuracy);
searchTree.addNode(parentNode, node);

@@ -616,7 +655,23 @@ private boolean addNode(OWLClassExpression description, OENode parentNode) {

// System.out.println(bestEvaluatedDescriptions.getSet().size());
}


if (accuracy >= 1 - noiseWithMargin) {
if (solutionCandidates.isEmpty()
|| (accuracy > solutionCandidates.firstKey().getAccuracy()
&& solutionCandidates.keySet().stream().allMatch(
n -> Math.abs(accuracy - n.getAccuracy()) > solutionCandidatesMinAccuracyDiff
)
)
) {
solutionCandidates.put(node, getCurrentCpuMillis() / 1000.0);
}

if (solutionCandidates.size() > maxNrOfResultsWithinMargin) {
solutionCandidates.pollFirstEntry();
}
}

return true;
}
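The admission rule for solution candidates in the hunk above can be sketched in isolation. This is a simplified illustration and not the PR's code: the hypothetical `CandidatePoolSketch` keys candidates by accuracy alone (the PR keys by `OENode` through `AccuracyBasedComparator`), while the threshold `1 - (noisePercentage + noisePercentageMargin) / 100`, the minimum accuracy difference of 0.0001, and the size cap follow the diff.

```java
import java.util.TreeMap;

// Hypothetical, simplified candidate pool: accuracy -> expression string.
final class CandidatePoolSketch {
    static final double MIN_ACCURACY_DIFF = 0.0001;

    final TreeMap<Double, String> candidates = new TreeMap<>();
    final double noiseWithMargin;
    final int maxResults;

    CandidatePoolSketch(double noisePercentage, double noisePercentageMargin, int maxResults) {
        // Negative margins are clamped to zero, as in init().
        double margin = Math.max(0.0, noisePercentageMargin);
        this.noiseWithMargin = (noisePercentage + margin) / 100.0;
        this.maxResults = maxResults;
    }

    // Returns true if the expression was admitted as a candidate.
    boolean offer(String expression, double accuracy) {
        if (accuracy < 1 - noiseWithMargin) {
            return false; // outside the margin
        }
        boolean admitted = false;
        if (candidates.isEmpty()
                || (accuracy > candidates.firstKey()
                    && candidates.keySet().stream()
                        .allMatch(a -> Math.abs(accuracy - a) > MIN_ACCURACY_DIFF))) {
            candidates.put(accuracy, expression);
            admitted = true;
        }
        if (candidates.size() > maxResults) {
            candidates.pollFirstEntry(); // drop the weakest candidate
        }
        return admitted;
    }

    public static void main(String[] args) {
        // noisePercentage = 15, margin = 5: accuracy must be at least about 0.80.
        CandidatePoolSketch pool = new CandidatePoolSketch(15, 5, 2);
        System.out.println(pool.offer("A", 0.79));    // below the threshold
        System.out.println(pool.offer("B", 0.85));    // first candidate
        System.out.println(pool.offer("C", 0.85005)); // too close to B
        System.out.println(pool.offer("D", 0.90));
        System.out.println(pool.offer("E", 0.95));    // pool exceeds cap, B is evicted
        System.out.println(pool.candidates);
    }
}
```

Note that, under this rule, a new expression is admitted only if it is more accurate than the current weakest candidate, even when the pool is not yet full.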

@@ -747,8 +802,8 @@ private boolean terminationCriteriaSatisfied() {
stop ||
(maxClassExpressionTestsAfterImprovement != 0 && (expressionTests - expressionTestCountLastImprovement >= maxClassExpressionTestsAfterImprovement)) ||
(maxClassExpressionTests != 0 && (expressionTests >= maxClassExpressionTests)) ||
(maxExecutionTimeInSecondsAfterImprovement != 0 && ((System.nanoTime() - nanoStartTime) >= (maxExecutionTimeInSecondsAfterImprovement* 1000000000L))) ||
(maxExecutionTimeInSeconds != 0 && ((System.nanoTime() - nanoStartTime) >= (maxExecutionTimeInSeconds* 1000000000L))) ||
(maxExecutionTimeInSecondsAfterImprovement != 0 && ((getCurrentCpuMillis() - timeLastImprovement) >= (maxExecutionTimeInSecondsAfterImprovement * 1000L))) ||
(maxExecutionTimeInSeconds != 0 && (getCurrentCpuMillis() >= (maxExecutionTimeInSeconds * 1000L))) ||
(terminateOnNoiseReached && (100*getCurrentlyBestAccuracy()>=100-noisePercentage)) ||
(stopOnFirstDefinition && (getCurrentlyBestAccuracy() >= 1));
}
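The hunk above switches the time-based termination checks from wall-clock nanoseconds to CPU milliseconds. The implementation of `getCurrentCpuMillis()` is not part of this diff, so the following is only a sketch under assumptions: it reads the CPU time of the current thread through the standard `ThreadMXBean`, and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimeLimitSketch {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // CPU time of the calling thread in milliseconds, or -1 if the JVM
    // does not support or has disabled per-thread CPU time measurement.
    static long currentCpuMillis() {
        if (!THREADS.isCurrentThreadCpuTimeSupported() || !THREADS.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        return THREADS.getCurrentThreadCpuTime() / 1_000_000L; // nanoseconds to milliseconds
    }

    // Mirrors the two time-based clauses of terminationCriteriaSatisfied();
    // a limit of 0 means "no limit".
    static boolean timeLimitReached(long maxSeconds, long maxSecondsAfterImprovement,
                                    long cpuMillis, long cpuMillisAtLastImprovement) {
        boolean totalExceeded = maxSeconds != 0 && cpuMillis >= maxSeconds * 1000L;
        boolean stalled = maxSecondsAfterImprovement != 0
                && cpuMillis - cpuMillisAtLastImprovement >= maxSecondsAfterImprovement * 1000L;
        return totalExceeded || stalled;
    }

    public static void main(String[] args) {
        long now = currentCpuMillis();
        System.out.println("CPU time so far: " + now + " ms");
        System.out.println("limit reached: " + timeLimitReached(10, 0, now, 0));
    }
}
```

Because CPU time advances only while the thread is actually running, a limit expressed this way can allow a longer wall-clock run than the same number of seconds measured with `System.nanoTime()`, for example when the process waits on I/O.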
@@ -761,34 +816,82 @@ private void reset() {
bestEvaluatedDescriptions.getSet().clear();
expressionTests = 0;
runtimeVsBestScore.clear();

solutionCandidates.clear();
}

private void printAlgorithmRunStats() {
String timeStamp = new SimpleDateFormat("HH.mm.ss").format(new Date());
logger.info("Time " + getCurrentCpuMillis() / 1000.0 + "s; " + timeStamp);

if (stop) {
logger.info("Algorithm stopped ("+expressionTests+" descriptions tested). " + searchTree.size() + " nodes in the search tree.\n");
logger.info(reasoner.toString());
} else {
totalRuntimeNs = System.nanoTime()-nanoStartTime;
logger.info("Algorithm terminated successfully (time: " + Helper.prettyPrintNanoSeconds(totalRuntimeNs) + ", "+expressionTests+" descriptions tested, " + searchTree.size() + " nodes in the search tree).\n");
logger.info("Algorithm terminated successfully ("+expressionTests+" descriptions tested, " + searchTree.size() + " nodes in the search tree).\n");
logger.info(reasoner.toString());
}
}


private void printSolutionCandidates() {
DecimalFormat df = new DecimalFormat();

if (solutionCandidates.size() > 0) {
// we do not need to print the best node if we display the top 20 solutions below anyway
logger.info("solutions within margin (at most " + maxNrOfResultsWithinMargin + " are shown):");
int show = 1;
for (OENode c : solutionCandidates.descendingKeySet()) {
int tpTest = learningProblem instanceof PosNegLP
? ((PosNegLP) learningProblem).getTestCoverage(c.getDescription())
: 0;

logger.info(show + ": " + renderer.render(c.getDescription())
+ " (accuracy " + df.format(100 * c.getAccuracy()) + "% / "
+ df.format(100 * computeTestAccuracy(c.getDescription())) + "%"
+ ", coverage " + c.getNumberOfCoveredPositiveExamples() + " / " + tpTest
+ ", length " + OWLClassExpressionUtils.getLength(c.getDescription())
+ ", depth " + OWLClassExpressionUtils.getDepth(c.getDescription())
+ ", time " + df.format(solutionCandidates.get(c)) + "s)");
if (show >= maxNrOfResultsWithinMargin) {
break;
}
show++;
}
} else {
logger.info("no appropriate solutions within margin found (try increasing the noisePercentageMargin)");
}
}

private void showIfBetterSolutionsFound() {
if(!singleSuggestionMode && bestEvaluatedDescriptions.getBestAccuracy() > currentHighestAccuracy) {
currentHighestAccuracy = bestEvaluatedDescriptions.getBestAccuracy();
expressionTestCountLastImprovement = expressionTests;
timeLastImprovement = System.nanoTime();
timeLastImprovement = getCurrentCpuMillis();
long durationInMillis = getCurrentRuntimeInMilliSeconds();
String durationStr = getDurationAsString(durationInMillis);

double cpuTime = getCurrentCpuMillis() / 1000.0;

OWLClassExpression bestDescription = bestEvaluatedDescriptions.getBest().getDescription();
double testAccuracy = computeTestAccuracy(bestDescription);

// track new best accuracy if enabled
if(keepTrackOfBestScore) {
runtimeVsBestScore.put(getCurrentRuntimeInMilliSeconds(), currentHighestAccuracy);
}
logger.info("more accurate (" + dfPercent.format(currentHighestAccuracy) + ") class expression found after " + durationStr + ": " + descriptionToString(bestEvaluatedDescriptions.getBest().getDescription()));

logger.info(
"Time " + cpuTime +
"s: more accurate (training: " + dfPercent.format(currentHighestAccuracy) +
", test: " + dfPercent.format(testAccuracy) +
") class expression found after " + durationStr + ": " +
descriptionToString(bestEvaluatedDescriptions.getBest().getDescription())
);

recordBestConceptTimeAndAccuracy(cpuTime, currentHighestAccuracy, testAccuracy);
}
}

private void writeSearchTree(TreeSet<OWLClassExpression> refinements) {
StringBuilder treeString = new StringBuilder("best node: ").append(bestEvaluatedDescriptions.getBest()).append("\n");
if (refinements.size() > 1) {
@@ -1100,6 +1203,22 @@ public SortedMap<Long, Double> getRuntimeVsBestScore(long ticksIntervalTimeValue
return map;
}

public int getMaxNrOfResultsWithinMargin() {
return maxNrOfResultsWithinMargin;
}

public void setMaxNrOfResultsWithinMargin(int maxNrOfResultsWithinMargin) {
this.maxNrOfResultsWithinMargin = maxNrOfResultsWithinMargin;
}

public double getNoisePercentageMargin() {
return noisePercentageMargin;
}

public void setNoisePercentageMargin(double noisePercentageMargin) {
this.noisePercentageMargin = noisePercentageMargin;
}

/* (non-Javadoc)
* @see java.lang.Object#clone()
*/