You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
... incorrectly merges list of int[], creates duplicated int values during the merge.
At the least, it impacts performance. Variance computing is completely wrong in the CARTRegressionTrainer.
To Reproduce
Create a unit test with this code:
public void bug() {
IntArrayContainer first = new IntArrayContainer(10);
IntArrayContainer second = new IntArrayContainer(10);
List<int[]> list = List.of(
new int[]{1, 2, 4},
new int[]{3, 5, 6, 7, 8, 9, 10}
);
IntArrayContainer.merge(list, first, second); //<-- buggy method
Assertions.assertEquals(1, first.array[0]);
Assertions.assertEquals(2, first.array[1]); //fails !!! array is filed with [1,1,2,2,3,4,4,5,6,7,8,9,10,0,0]
//expected is [1,2,3,4,5,6,7,8,9,10]
}
Expected behavior
//expected is [1,2,3,4,5,6,7,8,9,10]
System information:
Tribuo Version: [4.3.1]
OS: [Windows]
Java Version: [21]
JDK Vendor: [temurin-21.0.3]
Additional context
The mistake may lie in how the firstBuffer is initialized and updated in each subsequent merge:
public static int[] merge(List<int[]> input, IntArrayContainer firstBuffer, IntArrayContainer secondBuffer) {
if (input.size() > 0) {
firstBuffer.fill(input.get(0));
for (int i = 1; i < input.size(); i++) { // Start from the second array
merge(firstBuffer, input.get(i), secondBuffer);
IntArrayContainer tmp = secondBuffer;
secondBuffer = firstBuffer;
firstBuffer = tmp;
}
return firstBuffer.copy();
} else {
return new int[0];
}
}
Modification made:
Changed the loop start from i = 0 to i = 1. The first array is already used to initialize firstBuffer, so the merging should start from the second array (i = 1).
The text was updated successfully, but these errors were encountered:
Describe the bug
This method:
... incorrectly merges list of int[], creates duplicated int values during the merge.
At the least, it impacts performance. Variance computing is completely wrong in the CARTRegressionTrainer.
To Reproduce
Create a unit test with this code:
Expected behavior
System information:
Additional context
The mistake may lie in how the firstBuffer is initialized and updated in each subsequent merge:
Modification made:
Changed the loop start from i = 0 to i = 1. The first array is already used to initialize firstBuffer, so the merging should start from the second array (i = 1).
The text was updated successfully, but these errors were encountered: