Can not specify the classes of a prediction outcome #654

HaloCollider · 2023-02-18T06:09:55Z

I'm working on a binary classification task where the dependent variable y is numeric (namely 0 and 1) rather than a factor, which is convenient for the numeric calculations that follow. My problem is this:

The prediction returned by the model is an n-by-2 data frame (or some similar data type) in which each column holds the probability of one class, but the columns have no names. Importantly, the column order does not necessarily match the "0 and 1" order, so I cannot simply take the second column as the probability of y = 1 in this binary case. I haven't figured out the logic behind this, so the order appears to be arbitrary.

Therefore, I'd like to ask whether there is a way to specify which class (0 or 1) each column of the prediction corresponds to in a classification scenario. It would be great if we didn't have to convert y to a factor, because we do a lot of numeric calculations after predicting. Thanks.

HaloCollider · 2023-02-18T06:19:42Z

I think this can be a serious problem for classification. Luckily, our sample is very unbalanced, so it was easy to see that the column order differed between models: some of them would have produced exactly the opposite predictions had the order been the same. It still took me a long time to find out, though.

mnwright · 2023-03-03T10:09:03Z

Could you please give a reproducible example of the problem?

stephematician · 2023-05-21T13:00:11Z

If the response is not a factor (assuming the R interface is used), the columns are ordered by the first appearance of each value in the data (by row).

Using the R interface, the columns should have the correct names; however, this won't be obvious when using the C++ interface. I also don't believe this is documented.
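To illustrate the order-of-appearance rule described above, here is a minimal base R sketch. It assumes that the column order follows the first occurrence of each value in y, which is the order `unique()` returns. It does not call ranger itself, and the toy vectors are made up for illustration.

```r
# Two toy responses with the same values but a different first row.
y_a <- c(0, 0, 1, 1, 0)
y_b <- c(1, 0, 0, 1, 0)

# unique() keeps values in order of first appearance.
order_a <- unique(y_a)  # 0 1
order_b <- unique(y_b)  # 1 0

# Under the assumption above, the column holding P(y = 1) would be:
col_a <- match(1, order_a)  # 2
col_b <- match(1, order_b)  # 1
```

So the same 0/1 response could put P(y = 1) in either column, depending only on which value happens to occur first.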

krzyzinskim · 2023-07-25T15:07:17Z

I encountered the same problem. @HaloCollider, it's probably out of date by now but the order of the classes in the matrix of predicted probabilities can be found in your.model$forest$class.values (I think it's always in the right order).

And @mnwright, here is a small reproducible example:

library(ranger)

## 0 is first 
set.seed(123)
p <- 4
n <- 1000
X <- data.frame(matrix(rnorm(n*p), nrow = n))
y <- as.numeric(rowSums(X) > 0)

y[1:5] # [1] 0 0 0 1 1

model <- ranger(x=X,
               y=y, 
               probability=TRUE)

prediction_probs <- predict(model, X)$predictions
prediction_probs[1:5, ]
#           [,1]        [,2]
# [1,] 0.9956444 0.004355556
# [2,] 0.9906111 0.009388889
# [3,] 0.8179349 0.182065079
# [4,] 0.0780381 0.921961905
# [5,] 0.3289381 0.671061905

model$forest$class.values # [1] 0 1

#### 

## 1 is first 
set.seed(42)
X <- data.frame(matrix(rnorm(n*p), nrow = n))
y <- as.numeric(rowSums(X) > 0)

y[1:5] # [1] 1 0 0 0 0

model <- ranger(x=X,
                y=y, 
                probability=TRUE)

prediction_probs <- predict(model, X)$predictions
prediction_probs[1:5, ]
#            [,1]       [,2]
# [1,] 0.96184603 0.03815397
# [2,] 0.04116032 0.95883968
# [3,] 0.12405714 0.87594286
# [4,] 0.03781984 0.96218016
# [5,] 0.18086905 0.81913095

model$forest$class.values # [1] 1 0

I've found in the source that the matrix is only given column names when forest$levels is not NULL (and it is NULL for a non-factor response; see the related resolved issue). Perhaps it's worth naming the columns based on forest$class.values, which is always non-empty?
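Until something like that is built in, a user-side workaround might look like the sketch below. `name_prob_columns` is a hypothetical helper, not part of ranger, and the small matrix stands in for `predict(model, X)$predictions` from a model whose `forest$class.values` is `c(1, 0)`.

```r
# Hypothetical helper (not part of ranger): label the columns of a
# probability matrix with the class values stored in the fitted model.
name_prob_columns <- function(probs, class_values) {
  probs <- as.matrix(probs)
  stopifnot(ncol(probs) == length(class_values))
  # Only add names when they are missing, so existing names are preserved.
  if (is.null(colnames(probs))) {
    colnames(probs) <- as.character(class_values)
  }
  probs
}

# Stand-in for predict(model, X)$predictions with class.values = c(1, 0).
probs <- matrix(c(0.96, 0.04,
                  0.04, 0.96),
                ncol = 2, byrow = TRUE)
named <- name_prob_columns(probs, c(1, 0))

# Select P(y = 1) by name instead of by position.
p1 <- unname(named[, "1"])  # 0.96 0.04
```

With a fitted model this would be called as `name_prob_columns(predict(model, X)$predictions, model$forest$class.values)`, after which `[, "1"]` returns the probability of y = 1 regardless of the column order, and y itself can stay numeric.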
