Can not specify the classes of a prediction outcome #654
Comments
I think this can be a serious problem for classification. Luckily, our sample is very unbalanced, so we could easily see that the column order changed between models: some of them would have produced exactly the opposite predictions if the order had stayed the same. It still took me a long time to find out, though.
Could you please give a reproducible example of the problem?
If the response is not a factor (assuming the R interface is used), the columns are ordered in the same order that the values first appear in the data (by row). With the R interface, the columns should have the correct names; however, this won't be obvious when using the C++ interface. I also don't believe this is documented.
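To make the ordering rule above concrete, here is a minimal base-R sketch. It assumes, as described in this comment, that for a non-factor response the columns follow the order in which values first appear; under that assumption, `unique(y)` would reproduce the column order. The two vectors are the first five responses from the two runs in the reproducible example in this thread, and `expected_order` is a hypothetical helper name, not part of ranger.

```r
# Hypothetical helper (not part of ranger): guesses the column order of the
# probability matrix for a non-factor response, ASSUMING the columns follow
# the order in which values first appear in y.
expected_order <- function(y) {
  unique(y)  # unique() keeps values in order of first appearance
}

y_a <- c(0, 0, 0, 1, 1)  # first responses of the run where 0 appears first
y_b <- c(1, 0, 0, 0, 0)  # first responses of the run where 1 appears first

expected_order(y_a)  # [1] 0 1
expected_order(y_b)  # [1] 1 0
```

If the assumption holds, the two results would correspond to the two different column orders seen in the example.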
I encountered the same problem. @HaloCollider, it's probably out of date by now, but the order of the classes in the matrix of predicted probabilities can be found in [...].
And @mnwright, here is a small reproducible example:
library(ranger)
## 0 is first
set.seed(123)
p <- 4
n <- 1000
X <- data.frame(matrix(rnorm(n*p), nrow = n))
y <- as.numeric(rowSums(X) > 0)
y[1:5] # [1] 0 0 0 1 1
model <- ranger(x=X,
y=y,
probability=TRUE)
prediction_probs <- predict(model, X)$predictions
prediction_probs[1:5, ]
# [,1] [,2]
# [1,] 0.9956444 0.004355556
# [2,] 0.9906111 0.009388889
# [3,] 0.8179349 0.182065079
# [4,] 0.0780381 0.921961905
# [5,] 0.3289381 0.671061905
model$forest$class.values # [1] 0 1
####
## 1 is first
set.seed(42)
X <- data.frame(matrix(rnorm(n*p), nrow = n))
y <- as.numeric(rowSums(X) > 0)
y[1:5] # [1] 1 0 0 0 0
model <- ranger(x=X,
y=y,
probability=TRUE)
prediction_probs <- predict(model, X)$predictions
prediction_probs[1:5, ]
# [,1] [,2]
# [1,] 0.96184603 0.03815397
# [2,] 0.04116032 0.95883968
# [3,] 0.12405714 0.87594286
# [4,] 0.03781984 0.96218016
# [5,] 0.18086905 0.81913095
model$forest$class.values # [1] 1 0
I've found here that the matrix is only given column names when
I'm working on a binary classification task where the dependent variable y is numeric (0 and 1) rather than a factor, which is convenient for the numeric calculations that follow. My problem is this:
The prediction returned by the model is an n-by-2 data frame (or a similar data type), with each column holding the probability of one class, but without column names. Importantly, the order of the columns does not necessarily match the "0, 1" order, so I cannot simply take the second column as the probability of y = 1 in this binary case. I haven't figured out the logic behind this, so the order seems to be produced more or less at random.
Therefore, I would like to ask whether there is a way to specify the classes (0 or 1) of a prediction outcome in a classification scenario. It would be great if we didn't have to convert y into a factor, because we do a lot of numeric calculations after predicting. Thanks.
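One possible way to look up a class probability by name, without converting y to a factor, is to label the columns yourself. The sketch below assumes that `model$forest$class.values` (shown in the example in this thread) lists the classes in the same order as the columns of the prediction matrix. `label_prob_columns` is a hypothetical helper, not a ranger function, and the matrix is hand-built from the first two rows of the run where 1 appeared first, so the snippet runs without fitting a model.

```r
# Hypothetical helper (not part of ranger): attach class labels as column
# names, ASSUMING class_values is in the same order as the columns of probs.
label_prob_columns <- function(probs, class_values) {
  stopifnot(ncol(probs) == length(class_values))
  colnames(probs) <- as.character(class_values)
  probs
}

# Stand-in for predict(model, X)$predictions (first two rows of the run
# where 1 appeared first) and for model$forest$class.values.
probs <- matrix(c(0.96184603, 0.03815397,
                  0.04116032, 0.95883968),
                nrow = 2, byrow = TRUE)
class_values <- c(1, 0)

labelled <- label_prob_columns(probs, class_values)
p_one <- labelled[, "1"]  # probability of y = 1, whatever the column order
p_one
```

With a fitted model, the same call would presumably be `label_prob_columns(predict(model, X)$predictions, model$forest$class.values)`, and y itself can stay numeric.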