
RNN.predict should return labels, not probabilities #20

Open
dnouri opened this issue Mar 15, 2015 · 5 comments

Comments


dnouri commented Mar 15, 2015

There's a slight incompatibility with sklearn in the RNN.predict method: predict should return predicted class labels, while predict_proba is the name of the method that returns probabilities. Passage's existing predict already behaves like predict_proba, except that for binary classification tasks sklearn expects an (n, 2) matrix with one column each for the negative and positive class probabilities.

Here are the two methods (a hack) that I use in a subclass to implement predict and predict_proba so they work with sklearn, on top of the existing RNN.predict. As it is, this only works with binary classification:

    import numpy as np

    class SklearnRNN(RNN):  # subclass name here is illustrative
        def predict_proba(self, X):
            # RNN.predict returns an (n, 1) column of positive-class
            # probabilities; stack its complement to get the (n, 2)
            # matrix sklearn expects.
            proba_pos = super(SklearnRNN, self).predict(X)
            proba_neg = 1 - proba_pos
            return np.hstack([proba_neg, proba_pos])

        def predict(self, X):
            # The label is the index of the more probable column.
            return self.predict_proba(X).argmax(1)

As I'm not sure what else the current predict can return (i.e., when it's not doing binary classification), I'm also not sure of the right way to change the original code so that it still works with all the tasks it was designed for.
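A standalone sketch of the probability-to-label conversion those two methods perform, with made-up numbers standing in for RNN.predict output:

```python
import numpy as np

# Hypothetical output of RNN.predict for a binary task:
# an (n, 1) column of positive-class probabilities (values made up).
proba_pos = np.array([[0.044], [0.750], [0.051]])

# Build the (n, 2) matrix sklearn expects: one column per class.
proba = np.hstack([1 - proba_pos, proba_pos])

# Labels are the index of the larger column: 0 (negative) or 1 (positive).
labels = proba.argmax(axis=1)
print(proba.shape)       # (3, 2)
print(labels.tolist())   # [0, 1, 0]
```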

@Slater-Victoroff
Contributor

@Newmu Bumping this up on your radar to make sure you've seen it.

@dnouri Rounded up with Alec on this quickly, seems like it should be a pretty quick fix, and this is the kind of sklearn-type usability that I think would be excellent to incorporate.

@madisonmay
Contributor

@dnouri -- What makes this a bit tricky is that Passage does not have distinct classes for regression and classification tasks, meaning that there's no simple interface fix to make this fit the sklearn interface properly (changing to predict_proba would make this interface inconsistent in the case of regression).

@Newmu -- Would you prefer that this be a documentation / example change or would you prefer a solution where we provide sklearn-compatible interfaces via subclasses of RNN (i.e. RNNRegressor and RNNClassifier)?

@wyqnumber

result = model.predict(tokenizer.transform(dataTest))
I would like the result to look like [1, 0, 0, ...]. Instead, result is:
[[ 0.04424067]
[ 0.03570492]
[ 0.05069015]
[ 0.06563961]
[ 0.05930467]
[ 0.04631713]
[ 0.14057502]
[ 0.01756088]
[ 0.01704285]
[ 0.04108404]
[ 0.02220613]
[ 0.02747946]
[ 0.07257298]
[ 0.02124731]
[ 0.02848194]
[ 0.0518975 ]
[ 0.12870379]
[ 0.03803665]
[ 0.03353238]
[ 0.02539251]
[ 0.02556736]
[ 0.1885419 ]
[ 0.02384598]
[ 0.07756392]
[ 0.18340758]
[ 0.01600601]
[ 0.01973476]
[ 0.02859995]
[ 0.14280389]
[ 0.19487601]
[ 0.0739686 ]
[ 0.04073641]
[ 0.02926875]
[ 0.02508903]
[ 0.07490836]
[ 0.14724793]]
What is this output, and how do I transform it to [1, 0, 0, ...]?


ma2rten commented Jun 25, 2015

result = model.predict(tokenizer.transform(dataTest)) > 0.5
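Note that the comparison above yields a boolean array with the same (n, 1) shape; to get the flat list of 0/1 integers asked about, it still needs a cast and a flatten. A minimal sketch, with made-up probabilities standing in for the model.predict output:

```python
import numpy as np

# Stand-in for model.predict(...) output: (n, 1) probabilities (made up).
result = np.array([[0.044], [0.036], [0.721]])

# Threshold at 0.5, cast booleans to ints, and flatten to 1-D.
labels = (result > 0.5).astype(int).ravel()
print(labels.tolist())  # [0, 0, 1]
```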


ddofer commented Jul 30, 2015

How can this be solved for multiclass predictions? (At least to the point of being able to measure performance.)

Thanks!
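If predict returns an (n, k) matrix of per-class probabilities for a k-class task (an assumption; this thread only shows the binary case), labels can be recovered with a row-wise argmax. A sketch with made-up values:

```python
import numpy as np

# Hypothetical (n, k) per-class probabilities for a 3-class task.
proba = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])

# The predicted label for each row is the column with the highest probability.
labels = proba.argmax(axis=1)
print(labels.tolist())  # [1, 0]
```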
