
RNN.predict should return labels, not probabilities #20

Open
dnouri opened this issue Mar 15, 2015 · 5 comments

Comments


dnouri commented Mar 15, 2015

There's a slight incompatibility with sklearn in the RNN.predict method: predict should return predicted class labels, while predict_proba is the name of the method that returns probabilities. Passage's existing predict already behaves like predict_proba, except that for binary classification tasks sklearn expects an (n, 2) matrix with one column each for the negative and positive class probabilities.

Here are the two methods (a hack) that I use in a subclass to implement predict and predict_proba so they work with sklearn, on top of the existing RNN.predict. As it is, this only works with binary classification:

    import numpy as np

    class SklearnRNN(RNN):  # subclass name here is illustrative
        def predict_proba(self, X):
            # RNN.predict returns an (n, 1) column of positive-class
            # probabilities; stack its complement to get the (n, 2)
            # matrix sklearn expects.
            proba_pos = super(SklearnRNN, self).predict(X)
            proba_neg = 1 - proba_pos
            return np.hstack([proba_neg, proba_pos])

        def predict(self, X):
            # The label is the index of the more probable column.
            return self.predict_proba(X).argmax(1)

As I'm not sure what else the current predict can return (i.e., when it's not doing binary classification), I'm also not sure of the right way to change the original code so that it still works with all the tasks it was designed for.
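A standalone sketch of the probability-to-label conversion those two methods perform, with made-up numbers standing in for RNN.predict output:

```python
import numpy as np

# Hypothetical output of RNN.predict for a binary task:
# an (n, 1) column of positive-class probabilities (values made up).
proba_pos = np.array([[0.044], [0.750], [0.051]])

# Build the (n, 2) matrix sklearn expects: one column per class.
proba = np.hstack([1 - proba_pos, proba_pos])

# Labels are the index of the larger column: 0 (negative) or 1 (positive).
labels = proba.argmax(axis=1)
print(proba.shape)       # (3, 2)
print(labels.tolist())   # [0, 1, 0]
```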

@Slater-Victoroff
Contributor

@Newmu Bumping this up on your radar to make sure you've seen it.

@dnouri Rounded up with Alec on this quickly, seems like it should be a pretty quick fix, and this is the kind of sklearn-type usability that I think would be excellent to incorporate.

@madisonmay
Contributor

@dnouri -- What makes this a bit tricky is that Passage does not have distinct classes for regression and classification tasks, meaning that there's no simple interface fix to make this fit the sklearn interface properly (changing to predict_proba would make this interface inconsistent in the case of regression).

@Newmu -- Would you prefer that this be a documentation / example change or would you prefer a solution where we provide sklearn-compatible interfaces via subclasses of RNN (i.e. RNNRegressor and RNNClassifier)?

@wyqnumber

result = model.predict(tokenizer.transform(dataTest))
I would like the result to look like [1, 0, 0, ...]. Instead, result is:
[[ 0.04424067]
[ 0.03570492]
[ 0.05069015]
[ 0.06563961]
[ 0.05930467]
[ 0.04631713]
[ 0.14057502]
[ 0.01756088]
[ 0.01704285]
[ 0.04108404]
[ 0.02220613]
[ 0.02747946]
[ 0.07257298]
[ 0.02124731]
[ 0.02848194]
[ 0.0518975 ]
[ 0.12870379]
[ 0.03803665]
[ 0.03353238]
[ 0.02539251]
[ 0.02556736]
[ 0.1885419 ]
[ 0.02384598]
[ 0.07756392]
[ 0.18340758]
[ 0.01600601]
[ 0.01973476]
[ 0.02859995]
[ 0.14280389]
[ 0.19487601]
[ 0.0739686 ]
[ 0.04073641]
[ 0.02926875]
[ 0.02508903]
[ 0.07490836]
[ 0.14724793]]
What is this output, and how do I transform it to [1, 0, 0, ...]?


ma2rten commented Jun 25, 2015

result = model.predict(tokenizer.transform(dataTest)) > 0.5
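Note that the comparison above yields a boolean array with the same (n, 1) shape; to get the flat list of 0/1 integers asked about, it still needs a cast and a flatten. A minimal sketch, with made-up probabilities standing in for the model.predict output:

```python
import numpy as np

# Stand-in for model.predict(...) output: (n, 1) probabilities (made up).
result = np.array([[0.044], [0.036], [0.721]])

# Threshold at 0.5, cast booleans to ints, and flatten to 1-D.
labels = (result > 0.5).astype(int).ravel()
print(labels.tolist())  # [0, 0, 1]
```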


ddofer commented Jul 30, 2015

How can this be solved for multiclass predictions? (At least to the point of being able to measure performance.)

Thanks!
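If predict returns an (n, k) matrix of per-class probabilities for a k-class task (an assumption; this thread only shows the binary case), labels can be recovered with a row-wise argmax. A sketch with made-up values:

```python
import numpy as np

# Hypothetical (n, k) per-class probabilities for a 3-class task.
proba = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])

# The predicted label for each row is the column with the highest probability.
labels = proba.argmax(axis=1)
print(labels.tolist())  # [1, 0]
```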
