[MRG+2] addresses scikit-learn#8509 improvements to f_regression documentation (scikit-learn#8548)

* clarify role of the function and streamline introduction

* added feature selection methods to see also

* completed see also

* fixed PEP 8-related formatting for flake8 checks.

* fixed extra-whitespace flake8 problems; the remaining failure is a 'see also' line copied from another function that is one character (a period) over the limit, and it does not make sense to wrap it.

* one more whitespace

* FIX small pep8 error.
brownsarahm authored and jnothman committed Mar 7, 2017
1 parent 02c705e commit 5210f81
Showing 1 changed file with 13 additions and 4 deletions.
17 changes: 13 additions & 4 deletions sklearn/feature_selection/univariate_selection.py
@@ -230,17 +230,18 @@ def chi2(X, y):
def f_regression(X, y, center=True):
"""Univariate linear regression tests.
Quick linear model for testing the effect of a single regressor,
sequentially for many regressors.
Linear model for testing the individual effect of each of many regressors.
This is a scoring function to be used in a feature selection procedure, not
a free standing feature selection procedure.
This is done in 2 steps:
1. The cross correlation between each regressor and the target is computed,
1. The correlation between each regressor and the target is computed,
that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) *
std(y)).
2. It is converted to an F score then to a p-value.
Read more in the :ref:`User Guide <univariate_feature_selection>`.
For more on usage see the :ref:`User Guide <univariate_feature_selection>`.
Parameters
----------
@@ -261,10 +262,18 @@ def f_regression(X, y, center=True):
pval : array, shape=(n_features,)
p-values of F-scores.
See also
--------
mutual_info_regression: Mutual information for a continuous target.
f_classif: ANOVA F-value between label/feature for classification tasks.
chi2: Chi-squared stats of non-negative features for classification tasks.
SelectKBest: Select features based on the k highest scores.
SelectFpr: Select features based on a false positive rate test.
SelectFdr: Select features based on an estimated false discovery rate.
SelectFwe: Select features based on family-wise error rate.
SelectPercentile: Select features based on percentile of the highest
scores.
"""
X, y = check_X_y(X, y, ['csr', 'csc', 'coo'], dtype=np.float64)
n_samples = X.shape[0]
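
For reference, the two-step procedure the updated docstring describes (correlation, then F score, then p-value) can be sketched as follows. This is a minimal illustration under the docstring's stated formula, not the library's actual implementation; the helper name f_regression_sketch is made up for this example.

# Minimal sketch of the two-step procedure described in the docstring
# (illustrative only, not the scikit-learn implementation).
import numpy as np
from scipy import stats


def f_regression_sketch(X, y, center=True):
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n_samples = X.shape[0]
    if center:
        X = X - X.mean(axis=0)
        y = y - y.mean()
    # Step 1: correlation between each regressor and the target.
    corr = (X * y[:, np.newaxis]).sum(axis=0) / np.sqrt(
        (X ** 2).sum(axis=0) * (y ** 2).sum())
    # Step 2: convert the correlation to an F score, then to a p-value.
    dof = n_samples - (2 if center else 1)
    F = corr ** 2 / (1 - corr ** 2) * dof
    pval = stats.f.sf(F, 1, dof)
    return F, pval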

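As the added wording stresses, f_regression is a scoring function rather than a free standing selector; a typical usage pairs it with one of the Select* transformers listed in the new See also section. The example below is a hypothetical illustration (dataset and k are arbitrary choices), not part of the commit.

# f_regression supplies per-feature F scores; SelectKBest performs the
# actual feature selection.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       random_state=0)
selector = SelectKBest(score_func=f_regression, k=5)
X_new = selector.fit_transform(X, y)
print(X_new.shape)       # (100, 5)
print(selector.scores_)  # F scores for all 20 features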