-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added DropNullColumn transformer to remove columns that contain only nulls #1115
Changes from 1 commit
bee630f
ccc9a02
d3f9c90
9d42b95
f249982
a1caf39
b0e3235
90be825
55764a8
c8fdaaa
0cdc0bd
34c0095
430c8e3
80bd408
e2ca33f
4dbba09
7d6f8ce
4771d18
f0b521a
ea9893b
c73db7e
09cf9c7
4e4f255
754e2ef
acafac6
4b0aa1c
35f8909
b4e419f
75f1110
e499dc1
c296829
ee6b7b5
92210b7
4cad44a
836a636
4ec95d6
3c25b84
8638516
e70f513
6083567
a543044
92f5430
24b18ba
62ef9d6
53cb8bd
7af96ca
11908b3
2499a37
548b792
58feaed
5a6539c
36c46d4
98b6c10
399954a
32ca7a0
b311317
a04fb50
7b635ef
43a61d4
c48a63d
5704ebf
ab5af46
3f69bde
801d745
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,7 +19,6 @@ | |
from skrub._datetime_encoder import DatetimeEncoder | ||
from skrub._gap_encoder import GapEncoder | ||
from skrub._minhash_encoder import MinHashEncoder | ||
from skrub._on_each_column import RejectColumn | ||
from skrub._table_vectorizer import TableVectorizer | ||
|
||
MSG_PANDAS_DEPRECATED_WARNING = "Skip deprecation warning" | ||
|
@@ -531,7 +530,7 @@ def test_changing_types(X_train, X_test, expected_X_out): | |
# only extract the total seconds | ||
datetime=DatetimeEncoder(resolution=None), | ||
# True by default | ||
null_column_strategy=False, | ||
null_column_strategy="keep", | ||
) | ||
|
||
table_vec.fit(X_train) | ||
|
@@ -766,7 +765,7 @@ def test_drop_null_column(): | |
"""Check that all null columns are dropped, and no more.""" | ||
# Don't drop null columns | ||
X = _get_missing_values_dataframe() | ||
tv = TableVectorizer(null_column_strategy="ignore") | ||
tv = TableVectorizer(null_column_strategy="keep") | ||
transformed = tv.fit_transform(X) | ||
|
||
assert sbd.shape(transformed) == sbd.shape(X) | ||
|
@@ -778,11 +777,7 @@ def test_drop_null_column(): | |
|
||
# Raise exception if a null column is found | ||
with pytest.raises( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test is still failing because the TableVectorizer is not raising the correct exception and I don't know how to make it do that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. here raise a ValueError instead of RejectColumn There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. rejectcolumn is a way to signify to the tablevectorizer "I'm not the right transformer for this column, don't apply me here". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha, fixed |
||
RejectColumn, match="Column all_null contains only null values." | ||
ValueError, | ||
): | ||
tv = TableVectorizer(null_column_strategy="raise") | ||
transformed = tv.fit_transform(X) | ||
|
||
# # Raise an exception if an unknown parameter is found | ||
# tv = TableVectorizer(null_column_strategy="wrong_parameter") | ||
# transformed = tv.fit_transform(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set this to false to keep the original behavior with no DropNullColumns. Given that the default value is True, should I change the test so that the "default behavior" is what is tested here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok the way you did it