The transform primitive percent_change fails to work on top of an aggregation primitive, returning NaN or seemingly arbitrary results.
Code Sample, a copy-pastable example to reproduce your bug.
These are the transactions:
transaction_id | customer_id | quantity | transaction_date
101758183 | abc | 15 | 2021-12-15
101862984 | abc | 15 | 2022-01-15
101960142 | abc | 15 | 2022-02-15
102062271 | abc | 15 | 2022-03-15
102179828 | abc | 15 | 2022-04-15
102301689 | abc | 15 | 2022-05-15
102434267 | abc | 15 | 2022-06-15
102540706 | abc | 15 | 2022-07-15
102662863 | abc | 15 | 2022-08-15
102783888 | abc | 15 | 2022-09-15
102901638 | abc | 15 | 2022-10-15
103041277 | abc | 15 | 2022-11-15
103199236 | abc | 15 | 2022-12-15
103336795 | abc | 15 | 2023-01-15
103478291 | abc | 15 | 2023-02-15
103604244 | abc | 15 | 2023-03-15
103738142 | abc | 15 | 2023-04-15
103895757 | abc | 15 | 2023-05-15
104073119 | abc | 15 | 2023-06-15
104233610 | abc | 15 | 2023-07-15
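To make the report copy-pastable, the table above can be rebuilt as a pandas DataFrame. This is a minimal sketch that assumes the `tt1` variable used in the snippets of this report is exactly this table; the column dtypes are a guess, since the original frame is not shown.

```python
import pandas as pd

# Transaction ids copied from the table above.
transaction_ids = [
    101758183, 101862984, 101960142, 102062271, 102179828,
    102301689, 102434267, 102540706, 102662863, 102783888,
    102901638, 103041277, 103199236, 103336795, 103478291,
    103604244, 103738142, 103895757, 104073119, 104233610,
]

# One transaction on the 15th of each month, 2021-12-15 through 2023-07-15.
start = pd.Timestamp("2021-12-15")
transaction_dates = [
    start + pd.DateOffset(months=i) for i in range(len(transaction_ids))
]

# Assumption: tt1 is this table, with a datetime transaction_date column.
tt1 = pd.DataFrame(
    {
        "transaction_id": transaction_ids,
        "customer_id": "abc",
        "quantity": 15,
        "transaction_date": transaction_dates,
    }
)
```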
Creating the labels:
import composeml as cp

def is_churned(df):
    # A customer counts as churned when the labeling window has no transactions.
    return len(df) == 0

label_maker = cp.LabelMaker(
    target_dataframe_index='customer_id',
    time_index='transaction_date',
    labeling_function=is_churned,
    window_size='60d'
)

# tt1 is the transactions DataFrame shown above.
labels = label_maker.search(
    df=tt1,
    num_examples_per_instance=-1,
    gap='1MS',
    drop_empty=False,
    minimum_data=6,
    verbose=True
)
The labels look like this:
customer_id time is_churned
abc 2022-06-15 False
abc 2022-07-01 False
abc 2022-08-01 False
abc 2022-09-01 False
abc 2022-10-01 False
abc 2022-11-01 False
abc 2022-12-01 False
abc 2023-01-01 False
abc 2023-02-01 False
abc 2023-03-01 False
abc 2023-04-01 False
abc 2023-05-01 False
abc 2023-06-01 False
abc 2023-07-01 False
Create the EntitySet:
import featuretools as ft

es = ft.EntitySet('bug')
es.add_dataframe(
    dataframe=tt1,
    dataframe_name='transactions',
    time_index='transaction_date',
    index='transaction_id'
)
# Split customers out into their own dataframe, keyed by customer_id.
es.normalize_dataframe(
    base_dataframe_name='transactions',
    new_dataframe_name='persons',
    index='customer_id',
    make_time_index=True
)
es.add_last_time_indexes()
Creating the features:
fm, fd = ft.dfs(
    entityset=es,
    target_dataframe_name='persons',
    agg_primitives=['count'],
    trans_primitives=['percent_change'],
    cutoff_time=labels,
    max_depth=2,
    cutoff_time_in_index=True,
    include_cutoff_time=False,
    verbose=True,
)
customer_id time COUNT(transactions) PERCENT_CHANGE(COUNT(transactions)) is_churned
abc 2022-06-15 6 NaN False
abc 2022-07-01 7 NaN False
abc 2022-08-01 8 NaN False
abc 2022-09-01 9 NaN False
abc 2022-10-01 10 NaN False
abc 2022-11-01 11 NaN False
abc 2022-12-01 12 NaN False
abc 2023-01-01 13 NaN False
abc 2023-02-01 14 NaN False
abc 2023-03-01 15 NaN False
abc 2023-04-01 16 NaN False
abc 2023-05-01 17 NaN False
abc 2023-06-01 18 NaN False
abc 2023-07-01 19 NaN False
Notice that the feature PERCENT_CHANGE(COUNT(transactions)) is NaN in every row. I would expect 0 for the first row and values of roughly 0.05 to 0.2 for the remaining rows.
I have also noticed that the results can look quite random on a large real dataset, which is hard to reproduce here.
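As a sanity check on the expected numbers, the counts and their row-over-row percent change can be worked out by hand with the standard library alone. This is only an illustrative calculation from the tables in this report, not a description of how Featuretools computes the feature. It assumes the count at each cutoff includes only transactions strictly before the cutoff (matching include_cutoff_time=False), and it leaves the first row undefined because it has no predecessor.

```python
from datetime import date

# Transaction dates from the report: the 15th of each month,
# 2021-12 through 2023-07 (20 rows).
transaction_dates = [
    date(2021 + (11 + i) // 12, (11 + i) % 12 + 1, 15) for i in range(20)
]

# Cutoff times from the labels table: 2022-06-15, then the first of
# each month from 2022-07-01 through 2023-07-01.
cutoffs = [date(2022, 6, 15)] + [
    date(2022 + (6 + i) // 12, (6 + i) % 12 + 1, 1) for i in range(13)
]

# Count transactions strictly before each cutoff.
counts = [sum(d < c for d in transaction_dates) for c in cutoffs]

# Row-over-row percent change; the first row has no previous value.
pct_change = [None] + [
    (cur - prev) / prev for prev, cur in zip(counts, counts[1:])
]

for cutoff, count, pct in zip(cutoffs, counts, pct_change):
    print(cutoff, count, pct)
```

The counts reproduce the COUNT(transactions) column (6 through 19), and the percent changes fall from 1/6 (about 0.167) to 1/18 (about 0.056), which is the range the all-NaN column would be expected to show.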
Output of featuretools.show_info()
Featuretools version: 1.28.0
Featuretools installation directory: /Users/feizhan/Installs/miniconda3/envs/generic/lib/python3.9/site-packages/featuretools
SYSTEM INFO
python: 3.9.13.final.0
python-bits: 64
OS: Darwin
OS-release: 23.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
INSTALLED VERSIONS
numpy: 1.23.4
pandas: 1.5.1
tqdm: 4.65.0
cloudpickle: 2.2.1
dask: 2023.3.2
distributed: 2023.3.2
psutil: 5.9.3
pip: 22.3
setuptools: 65.5.0