Generation failing with infinite time for imbalanced datasets #22

omaralvarez · 2024-11-02T07:04:08Z

First of all, thanks for your contribution. I am using your model for dataset rebalancing, and I am running into a problem with small, imbalanced datasets. Generation fails after reaching the maximum number of tries. I have tried several approaches, such as increasing the epochs and using #7, to no avail: generation never completes. One of the datasets in which this is happening is:

https://imbalanced-learn.org/stable/datasets/index.html

from imblearn.datasets import fetch_datasets

ecoli = fetch_datasets()['ecoli']
ecoli.data.shape

omaralvarez · 2024-11-07T09:50:11Z

I think I found the problem: it has to do with negative numeric target labels. With string labels it works. The same thing happens in your other model, CTABGAN.

zhao-zilong · 2024-11-08T14:35:14Z

Hi @omaralvarez, thanks for your comment. What do you mean by negative numeric target labels? Values like "-1" and "-5.78" in the target label column?

omaralvarez · 2024-11-08T15:01:31Z

I finally found the issue: sometimes the original pandas dtypes are not restored when the model generates samples. It returns object columns (strings, I think), and that was causing a bug in my code. A simple:

        sample = self.synthesizer.sample(self.batch_size)

        return self.data_prep.inverse_prep(sample).astype(
            dtype=self.raw_df.dtypes.to_dict()
        )

Fixes the issue.
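For anyone hitting the same thing, here is a minimal, self-contained sketch of that dtype restoration outside the model. The two DataFrames are hypothetical stand-ins (they are not produced by the synthesizer); the only assumption is that the generated columns come back as `object` holding strings, as described above.

```python
import pandas as pd

# Hypothetical stand-in for the original training data (raw_df in the snippet above).
raw_df = pd.DataFrame(
    {
        "feature": [0.5, 1.25, 3.0],
        "count": [1, 2, 3],
        "target": [-1, 1, -1],  # negative numeric target labels
    }
)

# Hypothetical stand-in for generated samples whose columns came back as
# object dtype holding strings instead of the original numeric dtypes.
sample = pd.DataFrame(
    {
        "feature": ["0.7", "2.1"],
        "count": ["2", "3"],
        "target": ["-1", "1"],
    },
    dtype=object,
)

# Before the cast, comparing against the numeric label matches nothing,
# because the string "-1" is not equal to the integer -1.
matches_before = int((sample["target"] == -1).sum())

# Cast every column back to the dtype it had in the original data.
restored = sample.astype(dtype=raw_df.dtypes.to_dict())

# After the cast, the numeric comparison behaves as expected.
matches_after = int((restored["target"] == -1).sum())

print(restored.dtypes)
print(matches_before, matches_after)
```

A filter such as `df[df.target == -1]` silently selects zero rows on the string-typed output, which is one way a loop that waits for enough minority-class samples could run until it hits the retry limit. That is a guess at the mechanism, not something confirmed in this thread.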

zhao-zilong · 2024-11-11T07:57:23Z

OK, cool, thanks @omaralvarez. Would you like to create a pull request to improve this part of the code?

omaralvarez · 2024-11-12T16:51:17Z

Yep, no problem. Right now I am tight on time, but as soon as I can I will whip up a pull request.

omaralvarez changed the title ~~Generation failing with infinite time for unbalanced datasets~~ Generation failing with infinite time for imbalanced datasets Nov 2, 2024

zhao-zilong added the enhancement New feature or request label Nov 11, 2024
