Generation failing with infinite time for imbalanced datasets #22
Comments
I think I found the problem: it has to do with negative numeric target labels. With string labels it works. The same thing happens in your other model, CTABGAN.
Hi @omaralvarez, thanks for your comment. What do you mean by negative numeric target labels? Values like "-1" and "-5.78" in the target label column?
I finally found the issue: sometimes the model does not give back the original pandas dtypes when it generates samples. It returns object columns (I think strings), and that was causing a bug in my code. A simple cast back to the original dtypes:

    sample = self.synthesizer.sample(self.batch_size)
    return self.data_prep.inverse_prep(sample).astype(
        dtype=self.raw_df.dtypes.to_dict()
    )

fixes the issue.
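For readers who want to try the dtype fix outside the original class, here is a minimal standalone sketch. The function name `restore_dtypes` and the toy frames are hypothetical, and the object-typed `sample` only simulates what a synthesizer might return; none of this is the project's actual API.

```python
import pandas as pd


def restore_dtypes(sample: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Cast each column of `sample` to the dtype it has in `reference`."""
    # dtypes.to_dict() maps column name -> dtype, which astype accepts directly.
    return sample.astype(reference.dtypes.to_dict())


# Toy "raw" data with a negative numeric target label (hypothetical values).
raw_df = pd.DataFrame({"feature": [0.5, 1.5, 2.5], "target": [-1, 1, -1]})

# Simulated generator output: every column comes back as object (strings).
sample = pd.DataFrame(
    {"feature": ["0.1", "2.0"], "target": ["-1", "1"]}, dtype=object
)

restored = restore_dtypes(sample, raw_df)

# Before the cast, the string "-1" never compares equal to the integer -1,
# so filtering on the numeric label finds nothing; after the cast it does.
matches_before = int((sample["target"] == -1).sum())
matches_after = int((restored["target"] == -1).sum())
```

This would also be consistent with the earlier observation that string labels work while negative numeric labels do not, although the thread does not confirm that this comparison is the exact failure path.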
OK, cool, thanks @omaralvarez. Would you like to create a pull request to improve this part of the code?
Yep, no problem. Right now I am tight on time, but as soon as I can I will put together a pull request.
First of all, thanks for your contribution. I am using your model for dataset rebalancing, and I am running into a problem with datasets that have few samples and imbalanced classes. Generation fails because it reaches the maximum number of tries. I have tried several approaches, such as increasing the number of epochs and trying to use #7, to no avail: generation never completes. One of the sources of datasets where this happens is:
https://imbalanced-learn.org/stable/datasets/index.html