Any ongoing research for multi-table CTGAN solutions? #268
Labels: pending review, question
TL;DR: any ideas, thoughts, or insights about multi-table solutions using the CTGAN model? Yay or nay?
Hi,
Let me start off by saying how much I enjoy this repository. You truly managed to make the CTGAN model easily digestible, both in your paper and in the implemented code.
I am wondering: is there ongoing research into GAN-based multi-table synthetic data solutions (e.g. extending CTGAN to be hierarchical, which Hazy supposedly can do, ref)? Or is it not worth exploring?
If it is not worth exploring multi-table CTGAN, could someone offer me some insight as to why? Does it have to do with difficulties capturing long-range primary-foreign key relations? Maintaining referential integrity? Model complexity? Are Gaussian Copulas just the better alternative for encoding the statistical properties of table relations?
I understand that CTGAN is designed to be conditional on discrete columns during training, for a single table. But could one not extend the model to, e.g., sample the latent noise vector $z \sim \mathcal{N}(\mu_r, \sigma_r)$ from a prior distribution based on related-table statistics $\mu_r$ and $\sigma_r$, aggregated over all the columns? This way you would, again, condition your prior on information that is relevant to the table being synthesized.
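To make the idea a bit more concrete, here is a rough sketch of what I have in mind. It is not how CTGAN works today, and none of these names exist in the library: `related_stats` and `sample_latent` are hypothetical helpers, and the toy rows stand in for already-normalized numeric columns of a related (e.g. parent) table. Pooling all columns into a single $\mu_r$ and $\sigma_r$ is an assumption on my part as well.

```python
import random
import statistics


def related_stats(rows, numeric_columns):
    """Pool every numeric value of the related rows into one mean and one std.

    Hypothetical helper: assumes the columns were normalized beforehand,
    otherwise pooling values across columns is not meaningful.
    """
    values = [float(row[col]) for row in rows for col in numeric_columns]
    mu_r = statistics.fmean(values)
    sigma_r = statistics.pstdev(values)  # population std over the pooled values
    return mu_r, sigma_r


def sample_latent(mu_r, sigma_r, dim, rng):
    """Draw a latent vector z with i.i.d. components from N(mu_r, sigma_r)."""
    return [rng.gauss(mu_r, sigma_r) for _ in range(dim)]


# Toy related-table rows (made-up, already normalized values).
related_rows = [
    {"age": 0.2, "income": -0.4},
    {"age": 1.0, "income": 0.8},
]

mu_r, sigma_r = related_stats(related_rows, ["age", "income"])

rng = random.Random(0)  # fixed seed so the sketch is reproducible
z = sample_latent(mu_r, sigma_r, dim=8, rng=rng)
```

In this sketch, `z` would replace the standard normal noise vector fed to the generator when synthesizing rows that belong to the given related rows. Whether the generator can actually exploit a shifted and scaled prior like this is exactly what I am unsure about.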
Nevertheless, I think synthetic data is a very interesting area of research, and I'm eager to read anyone's opinions, insights, or comments on the questions which I pose above.
Regards,