Any ongoing research for multi-table CTGAN solutions? #268

wilhelmagren · 2023-02-14T00:19:46Z

TL;DR: any ideas, thoughts, or insights on multi-table solutions using the CTGAN model? Yay or nay?

Hi,

Let me start off by saying how much I enjoy this repository. You truly managed to make the CTGAN model easily digestible, both in your paper and in the implemented code.

I am wondering: is there ongoing research into GAN-based synthetic data solutions for multi-table data (e.g. extending CTGAN to be hierarchical, which Hazy supposedly can do, ref)? Or is it not worth exploring?

If multi-table CTGAN is not worth exploring, could someone offer me some insight as to why? Does it have to do with the difficulty of capturing long-term primary-foreign key relations? Maintaining referential integrity? Model complexity? Are Gaussian copulas simply the better alternative for encoding the statistical properties of table relations?

I understand that CTGAN is designed to be conditioned on discrete columns during training, for a single table. But could one not extend the model to, for example, sample the latent noise vector $z \sim \mathcal{N}(\mu_r, \sigma_r)$ from a prior based on related-table statistics $\mu_r$ and $\sigma_r$, aggregated over all the columns? This way you would, again, condition your prior on information that is relevant to the table being synthesized.
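
To make the idea concrete, here is a minimal sketch of what I mean, using only the Python standard library. It is not CTGAN code and does not use any CTGAN API: `related_table_prior`, `sample_latent`, and the toy `child_rows` are hypothetical names and data I made up for illustration, and collapsing all numeric cells of the related rows into a single mean and standard deviation is just one possible reading of "aggregated over all the columns".

```python
import random
import statistics


def related_table_prior(related_rows):
    """Collapse every numeric cell of the related rows into one (mu_r, sigma_r) pair."""
    values = [float(v) for row in related_rows for v in row]
    mu_r = statistics.fmean(values)
    # Population standard deviation over all cells (an assumption of this sketch).
    sigma_r = statistics.pstdev(values)
    return mu_r, sigma_r


def sample_latent(mu_r, sigma_r, latent_dim, rng):
    """Draw each latent dimension independently from N(mu_r, sigma_r)."""
    return [rng.gauss(mu_r, sigma_r) for _ in range(latent_dim)]


# Hypothetical child rows linked to a single parent key (two numeric columns).
child_rows = [[1.0, 2.0], [3.0, 4.0]]
mu_r, sigma_r = related_table_prior(child_rows)

# z would replace the usual standard-normal noise fed to the generator.
z = sample_latent(mu_r, sigma_r, latent_dim=8, rng=random.Random(0))
```

In practice the related columns would probably need to be standardised first, since raw means and standard deviations could push $z$ far away from the scale the generator sees with a standard normal prior. I have not tested whether this helps.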

Nevertheless, I think synthetic data is a very interesting area of research, and I'm eager to read anyone's opinions, insights, or comments on the questions which I pose above.

Regards,

wilhelmagren added the labels "pending review" (this issue needs to be further reviewed, so work cannot be started) and "question" (general question about the software) on Feb 14, 2023
