Multi GPU support #290

saswat0 · 2023-05-07T12:08:13Z

Problem Description

The current implementation does not account for servers with multiple GPUs. When several cards are present, each with relatively little VRAM, running CTGAN fails with an out-of-memory error.

The trace below is from a job run on a T4 GPU (common on cloud servers). The real dataset had 26 columns and 20k rows.

OutOfMemoryError: CUDA out of memory. Tried to allocate 3.46 GiB (GPU 0; 14.76 GiB total capacity; 10.49 GiB already allocated; 621.75 MiB free; 13.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Expected behavior

CTGAN should be able to use PyTorch's DataParallel module so that training batches are split across the available GPUs, allowing bigger batch sizes.
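To make the request concrete, here is a minimal sketch of the kind of wrapping being asked for. It is not CTGAN's actual code: the helper `wrap_for_multi_gpu` and the small `nn.Sequential` network standing in for the generator are hypothetical, and the layer sizes are illustrative (only the 26 output columns echo the dataset above). Note that `nn.DataParallel` replicates the whole model on every GPU and splits each batch along dimension 0, so it reduces per-GPU activation memory but does not shard the model's parameters.

```python
import torch
from torch import nn


def wrap_for_multi_gpu(module: nn.Module) -> nn.Module:
    """Hypothetical helper: use DataParallel only when several GPUs are visible."""
    if torch.cuda.is_available():
        module = module.cuda()
        if torch.cuda.device_count() > 1:
            # Replicates the module on each GPU and scatters the batch along dim 0.
            return nn.DataParallel(module)
    # Single GPU or CPU-only machine: return the module unwrapped.
    return module


# Stand-in for a generator network (assumption, not CTGAN's real architecture).
generator = wrap_for_multi_gpu(
    nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 26))
)

# Create the input on the same device as the model parameters.
device = next(generator.parameters()).device
noise = torch.randn(500, 128, device=device)

# With DataParallel, each GPU processes a slice of the 500 rows and the
# outputs are gathered back on the first device.
fake = generator(noise)
print(fake.shape)
```

The PyTorch documentation recommends `DistributedDataParallel` over `DataParallel` for multi-GPU training, so a real implementation might prefer that route; the sketch uses `DataParallel` only because it is the module named in this request.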


saswat0 added the "feature request" and "pending review" labels May 7, 2023

npatki mentioned this issue May 15, 2023

Out of memory while fit sdv-dev/SDV#1381

Closed

npatki mentioned this issue Aug 1, 2023

CTGAN - cuda=TRUE; multiple GPU training sdv-dev/SDV#1500

Closed
