Problem on test set inference on a single machine with multiple GPUs #105

Open
mayii2001 opened this issue Oct 3, 2024 · 1 comment
mayii2001 commented Oct 3, 2024

Hi, thanks for your great work! Your code is based on PyTorch Lightning. When I deployed the model on a single machine with multiple GPUs, it started several global processes, which is necessary for accelerating training but causes a problem when testing. For example, I loaded a test set of length 1k, but the predictions returned by make_evaluation_predictions() were quadruple that length. I think this is the main reason my inference is very slow, which did not happen on the validation set. The Lightning documentation recommends testing with Trainer(devices=1), so I tried initializing a new trainer as below, but it raised a TypeError: `model` must be a `LightningModule` or `torch._dynamo.OptimizedModule`, got `LagLlamaLightningModule`. I don't know how to fix this.

from lightning import Trainer  # or: from pytorch_lightning import Trainer
from lag_llama.gluon.estimator import LagLlamaEstimator

model = LagLlamaEstimator()
single_device_trainer = Trainer(devices=1, max_epochs=1)
pre_results = single_device_trainer.test(model=model.network, dataloaders=test_loader)  # raises the TypeError above
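
For reference, a minimal single-device inference sketch that sidesteps Trainer.test() by building a plain GluonTS predictor, following the pattern in the lag-llama demo notebook. The checkpoint path, prediction/context lengths, and test_dataset below are placeholders, so treat this as an assumed workaround rather than a confirmed fix:

import torch
from gluonts.evaluation import make_evaluation_predictions
from lag_llama.gluon.estimator import LagLlamaEstimator

# Placeholder checkpoint path; model kwargs are read back from it, as in the demo notebook.
ckpt = torch.load("lag-llama.ckpt", map_location="cuda:0")
model_kwargs = ckpt["hyper_parameters"]["model_kwargs"]

estimator = LagLlamaEstimator(
    ckpt_path="lag-llama.ckpt",
    prediction_length=24,  # illustrative value
    context_length=32,     # illustrative value
    input_size=model_kwargs["input_size"],
    n_layer=model_kwargs["n_layer"],
    n_embd_per_head=model_kwargs["n_embd_per_head"],
    n_head=model_kwargs["n_head"],
    scaling=model_kwargs["scaling"],
    time_feat=model_kwargs["time_feat"],
)

# The GluonTS predictor runs predict() in the calling process only,
# so no DDP replicas are spawned to duplicate the test set.
predictor = estimator.create_predictor(
    estimator.create_transformation(),
    estimator.create_lightning_module(),
)
forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_dataset,  # placeholder: your 1k-series GluonTS test set
    predictor=predictor,
    num_samples=100,
)
forecasts = list(forecast_it)  # one forecast per series

As for the TypeError itself: it often comes from mixing the pytorch_lightning and lightning.pytorch namespaces (a Trainer from one rejects a LightningModule subclassed from the other), so it may be worth checking which namespace LagLlamaLightningModule inherits from and importing Trainer from the same one.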
@ashok-arjun
Contributor

Hi, I'm not sure about Lightning. Predictions are very slow with our model, and even slower if you increase the prediction length.
