Problem on test set inference on a single machine with multiple GPUs #105

Open
mayii2001 opened this issue Oct 3, 2024 · 1 comment
mayii2001 commented Oct 3, 2024

Hi, thanks for your great work! Your code is based on PyTorch Lightning. When I deployed the model on a single machine with multiple GPUs, it started several global processes, which is necessary for accelerating training but causes a problem when testing. For example, I loaded a test set of length 1k, but the predictions returned by make_evaluation_predictions() were quadruple that length. I think this is the main reason my inference is very slow, which did not happen on the validation set. The Lightning documentation recommends testing with Trainer(devices=1), so I tried initializing a new trainer as below, but it raised a TypeError: `model` must be a `LightningModule` or `torch._dynamo.OptimizedModule`, got `LagLlamaLightningModule`. I don't know how to fix this.

from lightning import Trainer  # or: from pytorch_lightning import Trainer
from lag_llama.gluon.estimator import LagLlamaEstimator

model = LagLlamaEstimator()
single_device_trainer = Trainer(devices=1, max_epochs=1)
pre_results = single_device_trainer.test(model=model.network, dataloaders=test_loader)  # raises the TypeError above
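
For reference, a minimal single-device inference sketch that sidesteps Trainer.test() by building a plain GluonTS predictor, following the pattern in the lag-llama demo notebook. The checkpoint path, prediction/context lengths, and test_dataset below are placeholders, so treat this as an assumed workaround rather than a confirmed fix:

import torch
from gluonts.evaluation import make_evaluation_predictions
from lag_llama.gluon.estimator import LagLlamaEstimator

# Placeholder checkpoint path; model kwargs are read back from it, as in the demo notebook.
ckpt = torch.load("lag-llama.ckpt", map_location="cuda:0")
model_kwargs = ckpt["hyper_parameters"]["model_kwargs"]

estimator = LagLlamaEstimator(
    ckpt_path="lag-llama.ckpt",
    prediction_length=24,  # illustrative value
    context_length=32,     # illustrative value
    input_size=model_kwargs["input_size"],
    n_layer=model_kwargs["n_layer"],
    n_embd_per_head=model_kwargs["n_embd_per_head"],
    n_head=model_kwargs["n_head"],
    scaling=model_kwargs["scaling"],
    time_feat=model_kwargs["time_feat"],
)

# The GluonTS predictor runs predict() in the calling process only,
# so no DDP replicas are spawned to duplicate the test set.
predictor = estimator.create_predictor(
    estimator.create_transformation(),
    estimator.create_lightning_module(),
)
forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_dataset,  # placeholder: your 1k-series GluonTS test set
    predictor=predictor,
    num_samples=100,
)
forecasts = list(forecast_it)  # one forecast per series

As for the TypeError itself: it often comes from mixing the pytorch_lightning and lightning.pytorch namespaces (a Trainer from one rejects a LightningModule subclassed from the other), so it may be worth checking which namespace LagLlamaLightningModule inherits from and importing Trainer from the same one.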
@ashok-arjun
Contributor

Hi, I'm not sure about Lightning. Predictions are very slow with our model, and even slower if you increase the prediction length.
