Memory and SQLDataSource #341

Open
Kastanek opened this issue May 15, 2023 · 1 comment
Labels
question General question

Comments

@Kastanek

Ask the question
Is training a model using SQLDataSource suitable for large datasets that do not fit in RAM? I expect my dataset to grow to hundreds of thousands of records. I see that batching is performed, but I'm not sure whether a model can be trained this way. I'm particularly interested in training with XGBoostRegressionTrainer.
Is your question about a specific Tribuo class?
SQLDataSource

Kastanek added the question label on May 15, 2023
@Craigacp
Member

We have trained XGBoost models in Tribuo with hundreds of thousands of records, though we used a fairly large machine to do so. Batch loading from the SQL DB isn't the relevant part, as Tribuo requires all the data be in memory before it can train a model.
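
Since the whole dataset has to be in memory, a rough size estimate can help judge whether a given machine is big enough. The sketch below is only a back-of-the-envelope calculation and does not use any Tribuo API. The class `MemoryEstimate`, the method `estimateBytes`, the 16 bytes per stored feature (an 8-byte double value plus an 8-byte reference to the feature name), and the record and feature counts are all assumptions made for illustration. Real usage will likely be higher, because per-example object overhead, JVM overhead, and any copy of the data XGBoost makes for training are not counted.

```java
final class MemoryEstimate {
    // Assumption: each stored feature costs one 8-byte double value plus one
    // 8-byte reference to its name. Per-example object overhead is ignored.
    static final long BYTES_PER_FEATURE = 16L;

    /** Rough lower bound, in bytes, for holding the whole dataset in memory. */
    static long estimateBytes(long numExamples, long featuresPerExample) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        long totalFeatures = Math.multiplyExact(numExamples, featuresPerExample);
        return Math.multiplyExact(totalFeatures, BYTES_PER_FEATURE);
    }

    public static void main(String[] args) {
        long examples = 500_000L; // hypothetical record count
        long features = 100L;     // hypothetical number of features per record
        long bytes = estimateBytes(examples, features);
        System.out.printf("~%d MiB before JVM and XGBoost overhead%n",
                bytes / (1024L * 1024L));
    }
}
```

Comparing the result against the heap size passed to the JVM (for example via `-Xmx`) gives a first indication of whether training is feasible on the available machine.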
