Memory and SQLDataSource #341

Kastanek · 2023-05-15T09:18:19Z

Ask the question
Is training a model using SQLDataSource suitable for large datasets that do not fit in RAM? I expect my dataset to grow to hundreds of thousands of records. I see that batching is performed, but I'm not sure whether a model can be trained this way. I'm particularly interested in training with XGBoostRegressionTrainer.
Is your question about a specific Tribuo class?
SQLDataSource

Craigacp · 2023-05-15T13:42:27Z

We have trained XGBoost models in Tribuo with hundreds of thousands of records, though we used a fairly large machine to do so. The batched loading from the SQL database is not the limiting factor, because Tribuo requires all of the data to be in memory before it can train a model.
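Since everything has to fit in memory, a back-of-the-envelope estimate can help decide whether a dataset is likely to fit on a given machine. The sketch below is only an illustration and does not use any Tribuo API: the class name `MemoryEstimate`, the method `estimateDenseBytes`, and the example and feature counts are all hypothetical. It counts raw feature values only, stored densely as doubles, so it is a lower bound; object overhead, feature names, and any copy of the data made by the trainer are not included.

```java
public class MemoryEstimate {

    /**
     * Rough lower bound on the bytes needed to hold the raw feature values
     * of a dense dataset. Hypothetical helper, not part of Tribuo.
     */
    static long estimateDenseBytes(long numExamples, int numFeatures, int bytesPerValue) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
        long values = Math.multiplyExact(numExamples, (long) numFeatures);
        return Math.multiplyExact(values, (long) bytesPerValue);
    }

    public static void main(String[] args) {
        long numExamples = 500_000L; // assumed dataset size
        int numFeatures = 100;       // assumed number of features per record

        long needed = estimateDenseBytes(numExamples, numFeatures, Double.BYTES);
        long maxHeap = Runtime.getRuntime().maxMemory(); // heap limit of this JVM

        System.out.printf("Raw feature values: ~%d MiB%n", needed / (1024 * 1024));
        System.out.printf("JVM max heap:       ~%d MiB%n", maxHeap / (1024 * 1024));
        if (needed > maxHeap) {
            System.out.println("The raw values alone exceed the configured heap.");
        }
    }
}
```

If the estimate is already close to the available heap, the real footprint will almost certainly be too large, and a bigger machine or a larger `-Xmx` setting would be needed.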

Kastanek added the "question" (General question) label on May 15, 2023
