[Question] Does HugeCTR support reading training data from Kafka? #413

Closed
sparkling9809 opened this issue Aug 10, 2023 · 3 comments

Comments

@sparkling9809

I want to read data from Kafka to implement real-time training, but the data reader in HugeCTR currently supports only files. Is there any way to read training data from Kafka? Thanks.

@sparkling9809 added the "question" label (Further information is requested) on Aug 10, 2023
@yingcanw
Collaborator

Thanks for your question. Currently, HugeCTR supports reading Parquet data, as well as loading and saving models from/to remote file systems such as HDFS, AWS S3, and GCS. Kafka is supported only in inference, where it is used for online updates of incremental models to HPS. @jershi425 Please add your comments.

@jershi425
Collaborator

Yes, as @yingcanw said, we currently don't support reading or streaming training data from Kafka. Kafka is used only for model updates. We recommend using our data reader to read Parquet data for training because of its better performance and convenience.
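
One possible workaround, which is not an official HugeCTR feature and is not confirmed anywhere in this thread, is to spool the Kafka stream into Parquet files and point the existing data reader at those files. The sketch below shows only the micro-batching step in plain Python. Everything in it is an assumption made for illustration: `micro_batches`, `spool`, the `part-NNNNN.parquet` naming, and the `write_batch` callback are hypothetical names, the Kafka consumer is stood in for by any iterable of records, and the actual Parquet writer (and whatever metadata the HugeCTR data reader expects) is left to the caller.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def micro_batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group a (possibly unbounded) record stream into lists of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # the stream ended
        yield batch


def spool(
    records: Iterable[Record],
    batch_size: int,
    write_batch: Callable[[str, List[Record]], None],
    prefix: str = "part",
) -> Iterator[str]:
    """Pass each micro-batch to write_batch(name, batch) and yield the name used.

    write_batch is a hypothetical callback; in practice it would write the
    batch as one Parquet file that the training job picks up later.
    """
    for i, batch in enumerate(micro_batches(records, batch_size)):
        name = f"{prefix}-{i:05d}.parquet"
        write_batch(name, batch)
        yield name
```

Because `spool` is a generator, each file name becomes available as soon as its batch has been written, so a training job could pick up new files incrementally. Note that a partial final batch is flushed only when the stream ends; a real pipeline would likely also want a time-based flush.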

@sparkling9809
Author

OK, thanks!
