[DOC]: Confused about DFP autoencoder training and inference pipelines running concurrently and getting data from the same source? #1221
Labels
- doc: Improvements or additions to documentation
- external: This issue was filed by someone outside of the Morpheus team
- Needs Triage: Needs team review and classification
How would you describe the priority of this documentation request?
Medium
Describe the future/missing documentation
https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html
From that page, it looks like data comes from a source and is used to train the autoencoder model; the updated model is then pushed to a model repository such as MLflow, and the inference pipeline pulls it from there once it has its own data to score. However, I can't tell whether the same data used for training is also being used for inference, which sounds like a bad idea to me. Wouldn't that risk overfitting? A sketch of the flow as I understand it follows.
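Here is a minimal, self-contained sketch of that flow as I understand it (my assumptions, not Morpheus internals): the training pipeline fits an autoencoder on an older window of featurized logs and registers it in MLflow, and the inference pipeline later loads that model and scores only newer records. The registered model name `dfp-user-model`, the toy features, and the tiny model are all made up for illustration.

```python
import torch
import torch.nn as nn
import mlflow
import mlflow.pytorch

class TinyAE(nn.Module):
    """Toy stand-in for the DFP autoencoder."""
    def __init__(self, n_features: int):
        super().__init__()
        self.enc = nn.Linear(n_features, 4)
        self.dec = nn.Linear(4, n_features)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

features = torch.randn(500, 8)                     # stand-in for featurized logs
train_x, infer_x = features[:400], features[400:]  # time-ordered split: no overlap

# --- training pipeline: fit on the older window, push to the registry ---
model, loss_fn = TinyAE(8), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss_fn(model(train_x), train_x).backward()
    opt.step()
with mlflow.start_run():
    mlflow.pytorch.log_model(model, "model",
                             registered_model_name="dfp-user-model")

# --- inference pipeline: pull the model, score only the newer window ---
loaded = mlflow.pytorch.load_model("models:/dfp-user-model/latest")  # or pin a version
with torch.no_grad():
    err = ((loaded(infer_x) - infer_x) ** 2).mean(dim=1)  # per-row anomaly score
print(err.topk(5).values)  # most anomalous of the records the model never saw
```

The key point of the sketch is the time-ordered split: nothing in `infer_x` was part of `train_x`, so the training data is never re-scored.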
I also watched this helpful video and completed the mini-course: https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52161/
I agree it would be nice to see an ecosystem grow around something like a model zoo, but I also learned how difficult that is in cybersecurity, where models tend to be good only in the environments they were trained in.
So I'm wondering if you can shed some light on the question above: how is overfitting prevented here? The sketch below shows the kind of safeguard I have in mind.
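For context, this is my assumption about a reasonable safeguard, not a documented Morpheus feature: hold out the newest slice of the training window as validation data and stop training when its reconstruction loss stops improving.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 8))  # toy autoencoder
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

data = torch.randn(500, 8)             # time-ordered featurized logs (made up)
fit_x, val_x = data[:450], data[450:]  # newest slice held out for validation

best, patience = float("inf"), 0
for epoch in range(200):
    opt.zero_grad()
    loss_fn(ae(fit_x), fit_x).backward()
    opt.step()
    with torch.no_grad():
        val = loss_fn(ae(val_x), val_x).item()
    if val < best:
        best, patience = val, 0
    else:
        patience += 1
        if patience >= 10:             # early stop: validation loss plateaued
            break
```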
Where have you looked?
https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html
https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52161/