[DOC]: Confused about DFP autoencoder training and inference pipelines running concurrently and getting data from the same source? #1221
Labels
- doc: Improvements or additions to documentation
- external: This issue was filed by someone outside of the Morpheus team
- Needs Triage: Needs team review and classification
How would you describe the priority of this documentation request?
Medium
Describe the future/missing documentation
https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html
From that page, it looks like data comes from a source and is used to train the autoencoder model; the updated model is then pushed to a model repository such as MLflow, and the inference pipeline pulls it from there once it has its own data to score. However, I can't tell whether the same data used for training is also being used for inference, which sounds like a bad idea to me. Wouldn't that risk overfitting? A sketch of the flow as I understand it follows.
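Here is a minimal, self-contained sketch of that flow as I understand it (my assumptions, not Morpheus internals): the training pipeline fits an autoencoder on an older window of featurized logs and registers it in MLflow, and the inference pipeline later loads that model and scores only newer records. The registered model name `dfp-user-model`, the toy features, and the tiny model are all made up for illustration.

```python
import torch
import torch.nn as nn
import mlflow
import mlflow.pytorch

class TinyAE(nn.Module):
    """Toy stand-in for the DFP autoencoder."""
    def __init__(self, n_features: int):
        super().__init__()
        self.enc = nn.Linear(n_features, 4)
        self.dec = nn.Linear(4, n_features)

    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

features = torch.randn(500, 8)                     # stand-in for featurized logs
train_x, infer_x = features[:400], features[400:]  # time-ordered split: no overlap

# --- training pipeline: fit on the older window, push to the registry ---
model, loss_fn = TinyAE(8), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss_fn(model(train_x), train_x).backward()
    opt.step()
with mlflow.start_run():
    mlflow.pytorch.log_model(model, "model",
                             registered_model_name="dfp-user-model")

# --- inference pipeline: pull the model, score only the newer window ---
loaded = mlflow.pytorch.load_model("models:/dfp-user-model/latest")  # or pin a version
with torch.no_grad():
    err = ((loaded(infer_x) - infer_x) ** 2).mean(dim=1)  # per-row anomaly score
print(err.topk(5).values)  # most anomalous of the records the model never saw
```

The key point of the sketch is the time-ordered split: nothing in `infer_x` was part of `train_x`, so the training data is never re-scored.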
I also watched this helpful video and completed the mini-course: https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52161/
I agree it would be nice to see an ecosystem grow around something like a model zoo, but I also learned how difficult that is in cybersecurity, where models tend to be good only in the environments they were trained in.
So I'm wondering if you can shed some light on the question above: how is overfitting prevented here? The sketch below shows the kind of safeguard I have in mind.
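For context, this is my assumption about a reasonable safeguard, not a documented Morpheus feature: hold out the newest slice of the training window as validation data and stop training when its reconstruction loss stops improving.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 8))  # toy autoencoder
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

data = torch.randn(500, 8)             # time-ordered featurized logs (made up)
fit_x, val_x = data[:450], data[450:]  # newest slice held out for validation

best, patience = float("inf"), 0
for epoch in range(200):
    opt.zero_grad()
    loss_fn(ae(fit_x), fit_x).backward()
    opt.step()
    with torch.no_grad():
        val = loss_fn(ae(val_x), val_x).item()
    if val < best:
        best, patience = val, 0
    else:
        patience += 1
        if patience >= 10:             # early stop: validation loss plateaued
            break
```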
Where have you looked?
https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html
https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52161/