[DOC]: Confused about DFP autoencoder training and inference pipelines running concurrently and getting data from the same source? #1221

Closed
2 tasks done
nyck33 opened this issue Sep 24, 2023 · 3 comments
Labels
doc (Improvements or additions to documentation), external (This issue was filed by someone outside of the Morpheus team), Needs Triage (Need team to review and classify)

Comments

@nyck33

nyck33 commented Sep 24, 2023

How would you describe the priority of this documentation request

Medium

Describe the future/missing documentation

https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html

The guide suggests that data arrives from a source and is used to train the autoencoder model, that the updated model is then pushed to a model repository such as MLflow, and that the inference pipeline pulls the model from there when it has data ready to score. However, I can't tell whether the same data used for training is also used for inference, which sounds like a bad idea to me. Wouldn't that lead to overfitting?

I watched this good video and did the mini course: https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52161/

I agree that it would be nice to see an ecosystem grow around something like a model zoo, but I also learned about a difficulty in cybersecurity: models are only good in the environments they were trained in.

So I'm wondering if you can shed some light on the question above, and on how to prevent overfitting.
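To make the concern concrete, here is a minimal sketch of one way to keep training and inference data disjoint: splitting time-ordered records at a cutoff. This is not the Morpheus API and is not a claim about how the DFP pipelines behave. The function `split_by_time`, the `timestamp` field, and the sample records are all hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def split_by_time(records, cutoff):
    """Split records so that nothing used for training is also scored.

    Records strictly before `cutoff` go to training; records at or after
    `cutoff` go to inference. `records` is a list of dicts with a
    `timestamp` key (a hypothetical schema, not the DFP log format).
    """
    train = [r for r in records if r["timestamp"] < cutoff]
    infer = [r for r in records if r["timestamp"] >= cutoff]
    return train, infer


# Ten days of made-up per-user activity, one record per day.
start = datetime(2023, 9, 1)
records = [
    {"timestamp": start + timedelta(days=i), "user": "alice", "logins": i}
    for i in range(10)
]

# Train on the first seven days, run inference on the remaining three.
cutoff = start + timedelta(days=7)
train, infer = split_by_time(records, cutoff)
```

With a split like this, the autoencoder would never be scored on a record it was fitted to, which is the property the question above is asking about.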

Where have you looked?

https://docs.nvidia.com/morpheus/developer_guide/guides/6_digital_fingerprinting_reference.html

https://www.nvidia.com/en-us/on-demand/session/gtcspring23-se52161/

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open documentation issues and have found no duplicates for this bug report
@nyck33 nyck33 added the doc Improvements or additions to documentation label Sep 24, 2023
@jarmak-nv jarmak-nv added Needs Triage Need team to review and classify external This issue was filed by someone outside of the Morpheus team labels Sep 24, 2023
@jarmak-nv
Contributor

Hi @nyck33!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.

@nyck33 nyck33 changed the title from "[DOC]: Confused about DFP autoencoder training and inference pipelines running concurrently" to "[DOC]: Confused about DFP autoencoder training and inference pipelines running concurrently and getting data from the same source?" Sep 24, 2023
@nyck33
Author

nyck33 commented Sep 24, 2023

Please feel free to move this to Discussions. After researching on ChatGPT, I decided not to update the model on every sample. Instead, I will keep a cache holding a sliding window of the last x samples. When I know context drift has occurred, I can batch the cached samples and retrain the autoencoder to fit the new context. This way it won't overfit, and I wait until a human has judged that context drift has occurred before retraining.
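The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Morpheus code: `SlidingWindowCache`, `retrain_if_drift_confirmed`, and `fake_retrain` are hypothetical names, the retraining function is a stand-in for fitting an autoencoder, and the drift decision is passed in as a boolean because it is made by a human.

```python
from collections import deque


class SlidingWindowCache:
    """Hold the last `max_samples` samples and retrain only on confirmed drift."""

    def __init__(self, max_samples, retrain_fn):
        # A deque with maxlen drops the oldest sample once the window is full.
        self._window = deque(maxlen=max_samples)
        self._retrain_fn = retrain_fn

    def add(self, sample):
        """Cache a new sample without touching the model."""
        self._window.append(sample)

    def __len__(self):
        return len(self._window)

    def retrain_if_drift_confirmed(self, drift_confirmed):
        """Retrain on the cached window only when a human has confirmed drift.

        Returns the new model, or None when no retraining happened.
        """
        if not drift_confirmed or not self._window:
            return None
        batch = list(self._window)
        return self._retrain_fn(batch)


def fake_retrain(batch):
    # Stand-in for autoencoder training: summarize the batch instead.
    return {"n_samples": len(batch), "mean": sum(batch) / len(batch)}


cache = SlidingWindowCache(max_samples=3, retrain_fn=fake_retrain)
for value in [1.0, 2.0, 3.0, 4.0]:
    cache.add(value)  # the window now holds only the last three values

no_update = cache.retrain_if_drift_confirmed(False)  # drift not confirmed
model = cache.retrain_if_drift_confirmed(True)  # human confirmed drift
```

The sketch leaves the window intact after retraining. Whether to clear it at that point is a design choice the comment above does not settle.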

@nyck33 nyck33 closed this as completed Sep 24, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Morpheus Boards Sep 24, 2023
@nyck33 nyck33 reopened this Sep 25, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in Morpheus Boards Sep 25, 2023
@nyck33
Author

nyck33 commented Sep 25, 2023

Also, two models are mentioned, the autoencoder and a time series model, but I don't recall learning anything about the time series model in the one-hour DLI course for the DFP example. Please feel free to clarify this at your convenience.

@nyck33 nyck33 closed this as completed Sep 25, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Morpheus Boards Sep 25, 2023