[DOC]: DFP Starter Example README.md Needs to be Updated #1713
Comments
@oguzhancelik2425 Thank you for submitting this issue. The Starter DFP implementation and documentation are somewhat stale, as we decided to focus on a single DFP implementation, Production DFP. The two implementations have diverged significantly, so we have been encouraging everyone to start with Production DFP, which comes with a bit more complexity but incorporates new Morpheus features and is more scalable. We plan on having Starter DFP removed in the next release (#1715).
Thanks for the quick response @efajardo-nv. I have a question about the part that returns the anomaly scores in hammah_inference.py; it shows index …
@oguzhancelik2425 You are correct. That inference script, which is run independently of Morpheus, has not been kept in sync with the changes in the autoencoder and the data file paths. We'll get that updated. Thanks!
How would you describe the priority of this documentation request
High
Please provide a link or source to the relevant docs
https://github.com/nv-morpheus/Morpheus/blob/branch-24.06/examples/digital_fingerprinting/starter/README.md
Describe the problems in the documentation
In this file there are three given examples for the DFP pipeline, covering the CloudTrail, Azure AD, and Duo message log use cases. However, when I dive into the documentation for running the examples, the explanation is rather poor. Each of the use cases above comes with training and validation dataset samples. When I check the CloudTrail input validation sample dataset, it already has `ae_anomaly_score` and `ts_anomaly` values, but those are calculated by the model and, I assume, should not be in the training/validation dataset. Moreover, when I check the train AE model stage, I cannot see the `ae_anomaly_score` and `ts_anomaly` calculation; it lives instead in the hammah-inference.py module in the `/models/validation-scripts/dfp-models/` directory. From this information there is no easy way to understand what the order is for running the pipeline and scripts, which files are for training, validation, and inference, and how those anomaly scores are actually calculated.
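For reference, a quick way to confirm that the sample validation file already ships with those model-produced columns (the CSV path below is a placeholder; substitute the CloudTrail validation sample referenced in the README):

```bash
# Print the header columns of the sample validation CSV and flag the ones
# that should only appear after inference. Adjust the path to the actual
# CloudTrail validation sample from the README.
head -n 1 path/to/cloudtrail-validation-sample.csv | tr ',' '\n' \
  | grep -nE 'ae_anomaly_score|ts_anomaly'
```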
On the other hand, I realized that some directory paths have not been updated yet, which causes errors in the CLI. For example, the CLI example here shows that the `--columns_file=morpheus/data/columns_ae_cloudtrail.txt` option should read the feature columns txt file, but there is no such path under `morpheus/`; instead it should be `models/data/columns_ae_cloudtrail.txt`. Similar problems exist in the rest of the README.md file.
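A quick check from the repository root illustrates the path problem (assuming a standard checkout; the comments reflect the behaviour described above):

```bash
# Run from the root of a Morpheus checkout.
ls morpheus/data/columns_ae_cloudtrail.txt   # path used in the README; not found, per the issue above
ls models/data/columns_ae_cloudtrail.txt     # corrected path where the feature columns file lives
```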
(Optional) Propose a correction
I think this starter example needs to be reviewed in detail and expanded so that users can understand how the model works, which resources each example needs, and in what order to run everything. For the case I tried to explain above, my understanding is that the user should first run the hammah-20211017.ipynb notebook, doing feature selection and training the model on the training dataset, and then run the hammah-inference.py module to obtain the `ae_anomaly_score` and `ts_anomaly` values. Then the user needs to run the DFP pipeline CLI with the use case's features on the file overwritten by the previous hammah-inference.py run, in order to get the `reconstruction_loss`, `z_loss`, scores values, etc. If I am wrong, it is unfortunately because of the poor documentation of the example.
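To make that assumed order concrete, here is a rough shell sketch of the workflow as I currently understand it; the notebook path, the omitted arguments, and the exact `morpheus run pipeline-ae` invocation are my assumptions rather than anything stated in the README:

```bash
# 1. Feature selection and autoencoder training on the training dataset
#    (notebook location is a placeholder).
jupyter nbconvert --to notebook --execute path/to/hammah-20211017.ipynb

# 2. Stand-alone inference that writes ae_anomaly_score / ts_anomaly into the
#    validation file (arguments omitted).
python models/validation-scripts/dfp-models/hammah-inference.py

# 3. DFP pipeline CLI on the resulting file to get reconstruction_loss, z_loss,
#    etc. (remaining per-use-case source and stage options omitted).
morpheus run pipeline-ae --columns_file=models/data/columns_ae_cloudtrail.txt
```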